SAIGE is installed as a container with its own R environment on the Biowulf cluster. Please do not load an R module when running SAIGE. If you see conflicts or errors involving R, check the loaded modules with 'module list'.
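As a quick check before running SAIGE, the sketch below scans module listing text for an entry that looks like an R module. The function name `has_r_module` and the `R/<version>` naming pattern are assumptions for illustration; adjust the pattern to match how R modules are actually named on your system.

```shell
# Hypothetical helper: succeeds if the text on stdin lists a module
# named R/<version> (assumed naming pattern).
has_r_module() {
  grep -Eq '(^|[[:space:]])R/[^[:space:]]+'
}

# Only query the module system when the `module` command is available.
if command -v module >/dev/null 2>&1; then
  # `module list` commonly prints to stderr, so merge it into stdout.
  if module list 2>&1 | has_r_module; then
    echo "An R module appears to be loaded; consider unloading it before running SAIGE."
  fi
fi
```

The helper only reads text, so it can also be tried on saved `module list` output.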
SAIGE is an R package developed with Rcpp for genome-wide association tests in large-scale data sets and biobanks. The method:
SAIGE-GENE (now known as SAIGE-GENE+) is a method extension in the R package for set-based tests of rare variants.
The package takes genotype file input in the following formats
step1_fitNULLGLMM.R --help
Allocate an interactive session and run the program.
Sample session (user input in bold):
[user@biowulf]$ sinteractive --cpus-per-task=6 --mem=4G
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job
[user@cn3144 ~]$ module load SAIGE
[user@cn3144 ~]$ cp -r ${SAIGE_TEST_DATA:-none}/extdata .
[user@cn3144 ~]$ cd extdata
[user@cn3144 ~]$ step1_fitNULLGLMM.R \
--plinkFile=./input/nfam_100_nindep_0_step1_includeMoreRareVariants_poly \
--phenoFile=./input/pheno_1000samples.txt_withdosages_withBothTraitTypes.txt \
--phenoCol=y_binary \
--covarColList=a9 \
--sampleIDColinphenoFile=IID \
--traitType=binary \
--outputPrefix=./output/example_binary_includenonAutoforvarRatio \
--nThreads=4 \
--LOCO=FALSE \
--relatednessCutoff=0.0 \
--FemaleCode=2 \
--MaleCode=1 \
--IsOverwriteVarianceRatioFile=TRUE
[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
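After step 1 finishes, it can be worth confirming that its output files exist before moving on. The sketch below assumes step 1 writes `<outputPrefix>.rda` and `<outputPrefix>.varianceRatio.txt`; that naming follows the upstream SAIGE documentation and is not stated on this page, so verify it against your own output directory. The function name `check_step1_outputs` is hypothetical.

```shell
# Hypothetical helper: report whether the assumed step 1 output files
# exist and are non-empty for a given output prefix.
check_step1_outputs() {
  prefix=$1
  status=0
  for suffix in rda varianceRatio.txt; do
    if [ -s "${prefix}.${suffix}" ]; then
      echo "found: ${prefix}.${suffix}"
    else
      echo "missing or empty: ${prefix}.${suffix}" >&2
      status=1
    fi
  done
  return "$status"
}

# Example call, using the output prefix from the session above:
#   check_step1_outputs ./output/example_binary_includenonAutoforvarRatio
```

The function returns a non-zero status when either file is missing, so it can gate later commands in a script.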
Create a batch input file (e.g. SAIGE.sh). For example:
#!/bin/bash
#SBATCH --cpus-per-task=6
#SBATCH --mem=4G
#SBATCH --time=2:00:00
#SBATCH --partition=norm
set -e
module load SAIGE
step1_fitNULLGLMM.R \
--plinkFile=./input/nfam_100_nindep_0_step1_includeMoreRareVariants_poly \
--phenoFile=./input/pheno_1000samples.txt_withdosages_withBothTraitTypes.txt \
--phenoCol=y_binary \
--covarColList=a9 \
--sampleIDColinphenoFile=IID \
--traitType=binary \
--outputPrefix=./output/example_binary_includenonAutoforvarRatio \
--nThreads=4 \
--LOCO=FALSE \
--relatednessCutoff=0.0 \
--FemaleCode=2 \
--MaleCode=1 \
--IsOverwriteVarianceRatioFile=TRUE
Submit the job:
sbatch SAIGE.sh
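The batch script above requests 6 CPUs but passes `--nThreads=4`. One way to keep the two values in step is to derive the thread count from `SLURM_CPUS_PER_TASK`, which Slurm sets inside a job when `--cpus-per-task` is given. This is a minimal sketch, not a site recommendation; the variable name `nthreads` is illustrative.

```shell
# Inside a Slurm job, SLURM_CPUS_PER_TASK holds the --cpus-per-task value.
# Outside a job, fall back to 4 (the value used in the example above).
nthreads="${SLURM_CPUS_PER_TASK:-4}"

# Show the option that would be passed to step1_fitNULLGLMM.R.
echo "would pass: --nThreads=${nthreads}"
```

In SAIGE.sh, the `--nThreads=4` line could then be written as `--nThreads=${nthreads}`.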
Create a swarmfile (e.g. job.swarm). For example:
cd dir1; step1_fitNULLGLMM.R --help
cd dir2; step1_fitNULLGLMM.R --help
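With many directories, a swarmfile like the one above can be generated rather than typed by hand. The sketch below writes one line per directory; the helper name `make_swarm` is hypothetical, the `--help` command is the placeholder from the example, and directory names are assumed to contain no spaces.

```shell
# Hypothetical helper: write one swarm line per directory argument.
# Usage: make_swarm <swarmfile> <dir> [<dir> ...]
make_swarm() {
  out=$1
  shift
  : > "$out"    # create or truncate the swarmfile
  for dir in "$@"; do
    printf 'cd %s; step1_fitNULLGLMM.R --help\n' "$dir" >> "$out"
  done
}

# Reproduce the two-line example.
make_swarm job.swarm dir1 dir2
cat job.swarm
```

Replace the placeholder command in the `printf` template with the real per-directory SAIGE command before submitting.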
Submit this job using the swarm command.
swarm -f job.swarm [-g #] --module SAIGE
where
| -g # | Number of Gigabytes of memory required for each process (1 line in the swarm command file) |
| --module | Loads the module for each subjob in the swarm |