EPACTS (Efficient and Parallelizable Association Container Toolbox) is a versatile software pipeline that performs a variety of statistical tests to identify genome-wide associations from sequence data. Its interface is designed to be user-friendly for both scientific analysts and method developers.
Allocate an interactive session and run the program. Sample session:
[user@biowulf]$ sinteractive
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job
[user@cn3144 ~]$ module load EPACTS
[user@cn3144 ~]$ cp ${EPACTS_DIR}/1000G_* .
[user@cn3144 ~]$ epacts single --vcf 1000G_exome_chr20_example_softFiltered.calls.vcf.gz \
--ped 1000G_dummy_pheno.ped --min-maf 0.001 --chr 20 --pheno DISEASE \
--cov AGE --cov SEX --test b.score --anno --out test --run 2
Detected phenotypes with 2 unique values - 1 and 2 - considering them as binary phenotypes... re-encoding them into 1 and 2
Successfully written phenotypes and 2 covariates across 266 individuals
Processing chromosome 20...
Finished generating EPACTS Makefile
Running 2 parallel jobs of EPACTS
forkExecWait(): make -f /data/user/test.Makefile -j 2
Rscript /usr/local/share/EPACTS/epactsSingle.R --vanilla /usr/local /data/user/test.phe /data/user/test.cov /data/user/test.ind /data/user/1000G_exome_chr20_example_softFiltered.calls.vcf.gz 20:1-10000000 /data/user/test.20.1.10000000.epacts GT 0.001 1 3 1000000000 0.5 0 FALSE single.b.score
Rscript /usr/local/share/EPACTS/epactsSingle.R --vanilla /usr/local /data/user/test.phe /data/user/test.cov /data/user/test.ind /data/user/1000G_exome_chr20_example_softFiltered.calls.vcf.gz 20:10000001-20000000 /data/user/test.20.10000001.20000000.epacts GT 0.001 1 3 1000000000 0.5 0 FALSE single.b.score
Loading required package: epactsR
Loading required package: epactsR
NOTICE - Reading VCF took 1 seconds
[....]
zcat /data/user/test.epacts.gz | awk '$9 != "NA" { print $0 }' | sort -g -k 9 | head -n 5000 > /data/user/test.epacts.top5000
touch /data/user/test.epacts.OK
[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
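The last step in the log above keeps rows whose 9th column is not NA and sorts them numerically on that column to build test.epacts.top5000. Below is a minimal sketch of the same filter as a reusable shell function for re-inspecting results later. The function name top_hits is hypothetical and is not part of EPACTS. The sketch assumes column 9 holds the p-value, as the sort key in the log suggests. Unlike the logged command, it also drops the header line that starts with #.

```shell
# Hypothetical helper (not part of EPACTS): print the N rows with the smallest
# column-9 values (assumed to be p-values) from a gzipped EPACTS results file.
top_hits() {
  local results="$1"   # e.g. test.epacts.gz
  local n="${2:-10}"   # number of rows to keep (default 10)
  # gzip -dc behaves like zcat; skip '#' header lines and NA p-values,
  # then sort on column 9 only, using general-numeric order (handles 1e-05).
  gzip -dc "$results" \
    | awk '!/^#/ && $9 != "NA"' \
    | sort -g -k 9,9 \
    | head -n "$n"
}
```

For example, `top_hits test.epacts.gz 20` prints the 20 most significant rows of the run above without writing a new file.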
Create a batch input file (e.g. EPACTS.sh). For example:
#!/bin/bash
set -e
cd /data/$USER
module load EPACTS
cp ${EPACTS_DIR}/1000G_* .
epacts single --vcf 1000G_exome_chr20_example_softFiltered.calls.vcf.gz \
--ped 1000G_dummy_pheno.ped --min-maf 0.001 --chr 20 --pheno DISEASE \
--cov AGE --cov SEX --test b.score --anno --out test --run 2
Submit this job using the Slurm sbatch command.
sbatch [--mem=#] EPACTS.sh
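You can also embed Slurm options in the script as #SBATCH lines instead of passing them to sbatch. The sketch below writes the same EPACTS.sh as above with two additions. It assumes you want EPACTS's --run value (the number of parallel jobs, per the "Running 2 parallel jobs" log line) to follow the CPUs Slurm allocates. The 2-CPU and 4G values are illustrative assumptions, not tested recommendations.

```shell
# Write EPACTS.sh with Slurm directives embedded (resource values are illustrative).
# The quoted 'EOF' keeps $USER and ${...} unexpanded until the job runs.
cat > EPACTS.sh <<'EOF'
#!/bin/bash
#SBATCH --cpus-per-task=2
#SBATCH --mem=4G
set -e
cd /data/$USER
module load EPACTS
cp ${EPACTS_DIR}/1000G_* .
# Match EPACTS parallel jobs to the allocated CPUs (fall back to 2).
epacts single --vcf 1000G_exome_chr20_example_softFiltered.calls.vcf.gz \
  --ped 1000G_dummy_pheno.ped --min-maf 0.001 --chr 20 --pheno DISEASE \
  --cov AGE --cov SEX --test b.score --anno --out test \
  --run "${SLURM_CPUS_PER_TASK:-2}"
EOF
```

Submit it with `sbatch EPACTS.sh`. Options given on the sbatch command line, such as --mem, take precedence over the embedded #SBATCH values.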
Create a swarmfile (e.g. EPACTS.swarm). For example:
epacts single --vcf input1.vcf --ped pheno.ped --out out1 --run 2
epacts single --vcf input2.vcf --ped pheno.ped --out out2 --run 2
epacts single --vcf input3.vcf --ped pheno.ped --out out3 --run 2
epacts single --vcf input4.vcf --ped pheno.ped --out out4 --run 2
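For many inputs, a loop can generate the swarmfile. Below is a minimal sketch. It assumes inputs named input1.vcf through input4.vcf and a shared pheno.ped, as in the example above; adjust the names and the range to match your data.

```shell
# Write one EPACTS command per line into EPACTS.swarm (each line becomes one swarm subjob).
: > EPACTS.swarm   # start from an empty file
for i in 1 2 3 4; do
  echo "epacts single --vcf input${i}.vcf --ped pheno.ped --out out${i} --run 2" >> EPACTS.swarm
done
```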
Submit this job using the swarm command.
swarm -f EPACTS.swarm [-g #] [-t #] --module EPACTS
where
| -g # | Number of Gigabytes of memory required for each process (1 line in the swarm command file) |
| -t # | Number of threads/CPUs required for each process (1 line in the swarm command file) |
| --module EPACTS | Loads the EPACTS module for each subjob in the swarm |
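In the interactive session, the final step of a successful run is `touch /data/user/test.epacts.OK`. EPACTS therefore leaves a PREFIX.epacts.OK marker next to its results. Assuming the swarm subjobs do the same, the sketch below lists which --out prefixes from the swarmfile have not finished. The function name check_done is hypothetical.

```shell
# Hypothetical helper: report --out prefixes that lack the PREFIX.epacts.OK
# marker that EPACTS touches at the end of a completed run.
check_done() {
  local missing=0 prefix
  for prefix in "$@"; do
    if [ ! -e "${prefix}.epacts.OK" ]; then
      echo "not finished: ${prefix}"
      missing=$((missing + 1))
    fi
  done
  # Return status 0 only if every prefix has its .OK marker.
  [ "$missing" -eq 0 ]
}
```

After the swarm completes, `check_done out1 out2 out3 out4` prints nothing and returns success if all four runs wrote their markers.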