Mach2qtl uses dosages/posterior probabilities inferred with mach or minimac as predictors in a linear regression to test association with quantitative traits.
Allocate an interactive session and run the program. Sample session:
[user@biowulf]$ sinteractive
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job
[user@cn3144 ~]$ module load mach2qtl
[user@cn3144 ~]$ cd /data/$USER/test_data/mach2qtl
[user@cn3144 ~]$ mach2qtl -d sample.dat -p sample.ped -i sample.mlinfo \
--probfile sample.mlprob \
--samplesize > test.out
[user@cn3144 ~]$ less test.out
Mach2Qtl V1.1.3 (2013-04-25) -- QTL Association Mapping with Imputed Allele Counts
(c) 2007 Goncalo Abecasis, Yun Li
The following parameters are in effect:
Available Options
Phenotypic Data : --datfile [sample.dat], --pedfile [sample.ped]
Imputed Allele Counts : --infofile [sample.mlinfo], --dosefile [],
--probfile [sample.mlprob]
Analysis Options : --useCovariates [ON], --quantileNormalization,
--dominant, --recessive, --additive [ON]
Output : --samplesize [ON], --rsqcutoff [0.00]
FITTED MODELS (for covariate adjusted residuals)
======================================================
Trait Raw Mean Raw Variance Mean Variance
lg10CRP 0.52383 0.13666 -0.00000 0.13180
Loading marker information ...
1001 markers will be analyzed
Processing prob file ...
Executing first [ass analysis of imputed genotypes ...
Executing second pass analysis of imputed genotypes ...
INFORMATION FROM .info FILE QTL ASSOCIATION ADDITIVE
===================================== =================================
TRAIT MARKER ALLELES FREQ1 RSQR EFFECT2 STDERR CHISQ PVALUE N
lg10CRP rs9977217 A,G .3950 .9805 0.020 0.018 1.2320 0.267 867
lg10CRP rs9977094 A,C .9894 .3454 0.179 0.133 1.8130 0.1782 867
lg10CRP rs10854272 C,T .4169 .9562 0.014 0.018 0.6349 0.4256 867
...
Analysis took 1 seconds
[user@cn3144 ~]$ mach2qtl -d sample.dat -p sample.ped -i sample.mlinfo \
--probfile sample.mlprob --dominant --recessive \
--samplesize > test.out
[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
To run a single mach2qtl analysis on biowulf, a batch script similar to the
exammple below will be required:
#! /bin/bash
set -e
cd /data/$USER/test_data/mach2qtl
module load mach2qtl/1.1.3
mach2qtl -d sample.dat -p sample.ped -i sample.mlinfo \
--probfile sample.mlprob --dominant --recessive \
--samplesize > test.out
This script can then be submitted to the slurm batch queue with a command like
biowulf$ sbatch --mem=2g batch_script.sh
It is easiest to run a large number of simultaneous mach2qtl jobs via swarm.
Set up a swarm command file along the following lines:
#------- this file is swarmcmd ------------------ mach2qtl -d sample1.dat -p sample1.ped -i sample1.mlinf --probfile sample1.mlp > sample1.out mach2qtl -d sample2.dat -p sample2.ped -i sample2.mlinf --probfile sample2.mlp > sample2.out mach2qtl -d sample1.dat -p sample3.ped -i sample3.mlinf --probfile sample3.mlp > sample3.out mach2qtl -d sample4.dat -p sample4.ped -i sample4.mlinf --probfile sample4.mlp > sample4.out [...]
If each mach2qtl process requires less than 1 GB of memory, submit this to the batch system with the command:
swarm -f swarmcmd --module mach2qtl
If each mach2qtl process requires more than 1 GB of memory, use
swarm -g # -f cmdfile --module mach2qtl
where '#' is the number of Gigabytes of memory required by each mach2qtl
process. The swarm program will package the commands for best efficiency and
send them to the batch system as a job array.
Mach2qtl readme:
mach2qtl
========
QTL analysis based on imputed dosages/posterior_probabilities
Options:
--------
--datfile : merlin marker info file for phenotypes (required)
--pedfile : merlin pedigree file for phenotypes (required)
--infofile : mach1 output .info or .mlinfo file (required)
--dosefile : mach1 output .dose or .mldose file (required if probfile not available)
--probfile : mach1 output .prob or .mlprob file (required for non-additive model)
--useCovariates : adjust for covariates (default is ON)
--quantileNormalization : apply inverse normal transformation to quantitative trait(s) (default if OFF)
--dominant : dominant model (default if OFF)
--recessive : recessive model (default if OFF)
--additive : additive model (default if ON)
Sample command line & Example
-----------------------------
see sub-folder examples/
Coding
======
For recessive model: AL1/AL1 1
AL1/AL2 & AL2/AL2 0
so a positive beta means AL1/AL1 is associated with higher trait values)
For dominant model: AL2/AL2 1
AL1/AL2 & AL1/AL1 0
so a positive beta means AL2/AL2 is associated with higher trait values)
For additive model: AL1 0
AL2 1
(notice the negative sign in "double effect = -numerator / denominator;"
so a positive beta means AL2 is associated with higher trait values)
Procedure:
----------
Calculate residuals (if related, use VC)
Perform residual-SNP association
Selected Output
----------------
Raw Mean/Variance vs Mean/Variance
former for y (could be transformed)
latter for residuals (even if no residuals, Raw Mean and Mean could be different b/c latter would be centered and thus would be ZERO)
ref:
----
Li Y, Willer CJ, Sanna S, and Abecasis GR (2009). Genotype imputation. Annu Rev Genomics Hum Genet. 10: 387-406.
Chen WM and Abecasis GR (2007). Family-based association tests for genomewide association scans. Am J Hum Genet 81:913-26