regenie is a C++ program for whole genome regression modelling
of large genome-wide association studies. It is developed and supported
by a team of scientists at the Regeneron Genetics Center.
regenie employs the BGEN library.
Allocate an interactive session and run the program. Sample session:
[user@biowulf]$ sinteractive
[user@cn3101 ~]$ module load regenie/3.0.3
[+] Loading singularity 3.10.0 on cn3063
[+] Loading regenie 3.0.3
The available executables are:
[user@cn3101]$ ls $REGENIE_BIN
bgenix cat-bgen edit-bgen regenie zstd
In particular, the command line options of the executable regenie are as follows:
[user@cn3101]$ regenie --help
|============================|
| REGENIE v3.0.3 |
|============================|
Copyright (c) 2020-2022 Joelle Mbatchou, Andrey Ziyatdinov and Jonathan Marchini.
Distributed under the MIT License.
Usage:
/regenie/regenie [OPTION...]
-h, --help print list of available options
--helpFull print list of all available options
Main options:
--step INT specify if fitting null model (=1) or
association testing (=2)
--bed PREFIX prefix to PLINK .bed/.bim/.fam files
--pgen PREFIX prefix to PLINK2 .pgen/.pvar/.psam files
--bgen FILE BGEN file
--sample FILE sample file corresponding to BGEN file
--ref-first use the first allele as the reference for
...
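When regenie is launched from a script rather than typed by hand, it can help to assemble the argument list in one place. The following is a minimal Python sketch, not part of regenie itself: the function name build_regenie_step1 is hypothetical, and it uses only flags that appear in the help excerpt and in the sample commands on this page (--step, --bgen, --out, --bsize, --phenoFile).

```python
import shlex


def build_regenie_step1(bgen, pheno_file, out_prefix, bsize=200):
    """Return the argument list for a step 1 (null model fitting) run.

    Hypothetical helper: it only formats arguments and does not
    check that the input files exist.
    """
    if int(bsize) <= 0:
        raise ValueError("bsize must be a positive integer")
    return [
        "regenie",
        "--step", "1",
        "--bgen", str(bgen),
        "--out", str(out_prefix),
        "--bsize", str(bsize),
        "--phenoFile", str(pheno_file),
    ]


args = build_regenie_step1("example.bgen", "phenotype_bin.txt", "my_output")
# Print a shell-quoted version of the command for logging or copy/paste.
print(shlex.join(args))
```

The resulting list can be passed to subprocess.run(args, check=True) inside a session where the regenie module has been loaded.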
To run regenie on the sample data, first copy the data to the current folder:
[user@cn3101]$ cp $REGENIE_DATA/* .
A sample command to run regenie (step 1, fitting the null model):
[user@cn3101]$ regenie \
--step 1 \
--bgen example.bgen \
--out my_output \
--bsize 200 \
--phenoFile phenotype_bin.txt
Start time: Tue Aug 16 13:24:00 2022
|============================|
| REGENIE v3.0.3 |
|============================|
Copyright (c) 2020-2022 Joelle Mbatchou, Andrey Ziyatdinov and Jonathan Marchini.
Distributed under the MIT License.
Log of output saved in file : my_output.log
Options in effect:
--bgen example.bgen \
--out my_output \
--step 1 \
--bsize 200 \
--phenoFile phenotype_bin.txt
Fitting null model
* bgen : [example.bgen]
-summary : bgen file (v1.2 layout, zlib compressed) with 500 named samples and 1000 variants with 8-bit encoding.
-index bgi file [example.bgen.bgi]
* phenotypes : [phenotype_bin.txt] n_pheno = 2
-keeping and mean-imputing missing observations (done for each trait)
-number of phenotyped individuals = 500
* number of individuals used in analysis = 500
-residualizing and scaling phenotypes...done (0ms)
* # threads : [55]
* block size : [200]
* # blocks : [5] for 1000 variants
* # CV folds : [5]
* ridge data_l0 : [5 : 0.01 0.25 0.5 0.75 0.99 ]
* ridge data_l1 : [5 : 0.01 0.25 0.5 0.75 0.99 ]
* approximate memory usage : 2MB
* setting memory...done
Chromosome 1
block [1] : 200 snps (4ms)
-residualizing and scaling genotypes...done (3ms)
-calc working matrices...done (420ms)
-calc level 0 ridge...done (79ms)
block [2] : 200 snps (2ms)
-residualizing and scaling genotypes...done (1ms)
-calc working matrices...done (439ms)
-calc level 0 ridge...done (79ms)
block [3] : 200 snps (2ms)
-residualizing and scaling genotypes...done (1ms)
-calc working matrices...done (483ms)
-calc level 0 ridge...done (81ms)
block [4] : 200 snps (3ms)
-residualizing and scaling genotypes...done (1ms)
-calc working matrices...done (366ms)
-calc level 0 ridge...done (78ms)
block [5] : 200 snps (2ms)
-residualizing and scaling genotypes...done (1ms)
-calc working matrices...done (485ms)
-calc level 0 ridge...done (78ms)
Level 1 ridge...
-on phenotype 1 (Y1)...done (0ms)
-on phenotype 2 (Y2)...done (0ms)
Output
------
phenotype 1 (Y1) :
0.01 : Rsq = 0.00292408, MSE = 0.995083<- min value
0.25 : Rsq = 0.00619743, MSE = 0.998022
0.5 : Rsq = 0.00679147, MSE = 1.00153
0.75 : Rsq = 0.00753375, MSE = 1.00367
0.99 : Rsq = 0.00733694, MSE = 1.01373
* making predictions...writing LOCO predictions...done (9ms)
phenotype 2 (Y2) :
0.01 : Rsq = 0.012437, MSE = 0.98745<- min value
0.25 : Rsq = 0.00739346, MSE = 0.997094
0.5 : Rsq = 0.00612812, MSE = 1.00169
0.75 : Rsq = 0.00621549, MSE = 1.00343
0.99 : Rsq = 0.0082828, MSE = 1.00621
* making predictions...writing LOCO predictions...done (9ms)
List of blup files written to: [my_output_pred.list]
Elapsed time : 2.66076s
End time: Tue Aug 16 13:24:02 2022
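In the "Output" section of the log above, each phenotype has one line per ridge parameter, and the "<- min value" marker sits next to the smallest MSE. The following Python sketch parses lines of that shape and picks the row with the lowest MSE. It assumes the line layout is exactly as printed in this log; the function names are hypothetical and the sketch is not part of regenie.

```python
import re

# Matches lines such as "0.01 : Rsq = 0.00292408, MSE = 0.995083<- min value".
LINE_RE = re.compile(
    r"^\s*([0-9.]+)\s*:\s*Rsq\s*=\s*([0-9.eE+-]+),\s*MSE\s*=\s*([0-9.eE+-]+)"
)


def parse_ridge_table(lines):
    """Return a list of (parameter, rsq, mse) tuples from log lines."""
    rows = []
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            rows.append(tuple(float(g) for g in match.groups()))
    return rows


def min_mse_row(rows):
    """Return the (parameter, rsq, mse) tuple with the smallest MSE."""
    return min(rows, key=lambda row: row[2])


# Lines copied from the phenotype 1 (Y1) block of the log above.
y1_lines = [
    "0.01 : Rsq = 0.00292408, MSE = 0.995083<- min value",
    "0.25 : Rsq = 0.00619743, MSE = 0.998022",
    "0.5 : Rsq = 0.00679147, MSE = 1.00153",
    "0.75 : Rsq = 0.00753375, MSE = 1.00367",
    "0.99 : Rsq = 0.00733694, MSE = 1.01373",
]
y1_rows = parse_ridge_table(y1_lines)
print(min_mse_row(y1_rows))
```

For the Y1 lines this selects the row for parameter 0.01, which is the row the log marks as the minimum.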
Another sample command, this time for step 2 (association testing), which reads the predictions written by step 1:
[user@cn3101]$ regenie \
--bgen example.bgen \
--step 2 \
--bsize 200 \
--threads 1 \
--covarFile covariates.txt \
--phenoFile phenotype_bin_wNA.txt \
--bt --firth --approx \
--pred my_output_pred.list \
--out my_output_step2.txt
Association testing mode with fast multithreading using OpenMP
* bgen : [example.bgen]
-summary : bgen file (v1.2 layout, zlib compressed) with 500 named samples and 1000 variants with 8-bit encoding.
-index bgi file [example.bgen.bgi]
* phenotypes : [phenotype_bin_wNA.txt] n_pheno = 2
-number of phenotyped individuals = 500
* covariates : [covariates.txt] n_cov = 3
-number of individuals with covariate data = 500
* number of individuals used in analysis = 500
* case-control counts for each trait:
- 'Y1': 111 cases and 339 controls
- 'Y2': 115 cases and 385 controls
* LOCO predictions : [my_output_pred.list]
-file [/vf/users/denisovga/regenie/test/my_output_1.loco] for phenotype 'Y1'
-file [/vf/users/denisovga/regenie/test/my_output_2.loco] for phenotype 'Y2'
* # threads : [1]
* block size : [200]
* # blocks : [5]
* approximate memory usage : 2MB
* using minimum MAC of 5 (variants with lower MAC are ignored)
* using fast Firth correction for logistic regression p-values less than 0.05
Chromosome 1 [5 blocks in total]
-reading loco predictions for the chromosome...done (0ms)
-fitting null logistic regression on binary phenotypes...done (1ms)
-fitting null Firth logistic regression on binary phenotypes...done (0ms)
block [1/5] : done (10ms)
block [2/5] : done (8ms)
block [3/5] : done (8ms)
block [4/5] : done (7ms)
block [5/5] : done (8ms)
Association results stored separately for each trait in files :
* [my_output_step2.txt_Y1.regenie]
* [my_output_step2.txt_Y2.regenie]
Number of tests with Firth correction : 108
Number of failed tests : (0/108)
Number of ignored tests due to low MAC : 0
Elapsed time : 0.086111s
End time: Mon Dec 16 15:21:50 2024
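The per-trait result files listed above can be post-processed with a short script. The following Python sketch assumes that each .regenie file is a whitespace-delimited table with a single header row that includes columns named ID and LOG10P; check the header of your own file before relying on this. The function names are hypothetical, and the demo rows below are invented purely to exercise the code. They are not regenie output.

```python
import io


def read_results(handle):
    """Yield one dict per data row, keyed by the header fields."""
    header = None
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if header is None:
            header = fields
            continue
        yield dict(zip(header, fields))


def top_hits(rows, min_log10p):
    """Return (ID, LOG10P) pairs with LOG10P >= min_log10p, largest first."""
    hits = []
    for row in rows:
        try:
            value = float(row["LOG10P"])
        except (KeyError, ValueError):
            continue  # column missing or value not numeric (for example NA)
        if value >= min_log10p:
            hits.append((row["ID"], value))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Invented demo table (assumed layout, made-up values).
demo = io.StringIO(
    "CHROM GENPOS ID TEST LOG10P\n"
    "1 1 demo1 ADD 0.21\n"
    "1 2 demo2 ADD 5.17\n"
    "1 3 demo3 ADD NA\n"
)
hits = top_hits(read_results(demo), min_log10p=1.3)
print(hits)
```

To apply it to a real file, open the file (for example my_output_step2.txt_Y1.regenie) and pass the file object to read_results in place of the demo table.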
End the interactive session:
[user@cn3101 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$