SpliceAI is a deep neural network that accurately predicts splice junctions from an arbitrary pre-mRNA transcript sequence, enabling precise prediction of noncoding genetic variants that cause cryptic splicing.
Allocate an interactive session and run the program. Sample session:
[user@biowulf]$ sinteractive --mem=8g --gres=gpu:v100:1
[user@cn4466 ~]$ module load SpliceAI
+] Loading python 3.6 ...
[+] Loading cuDNN 7.0 libraries...
[+] Loading CUDA Toolkit 9.0.176 ...
[user@cn4466 ~]$ spliceai -h
...
usage: spliceai [-h] [-I [input]] [-O [output]] -R reference -A annotation
[-D [distance]] [-M [mask]]
Version: 1.3
optional arguments:
-h, --help show this help message and exit
-I [input] path to the input VCF file, defaults to standard in
-O [output] path to the output VCF file, defaults to standard out
-R reference path to the reference genome fasta file
-A annotation "grch37" (GENCODE V24lift37 canonical annotation file in
package), "grch38" (GENCODE V24 canonical annotation file in
package), or path to a similar custom gene annotation file
-D [distance] maximum distance between the variant and gained/lost splice
site, defaults to 50
-M [mask] mask scores representing annotated acceptor/donor gain and
unannotated acceptor/donor loss, defaults to 0
Download sample data:
[user@cn4466 ~]$ cp $SPLICEAI_DATA/* .Specify a reference sequence:
[user@cn4466 ~]$ ln -s /fdb/igenomes/Homo_sapiens/UCSC/hg38/Sequence/WholeGenomeFasta/genome.fa hg38.faRun the spliceai executable on the sample data:
[user@cn4466 ~]$ spliceai -I input.vcf -O output.vcf -R hg38.fa -A grch37 & ... [user@cn4466 ~]$ nvidia-smi ... +-----------------------------------------------------------------------------------------+ | NVIDIA-SMI 580.126.09 Driver Version: 580.126.09 CUDA Version: 13.0 | +-----------------------------------------+------------------------+----------------------+ | GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC | | Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. | | | | MIG M. | |=========================================+========================+======================| | 0 Tesla V100-PCIE-16GB On | 00000000:8E:00.0 Off | 0 | | N/A 27C P0 24W / 250W | 0MiB / 16384MiB | 0% Default | | | | N/A | +-----------------------------------------+------------------------+----------------------+ +-----------------------------------------------------------------------------+ | Processes: | | GPU GI CI PID Type Process name GPU Memory | | ID ID Usage | |=============================================================================| | 0 N/A N/A 45210 C /usr/bin/python 11616MiB | +-----------------------------------------------------------------------------+ ... WARNING:root:Skipping record (ref issue): 2 152389953 . T A,C,G . . . WARNING:root:Skipping record (ref issue): 2 179415988 . C CA . . . WARNING:root:Skipping record (ref issue): 2 179446218 . ATACT A . . . WARNING:root:Skipping record (ref issue): 2 179446218 . ATACT AT,ATA . . . WARNING:root:Skipping record (ref issue): 2 179642185 . G A . . . WARNING:root:Skipping record (ref issue): 19 38958362 . C T . . . WARNING:root:Skipping record (ref issue): 21 47406854 . CCA C . . . WARNING:root:Skipping record (ref issue): 21 47406856 . A AT . . . WARNING:root:Skipping record (ref issue): X 129274636 . A C,G,T . . .An output file output.vcf will be produced:
[user@cn4466 ~]$ cat output.vcf ... #CHROM POS ID REF ALT QUAL FILTER INFO 1 25000 . A C,G,T . . . 2 152389953 . T A,C,G . . . 2 179415988 . C CA . . . 2 179446218 . ATACT A . . . 2 179446218 . ATACT AT,ATA . . . 2 179642185 . G A . . . 19 38958362 . C T . . . 21 47406854 . CCA C . . . 21 47406856 . A AT . . . X 129274636 . A C,G,T . . .Exit the application:
[user@cn4466 ~]$ exit [user@biowulf ~]$
Create a batch input file (e.g. spliceai.sh). For example:
#!/bin/bash module load SpliceAI cp $SPLICEAI_DATA/* . spliceai -I input.vcf -O output.vcf -R /fdb/igenomes/Homo_sapiens/UCSC/hg38/Sequence/WholeGenomeFasta/genome.fa -A grch37
Submit this job using the Slurm sbatch command.
sbatch [--cpus-per-task=#] [--mem=#] spliceai.sh