FuSeq_WES detects fusion genes from whole-exome sequencing (WES) data. It is based on FuSeq, a method for detecting fusion genes from RNA-seq data. A subsampling study of the prostate data suggests that a coverage of at least 75x is necessary to achieve high accuracy.
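To see whether a sample is near the 75x figure, the mean depth can be estimated before running the tool. The following is a minimal sketch, not part of FuSeq_WES: it assumes samtools is available on the node, and the helper names mean_depth and meets_coverage and the file targets.bed (the exome capture targets) are hypothetical.

```shell
#!/bin/bash
# Mean of the third column (depth) of `samtools depth` output read from stdin.
mean_depth() {
    awk '{ total += $3 } END { if (NR > 0) printf "%.1f\n", total / NR; else print "0.0" }'
}

# Succeeds when the given depth is at least the threshold (default 75).
meets_coverage() {
    awk -v depth="$1" -v min="${2:-75}" 'BEGIN { exit !(depth + 0 >= min + 0) }'
}

# Usage on a real BAM (targets.bed is a hypothetical capture-target file):
#   cov=$(samtools depth -a -b targets.bed "$bamfile" | mean_depth)
#   meets_coverage "$cov" || echo "mean coverage ${cov}x is below 75x" >&2
```

Restricting the depth calculation to the capture targets matters for exome data, since averaging over the whole genome would understate the coverage of the regions that were actually sequenced.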
Allocate an interactive session and run the program. Sample session:
[user@biowulf]$ sinteractive --mem=4g --gres=lscratch:10
[user@cn3144 ~]$ module load fuseq-wes
[+] Loading python 3.8 ...
[+] Loading gcc 9.2.0 ...
[-] Unloading gcc 9.2.0 ...
[+] Loading gcc 9.2.0 ...
[+] Loading openmpi 4.0.5 for GCC 9.2.0
[+] Loading ImageMagick 7.0.8 on cn4313
[+] Loading HDF5 1.10.4
[-] Unloading gcc 9.2.0 ...
[+] Loading gcc 9.2.0 ...
[+] Loading NetCDF 4.7.4_gcc9.2.0
[+] Loading pandoc 2.17.1.1 on cn4313
[+] Loading pcre2 10.21 ...
[+] Loading R 4.2.0
[+] Loading fuseq-wes 1.0.0
Copy the sample data to the current directory and set the input and output paths:
[user@cn3144 ]$ cp -r $FUSEQ_WES_TEST_DATA/* .
[user@cn3144 ]$ bamfile="FuSeq_WES_testdata/test.bam"
[user@cn3144 ]$ ref_json="$FUSEQ_WES_REF/UCSC_hg19_wes_contigSize3000_bigLen130000_r100/\
UCSC_hg19_wes_contigSize3000_bigLen130000_r100.json"
[user@cn3144 ]$ gtfSqlite="$FUSEQ_WES_REF/UCSC_hg19_wes_contigSize3000_bigLen130000_r100/\
UCSC_hg19_wes_contigSize3000_bigLen130000_r100.sqlite"
[user@cn3144 ]$ output_dir="test_out"
[user@cn3144 ]$ mkdir -p $output_dir
#extract mapped reads and split reads
[user@cn3144 ]$ python3 $FUSEQ_WES/FuSeq_WES_v1.0.0/fuseq_wes.py \
--bam $bamfile \
--gtf $ref_json \
--mapq-filter \
--outdir $output_dir
#process the reads
[user@cn3144 ]$ fusiondbFn="$FUSEQ_WES/FuSeq_WES_v1.0.0/Data/Mitelman_fusiondb.RData"
[user@cn3144 ]$ paralogdbFn="$FUSEQ_WES/FuSeq_WES_v1.0.0/Data/ensmbl_paralogs_grch37.RData"
[user@cn3144 ]$ Rscript $FUSEQ_WES/FuSeq_WES_v1.0.0/process_fuseq_wes.R \
in=$output_dir \
sqlite=$gtfSqlite \
fusiondb=$fusiondbFn \
paralogdb=$paralogdbFn \
out=$output_dir
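Both steps fail late if an input path is wrong, so it can help to check the inputs before launching them. The following is a minimal sketch of such a check. The function name require_files is hypothetical and not part of FuSeq_WES.

```shell
#!/bin/bash
# Report every path that is missing or unreadable; return non-zero if any is.
require_files() {
    local missing=0 f
    for f in "$@"; do
        if [ ! -r "$f" ]; then
            echo "missing or unreadable: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage with the variables from the session above:
#   require_files "$bamfile" "$ref_json" "$gtfSqlite" "$fusiondbFn" || exit 1
```

Checking all paths in one pass, rather than stopping at the first failure, lists every problem at once.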
Create a batch input file (e.g. fuseq-wes.sh). For example:
#!/bin/bash
module load fuseq-wes
set -e
cp -r $FUSEQ_WES_TEST_DATA/* .
bamfile="FuSeq_WES_testdata/test.bam"
ref_json="$FUSEQ_WES_REF/UCSC_hg19_wes_contigSize3000_bigLen130000_r100/\
UCSC_hg19_wes_contigSize3000_bigLen130000_r100.json"
gtfSqlite="$FUSEQ_WES_REF/UCSC_hg19_wes_contigSize3000_bigLen130000_r100/\
UCSC_hg19_wes_contigSize3000_bigLen130000_r100.sqlite"
output_dir="test_out"
mkdir -p $output_dir
python3 $FUSEQ_WES/FuSeq_WES_v1.0.0/fuseq_wes.py \
--bam $bamfile \
--gtf $ref_json \
--mapq-filter \
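--outdir $output_dir
fusiondbFn="$FUSEQ_WES/FuSeq_WES_v1.0.0/Data/Mitelman_fusiondb.RData"
paralogdbFn="$FUSEQ_WES/FuSeq_WES_v1.0.0/Data/ensmbl_paralogs_grch37.RData"
Rscript $FUSEQ_WES/FuSeq_WES_v1.0.0/process_fuseq_wes.R \
in=$output_dir \
sqlite=$gtfSqlite \
fusiondb=$fusiondbFn \
paralogdb=$paralogdbFn \
out=$output_dir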
Submit this job using the Slurm sbatch command.
sbatch -c 2 --mem=4g --time=8:00:00 fuseq-wes.sh
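To follow the job after submission, the job ID can be captured with sbatch's --parsable option, which prints the ID alone or as "jobid;cluster". The following is a minimal sketch using the same resource request as above. The function names job_id_from_parsable and submit_job are hypothetical.

```shell
#!/bin/bash
# `sbatch --parsable` prints "jobid" or "jobid;cluster"; keep only the job ID.
job_id_from_parsable() {
    printf '%s\n' "${1%%;*}"
}

# Submit a batch script with the resources used above and print its job ID.
submit_job() {
    local out
    out=$(sbatch --parsable -c 2 --mem=4g --time=8:00:00 "$1") || return 1
    job_id_from_parsable "$out"
}

# Usage:
#   jobid=$(submit_job fuseq-wes.sh)
#   squeue -j "$jobid"                                     # state while queued or running
#   sacct -j "$jobid" --format=JobID,State,Elapsed,MaxRSS  # summary after completion
```

The MaxRSS column reported by sacct shows whether the 4 GB memory request was adequate for the sample.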
The master process submitting jobs should be run either as a batch job or on an interactive node, not on the Biowulf login node.