ESPRESSO is a method for processing alignments of long-read RNA-seq data
that improves splice junction accuracy and isoform quantification.
ESPRESSO jointly considers alignments of all long reads aligned to a gene
and uses error profiles of individual reads
to improve the identification of splice junctions
and the discovery of their corresponding transcript isoforms.
Allocate an interactive session and run the program. Sample session:
[user@biowulf]$ sinteractive
[user@cn3144 ~]$ module load espresso
[+] Loading singularity 4.0.1 on cn3144
[+] Loading espresso 1.4.0
[user@cn3144 ~]$ ESPRESSO_C.pl -h
Program: ESPRESSO (Error Statistics PRomoted Evaluator of Splice Site Options)
Version: C_1.4.0
Contact: Yuan Gao <gaoy@email.chop.edu, gy.james@163.com>
Usage: perl ESPRESSO_C.pl -I work_dir -F ref.fa -X target_ID
Arguments:
-I, --in
work directory (generated by ESPRESSO_S)
-F, --fa
FASTA file of all reference sequences. Please make sure this file is
the same one provided to mapper. (required)
-X, --target_ID
ID of sample to process (required)
-H, --help
show this help information
-T, --num_thread
thread number (default: 5)
--sort_buffer_size
memory buffer size for running 'sort' commands (default: 2G)
[user@cn3144 ~]$ ESPRESSO_Q.pl -h
Program: ESPRESSO (Error Statistics PRomoted Evaluator of Splice Site Options)
Version: Q_1.4.0
Contact: Yuan Gao <gaoy@email.chop.edu, gy.james@163.com>
Usage: perl ESPRESSO_Q.pl -L work_dir/samples.tsv.updated -A anno.gtf
Arguments:
-L, --list_samples
tsv list of multiple samples (each bam in a line with 1st column as
sorted bam file, 2nd column as sample name in output, 3rd column as
directory of ESPRESSO_C results; this list can be generated by
ESPRESSO_S according to the initially provided tsv list; required)
-A, --anno
input annotation file in GTF format (optional)
-O, --out_dir
output directory (default: directory of -L)
-V, --tsv_compt
output tsv for compatible isoform(s) of each read (optional)
-T --num_thread
how many threads to use (default: 5)
-H, --help
show this help information
-N, --read_num_cutoff
min perfect read count for all splice junctions of novel isoform
(default: 2)
-R, --read_ratio_cutoff
min perfect read ratio for all splice junctions of novel isoform
(default: 0)
-S, --SJ_dist
max number of bases that an alignment endpoint can extend past the
start or end of a matched isoform
(default: 35)
--internal_boundary_limit
max number of bases that an alignment endpoint can extend into an
intron of a matched isoform
(default: 6)
--allow_longer_terminal_exons
allow an alignment to match an isoform even if the alignment endpoint
extends more than --SJ_dist past the start or end
[user@cn3144 ~]$ ESPRESSO_S.pl -h
Program: ESPRESSO (Error Statistics PRomoted Evaluator of Splice Site Options)
Version: S_1.4.0
Contact: Yuan Gao <gaoy@email.chop.edu, gy.james@163.com>
Usage: perl ESPRESSO_S.pl -L samples.tsv -F ref.fa -A anno.gtf -O work_dir
Arguments:
-L, --list_samples
tsv list of sample(s) (each file in a line with 1st column as sorted
BAM/SAM file and 2nd column as sample name; required)
-F, --fa
FASTA file of all reference sequences. Please make sure this file is
the same one provided to mapper. (required)
-A, --anno
input annotation file in GTF format (optional)
-B, --SJ_bed
input custom reliable splice junctions in BED format (optional; each
reliable SJ in one line, with the 1st column as chromosome, the 2nd
column as upstream splice site 0-base coordinate, the 3rd column as
downstream splice site and 6th column as strand)
-O, --out
work directory (existing files in this directory may be OVERWRITTEN;
default: ./)
-H, --help
show this help information
-N, --read_num_cutoff
min perfect read count for denovo detected candidate splice junctions
(default: 2)
-R, --read_ratio_cutoff
min perfect read ratio for denovo detected candidate splice junctions:
Set this as 1 for completely GTF-dependent processing (default: 0)
-C, --cont_del_max
max continuous deletion allowed; intron will be identified if longer
(default: 50)
-M, --chrM
tell ESPRESSO the ID of mitochondrion in reference file (default:
chrM)
-T, --num_thread
thread number (default: minimum of 5 and sam file number)
-Q, --mapq_cutoff
min mapping quality for processing (default: 1)
--sort_buffer_size
memory buffer size for running 'sort' commands (default: 2G)
[user@cn3144 ~]$ git clone https://github.com/Xinglab/espresso
[user@cn3144 ~]$ cd espresso
[user@cn3144 ~]$ python-espresso tests/high_confidence_sjs/test.py
test (__main__.HighConfidenceSjsTest.test) ... (config=strict)
(config=num)
(config=ratio)
(config=num_and_ratio)
(config=gtf)
(config=bed)
(config=bed_and_num_and_ratio)
ok
----------------------------------------------------------------------
Ran 1 test in 230.805s
[user@cn3144 ~]$ python-espresso tests/alignments/test.py
test (__main__.ChrNameMismatchTest.test) ... ok
test (__main__.CigarFormatTest.test) ... ok
test (__main__.MissingSequenceTest.test) ... ok
test (__main__.SecondaryAlignmentTest.test) ... ok
----------------------------------------------------------------------
Ran 4 tests in 90.046s
[user@cn3144 ~]$ python-espresso tests/isoform_assignment/test.py
test (__main__.IsoformAssignmentTest.test) ... ok
test (__main__.NoExternalBoundaryTest.test) ... ok
test (__main__.ReadEndpointsTest.test) ... ok
----------------------------------------------------------------------
Ran 3 tests in 140.279s
[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
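The usage lines in the help output above suggest a three-step workflow: ESPRESSO_S.pl prepares the work directory, ESPRESSO_C.pl processes one sample at a time, and ESPRESSO_Q.pl quantifies isoforms from the updated sample list. The following is a minimal sketch of that sequence, not a definitive recipe. The file names (sample1.sorted.bam, ref.fa, anno.gtf), the work directory espresso_work, and the target ID of 0 are assumptions for illustration only; check the samples.tsv.updated file written by ESPRESSO_S.pl for the IDs that apply to your data. The script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch of the three ESPRESSO steps (S, then C, then Q), using only options
# listed in the -h output above. All paths and names are hypothetical.
set -eu

WORK_DIR=${WORK_DIR:-espresso_work}   # hypothetical work directory
REF_FA=${REF_FA:-ref.fa}              # must be the same FASTA given to the mapper
ANNO_GTF=${ANNO_GTF:-anno.gtf}        # optional GTF annotation
THREADS=${THREADS:-5}                 # matches the documented default
TARGET_ID=${TARGET_ID:-0}             # assumption: confirm in samples.tsv.updated
DRY_RUN=${DRY_RUN:-1}                 # 1 = only print commands, 0 = execute them

# Print each command; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

mkdir -p "$WORK_DIR"

# samples.tsv: column 1 = sorted BAM/SAM file, column 2 = sample name.
printf '%s\t%s\n' "sample1.sorted.bam" "sample1" > "$WORK_DIR/samples.tsv"

# Step 1 (S): preprocess the alignments into the work directory.
run ESPRESSO_S.pl -L "$WORK_DIR/samples.tsv" -F "$REF_FA" -A "$ANNO_GTF" \
    -O "$WORK_DIR" -T "$THREADS"

# Step 2 (C): process one sample, selected by its target ID.
run ESPRESSO_C.pl -I "$WORK_DIR" -F "$REF_FA" -X "$TARGET_ID" -T "$THREADS"

# Step 3 (Q): quantify isoforms using the list generated by step 1.
run ESPRESSO_Q.pl -L "$WORK_DIR/samples.tsv.updated" -A "$ANNO_GTF" -T "$THREADS"
```

With several samples, step 2 would be repeated once per target ID before running step 3. Note that -O for ESPRESSO_S.pl may overwrite existing files in the work directory, so a dedicated directory is the safer choice.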