From the breseq documentation:
breseq is a computational pipeline for the analysis of short-read re-sequencing data (e.g. Illumina, 454, IonTorrent, etc.). It uses reference-based alignment approaches to predict mutations in a sample relative to an already sequenced genome. breseq is intended for microbial genomes (<10 Mb) and re-sequenced samples that are only slightly diverged from the reference sequence (<1 mutation per 1000 bp). breseq‘s primary advantages over other software programs are that it can:breseq was initially developed to analyze data from the Lenski long-term evolution experiment with E. coli. However, breseq may be generally useful to researchers who are:
- Accurately predict new sequence junctions, such as those associated with mobile element insertions.
- Integrate multiple sources of evidence for genetic changes into mutation predictions.
- Produce annotated output describing biologically relevant mutational events.
- Tracking mutations over time in microbial evolution experiments.
- Checking strains for unwanted second-site mutations after genetic manipulations.
- Identifying mutations that occur during strain improvement or after long-term culture of engineered strains.
- Discovering what mutations arise in pathogens during infection or cause antibiotic resistance.
BRESEQ_TEST_DATAAllocate an interactive session and run the program. In this sample session we will analyze an E. coli strain that evolved for 20000 generations in the long term evolution experiment.
[user@biowulf]$ sinteractive --mem=5g --cpus-per-task=4 --gres=lscratch:10
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job
[user@cn3144]$ cd /lscratch/$SLURM_JOB_ID
[user@cn3144]$ module load breseq
[user@cn3144]$ cp -L ${BRESEQ_TEST_DATA:-none}/* .
[user@cn3144]$ ls -lh
total 275M
-rw-r--r-- 1 user group 11M Aug 8 13:46 NC_012967.gbk
-rw-r--r-- 1 user group 133M Aug 8 13:46 SRR030257_1.fastq.gz
-rw-r--r-- 1 user group 132M Aug 8 13:46 SRR030257_2.fastq.gz
[user@cn3144]$ breseq -j $SLURM_CPUS_PER_TASK -r NC_012967.gbk \
SRR030257_1.fastq.gz SRR030257_2.fastq.gz
...
---> bowtie2 :: version 2.3.4.1 [/usr/local/apps/bowtie/2-2.3.4.1/bin/bowtie2]
---> R :: version 3.5.0 [/usr/local/apps/R/3.5/3.5.0_build2/bin/R]
+++ NOW PROCESSING Read and reference sequence file input
READ FILE::SRR030257_1
...
[user@cn3144]$ cp -r output data /path/to/where/you/would/like/the/output
[user@cn3144]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf]$
The output directory contains summary html files.
Create a batch input file (e.g. breseq.sh), which uses the input file 'breseq.in'. For example:
#!/bin/bash
module load breseq/0.36.1 || exit 1
wd=$PWD
cd /lscratch/$SLURM_JOB_ID || exit 1
cp -L $BRESEQ_TEST_DATA/* .
breseq -j $SLURM_CPUS_PER_TASK -r NC_012967.gbk \
SRR030257_1.fastq.gz SRR030257_2.fastq.gz
cp -r output $wd
Submit this job using the Slurm sbatch command.
sbatch --gres=lscratch:10 --cpus-per-task=6 --mem=5g breseq.sh
Create a swarmfile (e.g. breseq.swarm). For example:
breseq -j $SLURM_CPUS_PER_TASK -r ref.gbk sample1_reads.fastq.gz breseq -j $SLURM_CPUS_PER_TASK -r ref.gbk sample2_reads.fastq.gz breseq -j $SLURM_CPUS_PER_TASK -r ref.gbk sample3_reads_R1.fastq.gz sample3_reads_R2.fastq.gz
Submit this job using the swarm command.
swarm -f breseq.swarm -g 5 -t 4 --module breseq/0.33.0where
| -g # | Number of Gigabytes of memory required for each process (1 line in the swarm command file) |
| -t # | Number of threads/CPUs required for each process (1 line in the swarm command file). |
| --module breseq | Loads the breseq module for each subjob in the swarm |