bismark on Biowulf

Bismark is a program to map bisulfite treated sequencing reads to a genome of interest and perform methylation calls in a single step. The output can be easily imported into a genome viewer, such as SeqMonk, and enables a researcher to analyse the methylation levels of their samples straight away. It's main features are:

References:

Documentation
Bismark is extensively documented. To read the help doc, type bismark --help. See also:
Important Notes

Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

The first step in using Bismark is to prepare a genome using the bismark_genome_preparation command. This creates a subdirectory in the fasta file directory called Bisulfite_Genome. To prevent unnecessary data duplication, the human genome (hg19 and hg38) have been indexed for use with the bowtie2 aligner. These are in /fdb/bismark.

A fastq file with test data is provided by the authors. In the following example, the user copies this testfile along with X and Y chromosome data from the human genome to their local space and runs though the basic analysis steps. (User input in bold)

Allocate an interactive session and run the program. Sample session:

[user@biowulf]$ sinteractive --mem=12g --gres
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job


[user@cn3144 ~]$ bismark --help
DESCRIPTION

The following is a brief description of command line options and arguments
to control the Bismark bisulfite mapper and methylation caller. Bismark
takes in FastA or FastQ files and aligns the reads to a specified bisulfite
genome. Sequence reads are transformed into a bisulfite converted forward
strand version (C->T conversion) or into a bisulfite treated reverse strand
(G->A conversion of the forward strand). Each of these reads are then aligned
to bisulfite treated forward strand index of a reference genome (C->T
converted) and a bisulfite treated reverse strand index of the genome (G->A
conversion of the forward strand, by doing this alignments will produce the
same positions). These 4 instances of Bowtie 2 or HISAT2 are run in parallel.
The sequence file(s) are then read in again sequence by sequence to pull out
the original sequence from the genome and determine if there were any
protected C's present or not.

...

Full output from the help command

[user@cn3144 ~]$ mkdir -p /data/$USER/bismark_test/XandY
[user@cn3144 ~]$ cd /data/$USER/bismark_test
[user@cn3144 bismark_test]$ cp /fdb/genome/human-feb2009/chrX.fa ./XandY
[user@cn3144 bismark_test]$ cp /fdb/genome/human-feb2009/chrY.fa ./XandY
[user@cn3144 bismark_test]$ cp $BISMARK_HOME/test_data.fastq .
[user@cn3144 bismark_test]$ bismark_genome_preparation XandY
Writing bisulfite genomes out into a single MFA (multi FastA) file
Bisulfite Genome Indexer version v0.25.0 (last modified: 19 May 2022)
Step I - Prepare genome folders - completed
Step II - Genome bisulfite conversions - completed
Total number of conversions performed:
[....]
Full output from the prep command

[user@cn3144 ~]$ bismark XandY test_data.fastq
Bowtie 2 seems to be working fine (tested command 'bowtie2 --version' [2.5.3])
Output format is BAM (default)
Alignments will be written out in BAM format. Samtools found here: '/usr/local/apps/samtools/1.21/bin/samtools'
Reference genome folder provided is XandY/      (absolute path is '/data/$USER/bismark_test/XandY/)'
FastQ format assumed (by default)
Input files to be analysed (in current folder '/data/$USER/bismark_test'):
test_data.fastq
Library is assumed to be strand-specific (directional), alignments to strands complementary to the original top or bottom strands will be ignored (i.e. not performed!)
...
Full output from the run command

[user@cn3144 bismark_test]$ bismark_methylation_extractor test_data_bismark_bt2.bam
 *** Bismark methylation extractor version v0.25.0 ***

Trying to determine the type of mapping from the SAM header line of file test_data_bismark_bt2.bam
Treating file(s) as single-end data (as extracted from @PG line)
...

[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$

Full output from the extract command

Batch job
Most jobs should be run as batch jobs.

Create a batch input file (e.g. bismark.sh). For example:

#!/bin/bash
# this file is called bismark.sh
set -e

module load bismark
cd /data/$USER/bismark_test
bismark_genome_preparation XandY
bismark XandY test_data.fastq
bismark_methylation_extractor test_data_bismark_bt2.bam

Submit this job using the Slurm sbatch command.

sbatch bismark.sh
Swarm of Jobs
A swarm of jobs is an easy way to submit a set of independent commands requiring identical resources.
One strategy for running a swarm of bismark jobs would be to set up multiple directories containing gene sequences. Once you have done that, you can set up a swarm command file containing one line for each of your bismark runs.

Sample swarm command file

# --------file myjobs.swarm----------
bismark directory1 test_data.fastq
bismark directory2 test_data.fastq
bismark directory3 test_data.fastq
....
bismark directoryN test_data.fastq
# -----------------------------------

Submit this set of runs to the batch system by typing

[user@biowulf ~]$ swarm --module bismark -f myjobs.swarm