Bismark is a program to map bisulfite treated sequencing reads to a genome of interest and perform methylation calls in a single step. The output can be easily imported into a genome viewer, such as SeqMonk, and enables a researcher to analyse the methylation levels of their samples straight away. It's main features are:
The first step in using Bismark is to prepare a genome using the bismark_genome_preparation command. This creates a subdirectory in the fasta file directory called Bisulfite_Genome. To prevent unnecessary data duplication, the human genome (hg19 and hg38) have been indexed for use with the bowtie2 aligner. These are in /fdb/bismark.
A fastq file with test data is provided by the authors. In the following example, the user copies this testfile along with X and Y chromosome data from the human genome to their local space and runs though the basic analysis steps. (User input in bold)
Allocate an interactive session and run the program. Sample session:
[user@biowulf]$ sinteractive --mem=12g --gres salloc.exe: Pending job allocation 46116226 salloc.exe: job 46116226 queued and waiting for resources salloc.exe: job 46116226 has been allocated resources salloc.exe: Granted job allocation 46116226 salloc.exe: Waiting for resource configuration salloc.exe: Nodes cn3144 are ready for job [user@cn3144 ~]$ bismark --help DESCRIPTION The following is a brief description of command line options and arguments to control the Bismark bisulfite mapper and methylation caller. Bismark takes in FastA or FastQ files and aligns the reads to a specified bisulfite genome. Sequence reads are transformed into a bisulfite converted forward strand version (C->T conversion) or into a bisulfite treated reverse strand (G->A conversion of the forward strand). Each of these reads are then aligned to bisulfite treated forward strand index of a reference genome (C->T converted) and a bisulfite treated reverse strand index of the genome (G->A conversion of the forward strand, by doing this alignments will produce the same positions). These 4 instances of Bowtie 2 or HISAT2 are run in parallel. The sequence file(s) are then read in again sequence by sequence to pull out the original sequence from the genome and determine if there were any protected C's present or not. ...Full output from the help command
[user@cn3144 ~]$ mkdir -p /data/$USER/bismark_test/XandY [user@cn3144 ~]$ cd /data/$USER/bismark_test [user@cn3144 bismark_test]$ cp /fdb/genome/human-feb2009/chrX.fa ./XandY [user@cn3144 bismark_test]$ cp /fdb/genome/human-feb2009/chrY.fa ./XandY [user@cn3144 bismark_test]$ cp $BISMARK_HOME/test_data.fastq . [user@cn3144 bismark_test]$ bismark_genome_preparation XandY Writing bisulfite genomes out into a single MFA (multi FastA) file Bisulfite Genome Indexer version v0.25.0 (last modified: 19 May 2022) Step I - Prepare genome folders - completed Step II - Genome bisulfite conversions - completed Total number of conversions performed: [....]Full output from the prep command
[user@cn3144 ~]$ bismark XandY test_data.fastq Bowtie 2 seems to be working fine (tested command 'bowtie2 --version' [2.5.3]) Output format is BAM (default) Alignments will be written out in BAM format. Samtools found here: '/usr/local/apps/samtools/1.21/bin/samtools' Reference genome folder provided is XandY/ (absolute path is '/data/$USER/bismark_test/XandY/)' FastQ format assumed (by default) Input files to be analysed (in current folder '/data/$USER/bismark_test'): test_data.fastq Library is assumed to be strand-specific (directional), alignments to strands complementary to the original top or bottom strands will be ignored (i.e. not performed!) ...Full output from the run command
[user@cn3144 bismark_test]$ bismark_methylation_extractor test_data_bismark_bt2.bam *** Bismark methylation extractor version v0.25.0 *** Trying to determine the type of mapping from the SAM header line of file test_data_bismark_bt2.bam Treating file(s) as single-end data (as extracted from @PG line) ... [user@cn3144 ~]$ exit salloc.exe: Relinquishing job allocation 46116226 [user@biowulf ~]$
Full output from the extract command
Create a batch input file (e.g. bismark.sh). For example:
#!/bin/bash # this file is called bismark.sh set -e module load bismark cd /data/$USER/bismark_test bismark_genome_preparation XandY bismark XandY test_data.fastq bismark_methylation_extractor test_data_bismark_bt2.bam
Submit this job using the Slurm sbatch command.
sbatch bismark.sh
Sample swarm command file
# --------file myjobs.swarm---------- bismark directory1 test_data.fastq bismark directory2 test_data.fastq bismark directory3 test_data.fastq .... bismark directoryN test_data.fastq # -----------------------------------
Submit this set of runs to the batch system by typing
[user@biowulf ~]$ swarm --module bismark -f myjobs.swarm