Clustal W is a general purpose multiple alignment program for DNA or proteins. The sensitivity of the commonly used progressive multiple sequence alignment method has been greatly improved for the alignment of divergent protein sequences. It is designed to be run interactively, or to assign options via the command line. Clustal Omega is a new development to the Clustal family, which offers a significant increase in scalability over previous versions, allowing hundreds of thousands of sequences to be aligned in only a few hours. It will also make use of multiple processors, where present. In addition, the quality of alignments is superior to previous versions, as measured by a range of popular benchmarks. ClustalW is no longer being maintained or updated by its developers. ClustalO is actively being maintained. ClustalO can also run in multi-threaded mode, which may make your sequence alignments faster.
Allocate an interactive session and run the program.
Sample session (user input in bold):
[user@biowulf]$ sinteractive --cpus-per-task=8
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job
[user@cn3144 ~]$ module load clustalo
[+] Loading clustalo 1.2.4 ...
[user@cn3144 ~]$ clustalo -i globins630.fa -o clustalo.out --threads=$SLURM_CPUS_PER_TASK
[user@cn3144 ~]$ module load clustalw
[user@cn3144 ~]$ clustalw
**************************************************************
******** CLUSTAL W (1.83) Multiple Sequence Alignments ********
**************************************************************
1. Sequence Input From Disc
2. Multiple Alignments
3. Profile / Structure Alignments
4. Phylogenetic trees
S. Execute a system command
H. HELP
X. EXIT (leave program)
Your choice: 1
Sequences should all be in 1 file.
7 formats accepted:
NBRF/PIR, EMBL/SwissProt, Pearson (Fasta), GDE, Clustal, GCG/MSF, RSF.
Enter the name of the sequence file: seqs.inp
Sequence format is Pearson
Sequences assumed to be PROTEIN
Sequence 1: chiins 110 aa
Sequence 2: xenins 110 aa
Sequence 3: humins 110 aa
Sequence 4: monins 110 aa
Sequence 5: dogins 110 aa
Sequence 6: hamins 110 aa
Sequence 7: bovins 110 aa
Sequence 8: guiins 110 aa
**************************************************************
******** CLUSTAL W (1.83) Multiple Sequence Alignments ********
**************************************************************
1. Sequence Input From Disc
2. Multiple Alignments
3. Profile / Structure Alignments
4. Phylogenetic trees
S. Execute a system command
H. HELP
X. EXIT (leave program)
Your choice: 2
****** MULTIPLE ALIGNMENT MENU ******
1. Do complete multiple alignment now (Slow/Accurate)
2. Produce guide tree file only
3. Do alignment using old guide tree file
4. Toggle Slow/Fast pairwise alignments = SLOW
5. Pairwise alignment parameters
6. Multiple alignment parameters
7. Reset gaps before alignment? = OFF
8. Toggle screen display = ON
9. Output format options
S. Execute a system command
H. HELP
or press [RETURN] to go back to main menu
Your choice: 1
Enter a name for the CLUSTAL output file [seqs.aln]: myseqs.aln
Enter name for new GUIDE TREE file [seqs.dnd]:
Start of Pairwise alignments
Aligning...
Sequences (1:2) Aligned. Score: 66
Sequences (1:3) Aligned. Score: 63
[...]
Sequences (7:8) Aligned. Score: 61
Guide tree file created: [seqs.dnd]
Start of Multiple Alignment
There are 7 groups
Aligning...
Group 1: Sequences: 2 Score:2138
Group 2: Sequences: 2 Score:2373
Group 3: Sequences: 4 Score:2157
Group 4: Sequences: 5 Score:2146
Group 5: Sequences: 6 Score:1971
Group 6: Sequences: 2 Score:1972
Group 7: Sequences: 8 Score:1739
Alignment Score 13222
Consensus length = 110
CLUSTAL-Alignment file created [seqs.aln]
CLUSTAL W (1.83) multiple sequence alignment
dogins MALWMRLLPLLALLALWAPAPTRAFVNQHLCGSHLVEALYLVCGERGFFYTPKARREVED
bovins MALWTRLRPLLALLALWPPPPARAFVNQHLCGSHLVEALYLVCGERGFFYTPKARREVEG
humins BALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAED
monins BALWMRLLPLLALLALWGPDPVPAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAED
hamins MTLWMRLLPLLTLLVLWEPNPAQAFVNQHLCGSHLVEALYLVCGERGFFYTPKSRRGVED
guiins MALWMHLLTVLALLALWGPNTGQAFVSRHLCGSNLVETLYSVCQDDGFFYIPKDRRELED
chiins BALWIRSLPLLALLVFSGPGTSYAAANQHLCGSHLVEALYLVCGERGFFYSPKARRDVEQ
xenins BALWMQCLPLVLVLFFSTPNTE-ALVNQHLCGSHLVEALYLVCGDRGFFYYPKVKRDMEQ
:** : .:: :* : * . * ..:*****:***:** ** : **** ** :* *
dogins LQVRDVELAGAPGEGGLQPLALEGALQKRGIVEQCCTSICSLYQLENYCN
bovins PQVGALELAGGPGAGG-----LEGPPQKRGIVEQCCASVCSLYQLENYCN
humins LQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN
monins PQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN
hamins PQVAQLELGGGPGADDLQTLALEVAQQKRGIVDQCCTSICSLYQLENYCN
guiins PQVEQTELGMGLGAGGLQPLALEMALQKRGIVDQCCTGTCTRHQLQSYCN
chiins PLVSS---PLRGEAGVLPFQQEEYEKVKRGIVEQCCHNTCSLYQLENYCN
xenins ALVSG---PQDNELDGMQLQPQEYQKMKRGIVEQCCHSTCSLFQLESYCN
* . * *****:*** . *: .**:.***
Press [RETURN] to continue or X to stop: X
**************************************************************
******** CLUSTAL W (1.83) Multiple Sequence Alignments ********
**************************************************************
1. Sequence Input From Disc
2. Multiple Alignments
3. Profile / Structure Alignments
4. Phylogenetic trees
S. Execute a system command
H. HELP
X. EXIT (leave program)
Your choice: x
[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
Create a batch input file (e.g. clustal.sh). For example, the following script runs ClustalW and ClustalO on the same input file.
#!/bin/bash module load clustalw clustalw -INFILE=test1.fa -ALIGN module load clustalo clustalo -i test1.fa -o test1.out --threads=$SLURM_CPUS_PER_TASK
Submit this job using the Slurm sbatch command.
sbatch --cpus-per-task=8 [--mem=#] clustalw.sh
Create a swarmfile (e.g. clustalw.swarm). For example:
clustalo -i test1.fa -o test1.out --threads=$SLURM_CPUS_PER_TASK clustalo -i test2.fa -o test2.out --threads=$SLURM_CPUS_PER_TASK clustalo -i test3.fa -o test3.out --threads=$SLURM_CPUS_PER_TASK [...]
Submit this job using the swarm command.
swarm -f clustalw.swarm -t 8 [-g #] --module clustalwwhere
| -g # | Number of Gigabytes of memory required for each process (1 line in the swarm command file) |
| -t # | Number of threads/CPUs required for each process (1 line in the swarm command file). |
| --module clustalw | Loads the clustalw module for each subjob in the swarm |