Arvind Gopu's home on the web!

Simple Reference to Using Clustalw

Simple Multiple Sequence Alignments (MSAs)

If you want to compute an MSA of a sequence file called sequence_file, this is how you'd go about it:

  $ clustalw sequence_file -output=gde -outfile=alignment.gde > stdout 2>&1

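The -output parameter specifies the output format, and -outfile lets you pick a different output filename. Both parameters are optional. They default to Clustal (.aln) format and sequence_file.aln respectively. The trailing > stdout 2>&1 sends Clustalw's progress messages (both standard output and standard error) to a file named stdout, so they don't clutter your terminal.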
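If you drive Clustalw from a script, it can help to build the same command as an argument list. Below is a minimal Python sketch, assuming clustalw is on your PATH. The function names msa_command and run_clustalw are hypothetical helpers for illustration. They are not part of Clustalw.

```python
import subprocess
from typing import List, Optional


def msa_command(sequence_file: str,
                output_format: Optional[str] = None,
                outfile: Optional[str] = None) -> List[str]:
    """Build a clustalw argument list for a simple multiple sequence alignment.

    Both options are optional, as on the command line; when they are omitted,
    clustalw falls back to its own defaults.
    """
    cmd = ["clustalw", sequence_file]
    if output_format is not None:
        cmd.append(f"-output={output_format}")
    if outfile is not None:
        cmd.append(f"-outfile={outfile}")
    return cmd


def run_clustalw(cmd: List[str], log_path: str = "stdout") -> int:
    """Run clustalw with stdout and stderr both written to log_path (like `> stdout 2>&1`)."""
    with open(log_path, "w") as log:
        result = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT)
    return result.returncode


if __name__ == "__main__":
    cmd = msa_command("sequence_file", output_format="gde", outfile="alignment.gde")
    print(" ".join(cmd))
    # run_clustalw(cmd)  # uncomment once clustalw is installed and on your PATH
```

The sketch passes a list to subprocess.run instead of a shell string, so file names containing spaces don't need any quoting.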

Profile Alignments

Clustalw also lets you align two existing MSA profiles (such as the one created in the previous section). Here's how:

  $ clustalw -profile1=alignment1.gde -profile2=alignment2.gde -output=gde -outfile=combined_profile.gde > stdout 2>&1

The -output and -outfile parameters serve the same purpose as in the previous section.
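Below is the profile alignment as a Python argument list. This is a minimal sketch that uses only the flags shown in the command above. The helper name profile_command is hypothetical.

```python
from typing import List, Optional


def profile_command(profile1: str,
                    profile2: str,
                    output_format: Optional[str] = None,
                    outfile: Optional[str] = None) -> List[str]:
    """Build a clustalw argument list that aligns two existing MSA profiles."""
    cmd = ["clustalw", f"-profile1={profile1}", f"-profile2={profile2}"]
    # Same optional output flags as for a plain MSA
    if output_format is not None:
        cmd.append(f"-output={output_format}")
    if outfile is not None:
        cmd.append(f"-outfile={outfile}")
    return cmd


if __name__ == "__main__":
    print(" ".join(profile_command("alignment1.gde", "alignment2.gde",
                                   output_format="gde",
                                   outfile="combined_profile.gde")))
```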

Phylogenetic Trees

Finally, Clustalw can generate phylogenetic trees from MSA profiles. Here's how:

  $ clustalw some_profile.gde -tree -outputtree=ph > stdout 2>&1

The -tree option tells Clustalw to build a tree from the input alignment. The -outputtree parameter specifies the type of tree to write. Possible types are {ph, nj, dst}. See the clustalw help index (linked from this page) for more detail.
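Below is a small Python sketch that builds the tree command and rejects tree types other than the three listed above. It also guesses the tree file's name. That guess assumes Clustalw names the tree after the input file, with the tree type as the extension (for example some_profile.ph). Treat this as an assumption and check it against your Clustalw version. Both helper names are hypothetical.

```python
import os
from typing import List

# Tree types as listed in the text; the help index may document more.
TREE_TYPES = {"ph", "nj", "dst"}


def tree_command(profile: str, tree_type: str = "ph") -> List[str]:
    """Build a clustalw argument list that computes a tree from an MSA profile."""
    if tree_type not in TREE_TYPES:
        raise ValueError(f"unsupported tree type {tree_type!r}; expected one of {sorted(TREE_TYPES)}")
    return ["clustalw", profile, "-tree", f"-outputtree={tree_type}"]


def expected_tree_file(profile: str, tree_type: str = "ph") -> str:
    """Guess the tree file name: the input's name with its extension swapped (an assumption)."""
    stem, _ = os.path.splitext(profile)
    return f"{stem}.{tree_type}"


if __name__ == "__main__":
    print(" ".join(tree_command("some_profile.gde")))
    print(expected_tree_file("some_profile.gde"))
```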

More information

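The clustalw help index linked from this page is an excellent source of information. Running clustalw -help also prints a summary of the command-line options. You can also run clustalw interactively (start it with no arguments to get its menus) and work out the equivalent command-line options. You'll need some intuition for this :-).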

The usual caveat about bioinformatics tools applies: give your sequence file a name that is more than one character long. I've also seen Clustalw mishandle paths (absolute or relative, it doesn't matter!). If it claims it can't find the input sequence file, or reports that the input file is empty, try switching between an absolute and a relative path. That might solve the problem!
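Before handing a file to Clustalw, you can rule out the obvious causes yourself. Below is a minimal Python sketch. It resolves the input to an absolute path, confirms the file exists and is non-empty, and rejects one-character names. It treats the "name" as the part before the extension, which is an assumption. The helper name prepare_input is hypothetical.

```python
import os
import tempfile


def prepare_input(path: str) -> str:
    """Return an absolute path to a usable clustalw input file, or raise early.

    Catching a missing, empty, or one-character-named file here makes it easier
    to tell a genuine path problem in clustalw apart from a bad input file.
    """
    abs_path = os.path.abspath(path)
    if not os.path.isfile(abs_path):
        raise FileNotFoundError(f"no such file: {abs_path}")
    if os.path.getsize(abs_path) == 0:
        raise ValueError(f"input file is empty: {abs_path}")
    # Assumption: "file name" means the name without its extension
    stem, _ = os.path.splitext(os.path.basename(abs_path))
    if len(stem) <= 1:
        raise ValueError(f"file name too short for clustalw: {abs_path}")
    return abs_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        demo = os.path.join(tmp, "seqs.fasta")
        with open(demo, "w") as fh:
            fh.write(">seq1\nACGTACGT\n>seq2\nACGTTCGT\n")
        print(prepare_input(demo))
```

You would then pass the returned absolute path as the input file in the clustalw command.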