Fasta34 All-All Comparison Script Documentation

Note: For documentation on running the fasta_all_all comparison and parsing on IU's IBM SP, please visit a related webpage.

This is the documentation for my Perl script, which performs an all-all sequence comparison for a given input sequence file, and for the corresponding parser.

You can grab the script on DNA at: /home/agopu/research/ext_motif/

Parser: You can also use my fasta-all-all parser if you wish. The script (on DNA again) is: /home/agopu/research/ext_motif/


Alternatively, if you don't have an account on DNA, you can use the links below to get the scripts in txt format. Copy and paste each one into a Perl script file and try running it:

Fasta All-All Generation: Source Code for

Result Parser: Source Code for

It's a good idea (though not necessary) to make the Perl script you create executable.

ag@dna~/research/ext_motif/tmp% ls -l
total 4
-rw-------    1 agopu    agopu        3688 Oct 27 13:23
ag@dna~/research/ext_motif/tmp% chmod 700 *.pl
ag@dna~/research/ext_motif/tmp% ls -l
total 4
-rwx------    1 agopu    agopu        3688 Oct 27 13:23

Next, create a sample sequence file if you don't already have one. Copy and paste the contents of this file using your favorite editor, or save it from your web browser: Sample Sequence File (COG0001)

Now you are ready to try using the fasta script.

ag@dna~/research/ext_motif/tmp% perl -i sample_sequence
 Begin ALL-ALL comparisons using FASTA34 ...
 Query file not defined ... Using default (input database file): sample_sequence
 Output file not defined ... Using default: sample_sequence.fasta_all_all
 ... DONE doing ALL-ALL comparisons. Exiting ....

To see what the output looks like, open sample_sequence.fasta_all_all (or whichever output filename you specified). A few notes about the output:
-- It's essentially a raw dump of the FASTA34 results.
-- The results for each query sequence start with the string "QUERY_BEGIN" and end with "QUERY_END". These markers are useful when parsing the output file for information.
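The QUERY_BEGIN / QUERY_END markers make it straightforward to cut the dump into one block per query. The following is only a minimal sketch in Python, not the parser described on this page; it assumes each marker appears on a line of its own, and the function name split_queries is hypothetical.

```python
from typing import Iterable, Iterator, List


def split_queries(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield the lines found between each QUERY_BEGIN / QUERY_END pair.

    Assumption: a marker line merely has to contain the marker string;
    the exact layout of the marker lines is not documented here.
    """
    block = None  # None means we are currently outside any query block
    for line in lines:
        if "QUERY_BEGIN" in line:
            block = []              # start collecting a new query block
        elif "QUERY_END" in line:
            if block is not None:
                yield block         # hand back the finished block
            block = None
        elif block is not None:
            block.append(line.rstrip("\n"))
```

For example, passing an open file handle for sample_sequence.fasta_all_all to split_queries yields one list of FASTA34 output lines per query sequence.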

ag@dna~/research/ext_motif/tmp% ls -l
total 36
-rwx------    1 agopu    agopu        3688 Oct 27 13:23
-rw-------    1 agopu    agopu        1285 Oct 27 13:24 sample_sequence
-rw-------    1 agopu    agopu       27726 Oct 27 13:24 sample_sequence.fasta_all_all

ag@dna~/research/ext_motif/tmp% less sample_sequence.fasta_all_all
 FASTA searches a protein or DNA sequence data bank
 version 3.4t05 Aug 18, 2001
Please cite:
 W.R. Pearson & D.J. Lipman PNAS (1988) 85:2444-2448
sample_sequence.faall_query_temp: 418 aa
 vs  sample_sequence library
searching sample_sequence library

[ ... snip ...]
>>AF1241                                                  (418 aa)
 initn: 2721 init1: 2721 opt: 2721  Z-score: 8496.4  bits: 1581.1 E():    0
Smith-Waterman score: 2721;  100.000% identity (100.000% ungapped) in 418 aa overlap (1-418:1-418)

[ ... snip ...]
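The three hit lines shown above (the ">>" header, the score line, and the Smith-Waterman line) carry most of the values a parser would want. Here is a rough sketch of pulling them out with regular expressions. The patterns are written against the sample lines on this page only, so other FASTA versions or DNA searches may need adjustments; parse_hit and the dictionary keys are hypothetical names.

```python
import re

# Patterns derived from the sample output above (an assumption, not a spec).
HIT_RE = re.compile(r"^>>(\S+).*\((\d+) aa\)")
SCORE_RE = re.compile(
    r"opt:\s*(\d+)\s+Z-score:\s*([\d.]+)\s+bits:\s*([\d.]+)\s+E\(\):\s*(\S+)"
)
SW_RE = re.compile(
    r"Smith-Waterman score:\s*(\d+);\s*([\d.]+)% identity.*"
    r"in (\d+) aa overlap \((\d+)-(\d+):(\d+)-(\d+)\)"
)


def parse_hit(header: str, scores: str, alignment: str):
    """Return a dict of values for one hit, or None if a line does not match."""
    h = HIT_RE.search(header)
    s = SCORE_RE.search(scores)
    a = SW_RE.search(alignment)
    if not (h and s and a):
        return None
    return {
        "hit": h.group(1),
        "hit_length": int(h.group(2)),
        "opt": int(s.group(1)),
        "z_score": float(s.group(2)),
        "bits": float(s.group(3)),
        "e_value": float(s.group(4)),
        "sw_score": int(a.group(1)),
        "pct_identity": float(a.group(2)),
        "overlap": int(a.group(3)),
        "query_range": (int(a.group(4)), int(a.group(5))),
        "hit_range": (int(a.group(6)), int(a.group(7))),
    }
```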

Sample run of the parser:

ag@dna~/research/ext_motif/tmp% -i sample_sequence.fasta_all_all -s cog -c 200
 Parsing FASTA34 output and creating BAG-formatted input ...
 ... DONE creating BAG input with proper formatting. Exiting now ...

This is what you can expect to see if you run my parser:

ag@dna~/research/ext_motif/tmp% less sample_sequence.fasta_all_all.bagin
AF1241  MJ0603  4739.7  0       1533    54.048  420     1,416   -       22,439
AF1241  MTH228  4622.7  0       1496    54.436  417     5,418   -       2,415
AF1241  Ta0571  4221.0  0       1369    47.470  415     6,418   -       8,421

[ ... snip ...]
BS_gsaB BH0943  6466.3  0       2079    80.620  387     1,387   -       39,425
BS_gsaB aq_816  4135.7  0       1342    48.969  388     3,390   -       37,424

[ ... snip ...]
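If you want to post-process the .bagin file, each line can be split on whitespace into ten fields. The sketch below is an illustration only. The column meanings are not documented on this page, so the names in COLUMNS are guesses inferred from the FASTA summary lines (Z-score, E(), Smith-Waterman score, % identity, overlap length, coordinate ranges), and the ninth column is simply kept as an opaque string. Check them against the parser source before relying on them.

```python
# Hypothetical column names: guesses, NOT taken from the parser's source.
COLUMNS = (
    "query", "hit", "z_score", "e_value", "sw_score",
    "pct_identity", "overlap", "query_range", "unknown", "hit_range",
)


def parse_range(text: str):
    """Turn a 'start,end' string such as '1,416' into a tuple of ints."""
    start, end = text.split(",")
    return int(start), int(end)


def parse_bagin_line(line: str) -> dict:
    """Split one .bagin line into a dict keyed by the guessed column names."""
    fields = line.split()  # works for tabs or runs of spaces
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
    record = dict(zip(COLUMNS, fields))
    for key in ("z_score", "e_value", "pct_identity"):
        record[key] = float(record[key])
    for key in ("sw_score", "overlap"):
        record[key] = int(record[key])
    for key in ("query_range", "hit_range"):
        record[key] = parse_range(record[key])
    return record
```

Splitting on any whitespace, rather than strictly on tabs, is a deliberate choice here because the listing above does not show which separator the parser writes.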


Arvind Gopu (agopu [at] cs [dot] indiana [dot] edu)
Last Modified: Tue Apr 20 00:14:38 EST 2004