This is the documentation for my Perl script fasta_all_all.pl, which performs an all-all sequence comparison for a given input sequence file, and for the corresponding parser.
You can grab the script on DNA at: /home/agopu/research/ext_motif/fasta_all_all.pl
Parser: You can also use my fasta-all-all parser if you wish. The script (on DNA again) is: /home/agopu/research/ext_motif/f2b.pl
Alternatively, for those of you who don't have accounts on DNA, you can click the links below to get the scripts in text format. Copy and paste each one into a Perl script file (say fasta_all_all.pl and f2b.pl) and try running it:
Fasta All-All Generation: Source Code for fasta_all_all.pl
Result Parser: Source Code for f2b.pl
ag@dna~/research/ext_motif/tmp% ls -l
total 4
-rw------- 1 agopu agopu 3688 Oct 27 13:23 fasta_all_all.pl
ag@dna~/research/ext_motif/tmp% chmod 700 *.pl
ag@dna~/research/ext_motif/tmp% ls -l
total 4
-rwx------ 1 agopu agopu 3688 Oct 27 13:23 fasta_all_all.pl
Following that, create a sample sequence file if you don't have one. Copy and paste the contents of this file using your favorite editor, or save it using your web browser: Sample Sequence File (COG0001)
Now you are ready to try using the fasta script.
ag@dna~/research/ext_motif/tmp% perl fasta_all_all.pl -i sample_sequence
Begin ALL-ALL comparisons using FASTA34 ...
Query file not defined ... Using default (input database file): sample_sequence
Output file not defined ... Using default: sample_sequence.fasta_all_all
... DONE doing ALL-ALL comparisons. Exiting ....
ag@dna~/research/ext_motif/tmp% ls -l
total 36
-rwx------ 1 agopu agopu 3688 Oct 27 13:23 fasta_all_all.pl
-rw------- 1 agopu agopu 1285 Oct 27 13:24 sample_sequence
-rw------- 1 agopu agopu 27726 Oct 27 13:24 sample_sequence.fasta_all_all
ag@dna~/research/ext_motif/tmp% less sample_sequence.fasta_all_all
QUERY_BEGIN:>AF1241
MKLDKSRKLYAEALNLMPGGVSSPVRAFKPHPFYTARGKGSKIYDVDGNAYIDYCMAYGPLVLGHANEVV
KNALAEQLERGWLYGTPIELEIEYAKLIQKYFPSMEMLRFVNTGSEATMAALRVARGFTGRDKIVKVEGS
FHGAHDAVLVKAGSGATTHGIPNSAGVPADFVKNTLQVPFNDIEALSEILEKNEVAALILEPVMGNSSLI
LPEKDYLKEVRKVTAENDVLLIFDEVITGFRVSMGGAQEYYGVKPDLTTLGKIAGGGLPIGIFGGRKEIM
ERVAPSGDVYQAGTFSGNPLSLTAGYATVKFMEENGVIEKVNSLTEKLVSGIADVLEDKKAECEVGSLAS
MFCIYFGPTPRNYAEALQLNKERFMEFFWRMLENGVFLPPSQYETCFVSFAHTEEDVEKTVEAVSESL
FASTA searches a protein or DNA sequence data bank
version 3.4t05 Aug 18, 2001
Please cite:
W.R. Pearson & D.J. Lipman PNAS (1988) 85:2444-2448
sample_sequence.faall_query_temp: 418 aa
>AF1241
vs sample_sequence library
searching sample_sequence library
[ ... snip ...]
>>AF1241 (418 aa)
initn: 2721 init1: 2721 opt: 2721 Z-score: 8496.4 bits: 1581.1 E(): 0
Smith-Waterman score: 2721; 100.000% identity (100.000% ungapped) in 418 aa overlap (1-418:1-418)
[ ... snip ...]
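The output file shown above appears to hold one block per query, each introduced by a QUERY_BEGIN:> line followed by the query sequence and the FASTA report for that query. If you want to inspect the blocks without the parser, a minimal Python sketch along these lines may help. It assumes only that marker convention; the function name split_by_query is hypothetical and is not part of fasta_all_all.pl or f2b.pl.

```python
QUERY_MARKER = "QUERY_BEGIN:>"  # marker seen in the sample output above


def split_by_query(lines):
    """Group the lines of a .fasta_all_all file by query ID.

    Assumption: every query block starts with a line beginning with
    QUERY_MARKER, and the query ID is the first word after the marker.
    Lines before the first marker are ignored.
    """
    blocks = {}
    current = None
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith(QUERY_MARKER):
            words = line[len(QUERY_MARKER):].split()
            current = words[0] if words else ""
            blocks.setdefault(current, [])
        elif current is not None:
            blocks[current].append(line)
    return blocks


if __name__ == "__main__":
    # File name taken from the sample run; adjust to your own output file.
    with open("sample_sequence.fasta_all_all") as handle:
        for query_id, body in split_by_query(handle).items():
            print(query_id, len(body), "lines")
```

Run against the sample output, this should list each query ID once, which gives a quick check that every sequence in the input was actually used as a query.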
Sample run of the parser:
ag@dna~/research/ext_motif/tmp% f2b.pl -i sample_sequence.fasta_all_all -s cog -c 200
Parsing FASTA34 output and creating BAG-formatted input ...
QUERY COUNT: 35 SAME COUNT: 35 and DIFFERENT COUNT 1190
... DONE creating BAG input with proper formatting. Exiting now ...
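The counts in this run are consistent with a complete all-all comparison: each of the 35 queries matches itself once (SAME COUNT), and each query is paired with the other 34 sequences once (DIFFERENT COUNT, 35 x 34 = 1190). That reading is inferred from the numbers, not from the parser's documentation, and other inputs or parser options may produce different counts. A small Python sanity check, with a hypothetical function name:

```python
def expected_all_all_counts(num_sequences):
    """Return (same_count, different_count) for a complete all-all run.

    Assumption: every query hits itself exactly once and every other
    sequence exactly once, so ordered pairs (A, B) and (B, A) both count.
    """
    same = num_sequences
    different = num_sequences * (num_sequences - 1)
    return same, different


if __name__ == "__main__":
    print(expected_all_all_counts(35))
```

If the parser reports fewer DIFFERENT hits than this for your own data, some pairs were presumably not reported or were filtered out.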
This is what you can expect to see if you run my parser:
ag@dna~/research/ext_motif/tmp% less sample_sequence.fasta_all_all.bagin
AF1241 MJ0603 4739.7 0 1533 54.048 420 1,416 - 22,439
AF1241 MTH228 4622.7 0 1496 54.436 417 5,418 - 2,415
AF1241 Ta0571 4221.0 0 1369 47.470 415 6,418 - 8,421
[ ... snip ...]
BS_gsaB BH0943 6466.3 0 2079 80.620 387 1,387 - 39,425
BS_gsaB aq_816 4135.7 0 1342 48.969 388 3,390 - 37,424
[ ... snip ...]
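If you need to read the .bagin file from another program, each row can be split on whitespace into ten fields. The field names in the Python sketch below are assumptions, obtained by lining the sample rows up against the FASTA report fields shown earlier (Z-score, E(), Smith-Waterman score, percent identity, overlap length, query range, library range); this page does not define the columns, so check them against f2b.pl before relying on them. BagRow and parse_bag_line are hypothetical names.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class BagRow:
    query: str                    # first ID on the row
    hit: str                      # second ID on the row
    z_score: float                # assumed: FASTA Z-score
    e_value: float                # assumed: FASTA E() value
    sw_score: int                 # assumed: Smith-Waterman score
    identity_pct: float           # assumed: percent identity
    overlap_len: int              # assumed: overlap length in residues
    query_range: Tuple[int, int]  # assumed: start,end on the query
    hit_range: Tuple[int, int]    # assumed: start,end on the hit


def _parse_range(text):
    """Turn '1,416' into (1, 416)."""
    start, end = text.split(",")
    return int(start), int(end)


def parse_bag_line(line):
    """Parse one whitespace-separated row of a .bagin file."""
    fields = line.split()
    if len(fields) != 10 or fields[8] != "-":
        raise ValueError("unexpected row layout: " + line.strip())
    return BagRow(
        query=fields[0],
        hit=fields[1],
        z_score=float(fields[2]),
        e_value=float(fields[3]),
        sw_score=int(fields[4]),
        identity_pct=float(fields[5]),
        overlap_len=int(fields[6]),
        query_range=_parse_range(fields[7]),
        hit_range=_parse_range(fields[9]),
    )
```

For example, parse_bag_line("AF1241 MJ0603 4739.7 0 1533 54.048 420 1,416 - 22,439") yields a row with query AF1241, hit MJ0603 and hit_range (22, 439). Lines such as "[ ... snip ...]" raise ValueError, so skip or catch them when reading excerpts.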