Computing Fasta all-all comparison on IU's IBM SP


If you are looking for information about running a Fasta34 all-all comparison on the Informatics servers (for example, how to use my all-all script or my fasta_output parser), please visit a related webpage.


Running Fasta34 using the `fastajob' script

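The IBM SP has a fastajob script (Dick Repasky's) that automates running fasta34 in parallel on your input file of query sequences. The search is trivially parallelizable, since each query sequence is compared against the database independently of the others.

Here's the command to run an all-all comparison on 16 CPUs (the same file serves as both the query file and the database):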

 $ fastajob mpfasta MedicagoTruncatula_f.txt MedicagoTruncatula_f.txt -CPUS 16


The general syntax is:

 $ fastajob <type of file> <query file> <database> -CPUS <# of CPUs>


More extensive documentation for this is available at: http://www.indiana.edu/~rac/bioinformatics/fasta.html.
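
To illustrate why the search parallelizes so easily, here is a minimal sketch of splitting a multi-sequence query file into chunks that could each be searched separately. This is not how fastajob is implemented (its internals are not described here); the function name split_fasta and the chunk prefix are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper, not part of fastajob: split a multi-sequence FASTA
# file into N chunk files, assigning records round-robin.
split_fasta() {
    input=$1      # FASTA file containing one or more '>' header lines
    nchunks=$2    # number of chunk files to create
    prefix=$3     # chunks are written to <prefix>.1 ... <prefix>.N
    awk -v n="$nchunks" -v p="$prefix" '
        # A header line starts a new record: choose the next chunk file.
        /^>/ { rec++; out = p "." ((rec - 1) % n + 1) }
        # Write the header and its sequence lines to the chosen chunk;
        # anything before the first header is skipped.
        out != "" { print > out }
    ' "$input"
}
```

For example, to produce 16 chunks named query_chunk.1 through query_chunk.16:

 $ split_fasta MedicagoTruncatula_f.txt 16 query_chunk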


Running my fasta_output parser f2b.pl

The SP's usage policy restricts interactive jobs to less than 20 minutes. So if you have a huge fasta result file to parse (using my f2b.pl or otherwise), you are better off submitting the job through the LoadLeveler (LL) job submission tool. LL requires you to write a short shell script specifying parameters such as the output filename and the class of the job.

Here's a sample script which submits a class 'a' job to execute f2b.pl. Note how the output and error filenames are specified, and how $(Cluster) ties them to the job by putting the job id into each filename.

 #@ class = a
 #@ initialdir = /gpfs/agopu/
 #@ output = sample_llscript.$(Cluster).out
 #@ error = sample_llscript.$(Cluster).err
 #@ queue

 echo 'Starting f2b_ing..'

 /usr/bin/perl /gpfs/agopu/f2b.pl -i /gpfs/agopu/myfile.fastajob_output -c 400 -f \
   -o /gpfs/agopu/myfile.fasta_all_all.bagin

 echo 'DONE ...'


Paste the above script into your favorite editor and save it as (say) `sample_llscript.sh'. Then, at the SP command prompt, run llsubmit on the script file:

 ag@aries02 agopu % llsubmit sample_llscript.sh
 llsubmit: The job "aries02.ucs.indiana.edu.10820" has been submitted.
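
If you script the submission, the job number can be pulled out of that confirmation line. The following is a minimal sketch that assumes the message format shown above; the function name job_id_from_llsubmit is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read an llsubmit confirmation line on stdin and print
# the trailing numeric job id.
# Assumes the format: llsubmit: The job "<host>.<number>" has been submitted.
job_id_from_llsubmit() {
    sed -n 's/.*The job "[^"]*\.\([0-9][0-9]*\)" has been submitted.*/\1/p'
}
```

Assuming llsubmit writes its confirmation to standard output, it could be used like this:

 $ jobid=$(llsubmit sample_llscript.sh | job_id_from_llsubmit)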


Once a job is submitted, you can check its status using 'llq' (grepping for the job id or your username):

 ag@aries02 agopu % llq | grep 10820 
 aries02.10820.0          agopu       4/5  12:27 C  50  a

 ag@aries02 agopu % llq | grep agopu
 aries02.10819.0          agopu       4/5  12:22 C  50  a
 aries02.10820.0          agopu       4/5  12:27 C  50  a
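
Rather than re-running llq by hand, you could poll it from a small loop. This is a sketch under two assumptions: that a job's line eventually drops out of the llq listing once it is done, and that the job id appears between dots as in the listing above. The function name wait_for_job and the LLQ_CMD variable are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: block until a job id no longer appears in the queue
# listing. LLQ_CMD defaults to llq; it can be overridden so the loop can be
# tried on a machine without LoadLeveler.
wait_for_job() {
    jobid=$1            # numeric job id, e.g. 10820
    interval=${2:-60}   # seconds to sleep between polls
    # Match the id between dots, as in "aries02.10820.0".
    while ${LLQ_CMD:-llq} | grep -q "\.$jobid\."; do
        sleep "$interval"
    done
}
```

For example, to poll every two minutes:

 $ wait_for_job 10820 120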


The above test run should create `myfile.fasta_all_all.bagin' in /gpfs/agopu/. It should also create an output file (and an error file, if there were any errors) of the form `sample_llscript.<job_id>.out' (example: sample_llscript.10820.out).
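
A quick way to confirm the run went as expected is to check that the result file is non-empty and that nothing was written to the error file. The sketch below does just that; the function name check_ll_run and its return codes are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: report whether a finished job produced its result file
# and whether anything was written to its error file.
# Returns 0 if all is well, 1 if the result is missing or empty, 2 if the
# error file has content.
check_ll_run() {
    result=$1   # e.g. /gpfs/agopu/myfile.fasta_all_all.bagin
    errfile=$2  # e.g. sample_llscript.10820.err
    if [ ! -s "$result" ]; then
        echo "missing or empty result: $result"
        return 1
    fi
    if [ -s "$errfile" ]; then
        echo "errors logged in: $errfile"
        return 2
    fi
    echo "ok: $result"
}
```

For the sample job above, run from /gpfs/agopu/:

 $ check_ll_run myfile.fasta_all_all.bagin sample_llscript.10820.err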


For more details about LL scripts, job statuses, and so on, please see the SP section of the RATS manual webpages, which has excellent documentation on many related issues. For information on the computing resources available to IU scientists and students (with faculty sponsorship in certain cases), visit the RAC home page and the supercomputers page.


Arvind Gopu (agopu [at] cs [dot] indiana [dot] edu)
Last Modified: Mon Apr 5 12:09:57 EST 2004