Biopython: extracting sequence IDs from a BLAST output file

Posted by Jon on Stack Overflow
Published on 2009-11-05T23:47:19Z

Hi,

I have a BLAST output file in XML format. It contains 22 query sequences, with 50 hits reported for each one, and I want to extract all 50 x 22 (1,100) hits. This is the code I currently have, but it only extracts the 50 hits from the first query:
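A minimal sketch of how that file maps onto parsed records, assuming Biopython's usual layout: one record per query, each with an `alignments` list holding one entry per hit. The function name `hits_per_query` is hypothetical, and the `SimpleNamespace` objects are stand-ins for real Biopython records that expose only the attributes used here (`query`, `alignments`), so the sketch runs without Biopython or a BLAST file.

```python
from types import SimpleNamespace


def hits_per_query(blast_records):
    """Return (query, number_of_hits) for every record in the iterable.

    Each record is assumed to look like a Biopython BLAST record:
    a `query` attribute and an `alignments` list with one entry per hit.
    """
    return [(record.query, len(record.alignments)) for record in blast_records]


if __name__ == "__main__":
    # Hypothetical stand-ins: 22 queries with 50 hits each, like the file described above.
    fake_records = (
        SimpleNamespace(
            query=f"query_{q}",
            alignments=[SimpleNamespace(title=f"hit_{q}_{h}", hsps=[None]) for h in range(50)],
        )
        for q in range(22)
    )
    counts = hits_per_query(fake_records)
    print(len(counts), sum(n for _, n in counts))  # 22 1100
```

If the real file gives 22 records of 50 hits each, all 1,100 hits are reachable; the question is only how to visit every record.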

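from Bio.Blast import NCBIXML

# Hypothetical file name; the post does not show how result_handle was opened
result_handle = open("my_blast.xml")
blast_records = NCBIXML.parse(result_handle)
blast_record = next(blast_records)  # takes only the FIRST query's record

save_file = open("/Users/jonbra/Desktop/my_fasta_seq.fasta", 'w')

# Write a FASTA-style header line for each HSP of each hit
for alignment in blast_record.alignments:
    for hsp in alignment.hsps:
        save_file.write('>%s\n' % (alignment.title,))
save_file.close()
result_handle.close()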
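`NCBIXML.parse` returns an iterator that yields one record per query, so a single `next()` call only ever reaches the first query. A minimal sketch that loops over the whole iterator instead, keeping the original per-HSP output. The function name `write_hit_titles` is hypothetical, and the `SimpleNamespace` objects are stand-ins for Biopython records (only `alignments`, `hsps` and `title` are used), so it runs without Biopython.

```python
import io
from types import SimpleNamespace


def write_hit_titles(blast_records, out):
    """Write a '>title' line for every HSP of every hit, for EVERY query.

    `blast_records` is any iterable of BLAST records (for example the
    iterator returned by NCBIXML.parse). Looping over it, instead of
    calling next() once, visits all queries. Returns the lines written.
    """
    written = 0
    for blast_record in blast_records:              # one record per query
        for alignment in blast_record.alignments:   # one alignment per hit
            for hsp in alignment.hsps:              # same per-HSP output as the original loop
                out.write(">%s\n" % (alignment.title,))
                written += 1
    return written


if __name__ == "__main__":
    # Hypothetical stand-ins: two query records with two single-HSP hits each.
    records = [
        SimpleNamespace(alignments=[
            SimpleNamespace(title="hitA", hsps=[object()]),
            SimpleNamespace(title="hitB", hsps=[object()]),
        ]),
        SimpleNamespace(alignments=[
            SimpleNamespace(title="hitC", hsps=[object()]),
            SimpleNamespace(title="hitD", hsps=[object()]),
        ]),
    ]
    buffer = io.StringIO()
    print(write_hit_titles(iter(records), buffer))  # 4
    print(buffer.getvalue(), end="")
```

With Biopython installed, passing `NCBIXML.parse(result_handle)` as the first argument and the opened output file as the second should write headers for all 22 queries rather than just the first.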

Does anybody have suggestions for how to extract all the hits? I guess I have to use something other than alignments. Hope this was clear. Thanks!
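Iterating over `alignments` is still a reasonable way to reach the hits; it just has to happen once per record. One caveat: writing inside the `hsps` loop repeats a hit's title whenever that hit has several HSPs. A minimal sketch that collects one ID per hit, grouped by query, assuming the records expose `query` and each alignment exposes `hit_id` as Biopython's NCBIXML records do (worth checking against your installed version). The function name is hypothetical, and the `SimpleNamespace` objects stand in for real records.

```python
from types import SimpleNamespace


def hit_ids_by_query(blast_records):
    """Map each query to the IDs of its hits, in the order BLAST reported them.

    Reading `alignments` (not `hsps`) gives one entry per hit, so a hit
    with several HSPs is listed only once. Records with identical query
    names would overwrite each other in the dict.
    """
    result = {}
    for record in blast_records:
        result[record.query] = [alignment.hit_id for alignment in record.alignments]
    return result


if __name__ == "__main__":
    # Hypothetical stand-ins: the first hit of q1 has two HSPs.
    records = [
        SimpleNamespace(query="q1", alignments=[
            SimpleNamespace(hit_id="gi|1", hsps=[object(), object()]),
            SimpleNamespace(hit_id="gi|2", hsps=[object()]),
        ]),
        SimpleNamespace(query="q2", alignments=[
            SimpleNamespace(hit_id="gi|3", hsps=[object()]),
        ]),
    ]
    print(hit_ids_by_query(records))
    # {'q1': ['gi|1', 'gi|2'], 'q2': ['gi|3']}
```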

Jon
