BioPython: extracting sequence IDs from a Blast output file
Posted by Jon on Stack Overflow
Published on 2009-11-05T23:47:19Z
Indexed on 2010/05/30 20:52 UTC
Hi,
I have a BLAST output file in XML format. It contains 22 query sequences, with 50 hits reported for each query, and I want to extract all 50 x 22 hits. This is the code I currently have, but it only extracts the 50 hits from the first query.
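from Bio.Blast import NCBIXML

# result_handle is assumed to be an open handle on the BLAST XML output
blast_records = NCBIXML.parse(result_handle)
# next() returns only the first record, i.e. the first query
blast_record = next(blast_records)
save_file = open("/Users/jonbra/Desktop/my_fasta_seq.fasta", 'w')
for alignment in blast_record.alignments:
    for hsp in alignment.hsps:
        # one header line per HSP of each hit
        save_file.write('>%s\n' % (alignment.title,))
save_file.close()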
Does anybody have any suggestions on how to extract all the hits? I guess I have to use something other than alignments. I hope this was clear. Thanks!
Jon
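
For reference, one possible way to reach the hits of every query is to loop over the iterator returned by NCBIXML.parse instead of taking a single record from it. The snippet below is only a sketch under a few assumptions: the function names write_hit_titles and extract_from_file are made up for illustration, the paths are whatever the caller supplies, and the header is written once per alignment (hit) rather than once per HSP, which would give 50 x 22 lines if each query really reports 50 hits.

```python
def write_hit_titles(blast_records, out_handle):
    """Write one '>title' line per hit, for every query record.

    blast_records: any iterable of records exposing .alignments,
                   e.g. the iterator returned by NCBIXML.parse().
    Returns the number of header lines written.
    """
    count = 0
    for blast_record in blast_records:             # one record per query
        for alignment in blast_record.alignments:  # one alignment per hit
            out_handle.write(">%s\n" % alignment.title)
            count += 1
    return count


def extract_from_file(xml_path, fasta_path):
    """Hypothetical wrapper: parse a BLAST XML file and save all hit titles."""
    # Imported here so write_hit_titles can be used without Biopython installed.
    from Bio.Blast import NCBIXML

    with open(xml_path) as result_handle, open(fasta_path, "w") as save_file:
        return write_hit_titles(NCBIXML.parse(result_handle), save_file)
```

Calling extract_from_file with the XML file and the desired output path should then write the titles for all queries. If one line per HSP is wanted, as in the code above, the inner loop over alignment.hsps can be added back inside the alignment loop.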
© Stack Overflow or respective owner