Search for a string allowing for one mismatch at any location in the string, Python
Posted by Vincent on Stack Overflow
Published on 2010-03-10T20:42:30Z
I am working with DNA sequences of length 25 (see examples below). I have a list of 230,000 of them and need to look for each sequence in the entire genome (of the Toxoplasma gondii parasite). I am not sure how large the genome is, but it is much larger than 230,000 sequences.
I need to look for each of my 25-character sequences, for example (AGCCTCCCATGATTGAACAGATCAT). The genome is formatted as a continuous string, i.e. (CATGGGAGGCTTGCGGAGCCTGAGGGCGGAGCCTGAGGTGGGAGGCTTGCGGAGTGCGGAGCCTGAGCCTGAGGGCGGAGCCTGAGGTGGGAGGCTT.........)
I don't care where or how many times it is found, just yes or no. This is simple, I think: str.find(AGCCTCCCATGATTGAACAGATCAT)
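A minimal sketch of the exact yes/no check, assuming the genome has already been read into a single Python string. The function name `is_present` is hypothetical. The `in` operator answers the yes/no question directly, while `str.find` returns the offset of the first hit, or -1 if there is none.

```python
def is_present(genome: str, probe: str) -> bool:
    """Return True if probe occurs anywhere in genome (exact match only)."""
    # 'in' performs a substring search; equivalent to genome.find(probe) != -1
    return probe in genome


# Short stand-in for the real genome string (assumption: real data is much longer)
genome = "CATGGGAGGCTTGCGG"
print(is_present(genome, "GGAGGCTT"))
print(is_present(genome, "TTTT"))
```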
But I also want to find a close match, defined as wrong (mismatched) at any location but at only 1 location, and record that location in the sequence. I am not sure how to do this. The only thing I can think of is using a wildcard and performing the search with a wildcard in each position, i.e. searching 25 times. For example, AGCCTCCCATGATTGAACAGATCAT and AGCCTCCCATGATAGAACAGATCAT are a close match with a mismatch at position 13 (counting from 0).
Speed is not a big issue, since I am only doing this 3 times (I hope), but it would be nice if it were fast.
There are programs that find matches and partial matches, but I am looking for a type of partial match that is not available with these applications.
Here is a similar post for Perl, but they are only comparing sequences, not searching a continuous string.
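The wildcard idea described above could be sketched roughly as follows, using the standard `re` module with `.` as the single-character wildcard. This is only one possible approach under stated assumptions: the genome is one in-memory string with no line breaks, and the name `find_with_one_mismatch` and its return convention are hypothetical.

```python
import re


def find_with_one_mismatch(genome, probe):
    """Return (genome_offset, mismatch_index) for the first hit, or None.

    mismatch_index is None for an exact match; otherwise it is the
    0-based position in probe where the genome differs.
    """
    # Try the exact match first
    offset = genome.find(probe)
    if offset != -1:
        return offset, None
    # Otherwise put a wildcard at each position in turn (len(probe) searches)
    for i in range(len(probe)):
        pattern = re.escape(probe[:i]) + "." + re.escape(probe[i + 1:])
        match = re.search(pattern, genome)
        if match:
            return match.start(), i
    return None


probe = "AGCCTCCCATGATTGAACAGATCAT"
genome = "GGG" + "AGCCTCCCATGATAGAACAGATCAT" + "TTT"
print(find_with_one_mismatch(genome, probe))  # (3, 13)
```

This returns only the first hit it finds, which is enough for a yes/no answer plus the mismatch position. With 230,000 sequences and a large genome, 25 regex scans per sequence may be slow, so this is a starting point rather than a tuned solution.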
© Stack Overflow or respective owner