def constrainedMatchPair(firstMatch,secondMatch,length):
Posted
by smart
on Stack Overflow
See other posts from Stack Overflow
or by smart
Published on 2010-06-18T05:20:11Z
Indexed on
2010/06/18
5:33 UTC
Read the original article
Hit count: 250
matches of a key string in a target string, where one of the elements of the key string is replaced by a different element. For example, if we want to match ATGC against ATGACATGCACAAGTATGCAT, we know there is an exact match starting at 5 and a second one starting at 15. However, there is another match starting at 0, in which the element A is substituted for C in the key, that is we match ATGC against the target. Similarly, the key ATTA matches this target starting at 0, if we allow a substitution of G for the second T in the key string.
consider the following steps. First, break the key string into two parts (where one of the parts could be an empty string). Let's call them key1 and key2. For each part, use your function from Problem 2 to find the starting points of possible matches, that is, invoke
starts1 = subStringMatchExact(target,key1)
and
starts2 = subStringMatchExact(target,key2)
The result of these two invocations should be two tuples, each indicating the starting points of matches of the two parts (key1 and key2) of the key string in the target. For example, if we consider the key ATGC, we could consider matching A and GC against a target, like ATGACATGCA (in which case we would get as locations of matches for A the tuple (0, 3, 5, 9) and as locations of matches for GC the tuple (7,). Of course, we would want to search over all possible choices of substrings with a missing element: the empty string and TGC; A and GC; AT and C; and ATG and the empty string. Note that we can use your solution for Problem 2 to find these values.
Once we have the locations of starting points for matches of the two substrings, we need to decide which combinations of a match from the first substring and a match of the second substring are correct. There is an easy test for this. Suppose that the index for the starting point of the match of the first substring is n (which would be an element of starts1), and that the length of the first substring is m. Then if k is an element of starts2, denoting the index of the starting point of a match of the second substring, there is a valid match with one substitution starting at n, if n+m+1 = k, since this means that the second substring match starts one element beyond the end of the first substring. finally the question is
Write a function, called constrainedMatchPair which takes three arguments: a tuple representing starting points for the first substring, a tuple representing starting points for the second substring, and the length of the first substring. The function should return a tuple of all members (call it n) of the first tuple for which there is an element in the second tuple (call it k) such that n+m+1 = k, where m is the length of the first substring.
© Stack Overflow or respective owner