Is it possible to use a back reference to specify the number of replications in a regular expression
Posted by user307894 on Stack Overflow
Published on 2010-04-02T21:57:36Z
Is it possible to use a back reference to specify the number of replications in a regular expression?
foo= 'ADCKAL+2AG.+2AG.+2AG.+2AGGG+.+G+3AGGa4.'
Substrings that start with '+[0-9]' followed by '[A-Za-z]{n}.' need to be replaced with a plain '+', where n is the digit that appears earlier in the same substring. Can that n be back-referenced? For example, '\+([0-9])[A-Za-z]{\1}.' (which doesn't work) is the pattern I want replaced with "+" (the trailing dot can be any character and represents a quality score), so that foo should come out to ADCKAL++++G.G+.
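import re

foo = 'ADCKAL+2AG.+2AG.+2AG.+2AGGG^+.+G+3AGGa4.'
indelpatt = re.compile(r'\+([0-9])')
# Collect the indel sizes up front; looping on indelpatt.search(foo) never
# terminates when a '+digit' is left that the size-specific pattern cannot match.
for indelsize in sorted(set(indelpatt.findall(foo))):
    # '+n', then n bases, then one quality-score character
    newpatt = re.compile(r'\+%s[ACGTNacgtn]{%s}.' % (indelsize, indelsize))
    foo = newpatt.sub('+', foo)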
I'm probably missing an easier way to parse the string.
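One possible simplification, offered as a sketch rather than a definitive answer: a repetition count in Python's re has to be a literal number, so instead of back-referencing the digit, the pattern can spell out one alternative per size and substitute in a single pass. The name strip_indels is hypothetical, sizes are assumed to be a single digit from 1 to 9, and the base alphabet is copied from the loop above.

```python
import re

# One alternative per indel size 1-9: '+n', then n bases, then one quality character.
# Assumption: sizes are single digits; the base alphabet comes from the question's loop.
INDEL = re.compile(
    "|".join(r"\+%d[ACGTNacgtn]{%d}." % (n, n) for n in range(1, 10))
)

def strip_indels(pileup):
    """Replace every '+n<n bases><quality char>' run with a bare '+' in one pass."""
    return INDEL.sub("+", pileup)

foo = 'ADCKAL+2AG.+2AG.+2AG.+2AGGG+.+G+3AGGa4.'
print(strip_indels(foo))
```

Under these assumptions the sample string becomes ADCKAL++++G+.+G+4. in one pass. That is not the output quoted in the question, so additional trimming rules may apply that the question does not spell out. Because the substitution runs once over the original text, a '+' produced by a replacement is never rescanned as the start of a new indel.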