Is it possible to use a back reference to specify the number of replications in a regular expression

Posted by user307894 on Stack Overflow
Published on 2010-04-02T21:57:36Z

Is it possible to use a back reference to specify the number of replications in a regular expression?

foo= 'ADCKAL+2AG.+2AG.+2AG.+2AGGG+.+G+3AGGa4.'

Substrings that start with '+[0-9]' followed by '[A-z]{n}.' need to be replaced with a plain '+', where n is the digit matched at the start of the substring. Can that n be back-referenced? For example, '\+([0-9])[A-z]{\1}.' (which doesn't work) is the pattern I want replaced with '+'. The final dot can be any character and represents a quality score. With that replacement, foo should come out as ADCKAL++++G.G+.
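A quick check suggests why the back-reference form fails in Python's re module: braces that do not hold a plain number appear to be read as literal characters, so {\1} is never treated as a repeat count. The snippet below is only a sketch of that behaviour. The test strings are made up for illustration, and [A-Za-z] stands in for [A-z], since the latter also covers the punctuation characters that sit between 'Z' and 'a'.

```python
import re

# Attempted pattern: use the captured digit as the repeat count.
attempt = re.compile(r'\+([0-9])[A-Za-z]{\1}.')

# No match on an indel-style string: '{', the back-reference and '}'
# are matched as literal text rather than as a quantifier.
print(attempt.search('+2AG.'))

# The pattern only matches when the text itself contains the braces.
print(attempt.search('+2A{2}.').group(0))
```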

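import re

foo = 'ADCKAL+2AG.+2AG.+2AG.+2AGGG^+.+G+3AGGa4.'
indelpatt = re.compile(r'\+([0-9])')
match = indelpatt.search(foo)
while match:
    indelsize = int(match.group(1))
    # Build a pattern for this indel size: '+n', n bases, then one quality character.
    newpatt = re.compile(r'\+%d[ACGTNacgtn]{%d}.' % (indelsize, indelsize))
    foo, count = newpatt.subn('+', foo)
    # If nothing was replaced, resume after this marker so the loop always terminates.
    match = indelpatt.search(foo, match.end() if count == 0 else 0)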

I'm probably missing an easier way to parse the string.
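One possible single-pass alternative, sketched under the assumption that indel sizes are single digits from 1 to 9 and that the bases are limited to the alphabet used in the loop above: build one alternative per size and let a single substitution handle them all. The names BASES, INDEL and strip_indels are illustrative only.

```python
import re

BASES = 'ACGTNacgtn'  # assumed alphabet, taken from the loop above

# One alternative per size: '+n', exactly n bases, then one quality character.
INDEL = re.compile(
    '|'.join(r'\+%d[%s]{%d}.' % (n, BASES, n) for n in range(1, 10))
)

def strip_indels(s):
    """Replace every '+n<n bases><quality char>' run with a bare '+'."""
    return INDEL.sub('+', s)

foo = 'ADCKAL+2AG.+2AG.+2AG.+2AGGG^+.+G+3AGGa4.'
print(strip_indels(foo))  # prints ADCKAL++++G^+.+G+4.
```

Because the substitution runs in a single pass, the '+' it writes is not rescanned, so a digit that happens to follow it (such as the trailing '4') is left alone.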
