Prevent RegEx Hang on Large Matches...

Posted by developerjay on Stack Overflow
Published on 2009-12-18T19:11:03Z Indexed on 2010/06/17 13:13 UTC

I have a regular expression that works well for dates, but it hangs indefinitely on one page I tried: http://pleac.sourceforge.net/pleac%5Fpython/datesandtimes.html. I picked that page because it contains lots of dates and I want to grab all of them. I don't understand why it hangs there when it doesn't on other pages. Why is my regexp hanging, and how could I clean it up to make it more efficient?

Python Code:

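import re

# Raw string so the \w escapes reach the regex engine unchanged
monthnames = r"(?:Jan\w*|Feb\w*|Mar\w*|Apr\w*|May|Jun\w?|Jul\w?|Aug\w*|Sep\w*|Oct\w*|Nov(?:ember)?|Dec\w*)"

# Numeric dates such as 2009-12-18 or 12/18/2009
pattern1 = re.compile(r"(\d{1,4}[\/\\\-]+\d{1,2}[\/\\\-]+\d{2,4})")

# Dates written with a month name, optional time, AM/PM and time zone
pattern4 = re.compile(r"(?:[\d]*[\,\.\ \-]+)*%s(?:[\,\.\ \-]+[\d]+[stndrh]*)+[:\d]*[\ ]?(PM)?(AM)?([\ \-\+\d]{4,7}|[UTCESTGMT\ ]{2,4})*"%monthnames, re.I)

patterns = [pattern4, pattern1]

# s holds the page source as a string
for pattern in patterns:
    print(re.findall(pattern, s))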

By the way, when I say I'm trying it against this site, I mean I'm running it against the webpage source.
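
A likely cause, though not profiled against that page, is the leading group of pattern4: (?:[\d]*[\,\.\ \-]+)* nests one repeat inside another, so a long run of spaces, dashes or digits that is not followed by a month name can be split up in exponentially many ways before the match fails. The sketch below is an assumption-laden illustration, not a drop-in fix: SLOW_PREFIX, SAFE_PREFIX, TAIL, find_dates and seconds_to_fail are hypothetical names, and TAIL drops the time, AM/PM and time zone parts of pattern4 for brevity.

```python
import re
import time

MONTHS = (r"(?:Jan\w*|Feb\w*|Mar\w*|Apr\w*|May|Jun\w?|Jul\w?|Aug\w*|"
          r"Sep\w*|Oct\w*|Nov(?:ember)?|Dec\w*)")

# Prefix from pattern4: a repeated group whose body also repeats, so the
# same run of separators can be divided between iterations in many ways.
SLOW_PREFIX = r"(?:\d*[,. \-]+)*"

# Hypothetical replacement accepting the same strings (empty, or digits and
# separators ending in a separator) with no repeat nested inside a repeat.
SAFE_PREFIX = r"(?:[\d,. \-]*[,. \-])?"

# Simplified remainder of pattern4: month name plus day/year parts only.
TAIL = MONTHS + r"(?:[,. \-]+\d+[stndrh]*)+"

slow = re.compile(SLOW_PREFIX + TAIL, re.I)
safe = re.compile(SAFE_PREFIX + TAIL, re.I)


def find_dates(pattern, text):
    """Return the full text of every match, trimmed of outer whitespace."""
    return [m.group(0).strip() for m in pattern.finditer(text)]


def seconds_to_fail(pattern, run_length):
    """Time a search over a run of spaces that no month name follows."""
    text = " " * run_length + "x"
    start = time.time()
    pattern.search(text)
    return time.time() - start


sample = "Posted 18 December 2009; revised Jan 3rd, 2010."
print(find_dates(safe, sample))
```

On short inputs both prefixes should return the same matches. Calling seconds_to_fail(slow, n) with slowly growing n (for example 16, 18, 20) should show the time climbing steeply, while seconds_to_fail(safe, n) should stay small; keep n modest for the slow pattern, since a few dozen characters may already take very long.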
