Prevent RegEx Hang on Large Matches...
- by developerjay
This is a great regular expression for dates... However it hangs indefinitely on this one page I tried... I wanted to try this page ( http://pleac.sourceforge.net/pleac%5Fpython/datesandtimes.html ) for the fact that it does have lots of dates on it and I want to grab all of them. I don't understand why it is hanging when it doesn't on other pages... Why is my regexp hanging and/or how could I clean it up to make it better/efficient ?
Python Code:
monthnames = "(?:Jan\w*|Feb\w*|Mar\w*|Apr\w*|May|Jun\w?|Jul\w?|Aug\w*|Sep\w*|Oct\w*|Nov(?:ember)?|Dec\w*)"
pattern1 = re.compile(r"(\d{1,4}[\/\\\-]+\d{1,2}[\/\\\-]+\d{2,4})")
pattern4 = re.compile(r"(?:[\d]*[\,\.\ \-]+)*%s(?:[\,\.\ \-]+[\d]+[stndrh]*)+[:\d]*[\ ]?(PM)?(AM)?([\ \-\+\d]{4,7}|[UTCESTGMT\ ]{2,4})*"%monthnames, re.I)
patterns = [pattern4, pattern1]
for pattern in patterns:
print re.findall(pattern, s)
btw... when i say im trying it against this site.. I'm trying it against the webpage source.