How to parse multiple dates from a block of text in Python (or another language)
- by mlissner
I have a string that contains several dates, and I want to parse all of them out. The string is natural language, so the best tool I've found so far is dateutil.
Unfortunately, if a string contains more than one date, dateutil's parse raises an error, even with fuzzy=True:
>>> from dateutil.parser import parse
>>> s = "I like peas on 2011-04-23, and I also like them on easter and my birthday, the 29th of July, 1928"
>>> parse(s, fuzzy=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/pymodules/python2.7/dateutil/parser.py", line 697, in parse
    return DEFAULTPARSER.parse(timestr, **kwargs)
  File "/usr/lib/pymodules/python2.7/dateutil/parser.py", line 303, in parse
    raise ValueError, "unknown string format"
ValueError: unknown string format
Any thoughts on how to parse all dates from a long string? Ideally, a list would be created, but I can handle that myself if I need to.
I'm using Python, but at this point, other languages are probably OK, if they get the job done.
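A minimal sketch of one common alternative, assuming the dates you care about follow a few known formats: match each format with a regular expression, then hand each match to datetime.strptime. This is not a dateutil feature; the pattern names and find_dates are hypothetical, and phrases like "easter" or "my birthday" are not recognised at all. It assumes Python 3 and an English (or C) locale, since %B matches month names for the current locale.

```python
import re
from datetime import date, datetime

# Hypothetical patterns: one regex per date format we expect to see.
# ISO style, e.g. "2011-04-23"
ISO_DATE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")
# Long style, e.g. "29th of July, 1928" or "29 July 1928"
LONG_DATE = re.compile(
    r"\b(\d{1,2})(?:st|nd|rd|th)?\s+(?:of\s+)?([A-Za-z]+),?\s+(\d{4})\b"
)


def find_dates(text):
    """Return every recognised date in the order it appears in text."""
    found = []  # (start position, date) pairs
    for m in ISO_DATE.finditer(text):
        year, month, day = (int(g) for g in m.groups())
        try:
            found.append((m.start(), date(year, month, day)))
        except ValueError:
            pass  # e.g. "2011-13-45" fits the regex but is not a real date
    for m in LONG_DATE.finditer(text):
        day, month_name, year = m.groups()
        try:
            parsed = datetime.strptime(f"{day} {month_name} {year}", "%d %B %Y")
        except ValueError:
            continue  # word is not a month name, or the day is out of range
        found.append((m.start(), parsed.date()))
    found.sort()  # merge both passes back into text order
    return [d for _, d in found]


s = "I like peas on 2011-04-23, and I also like them on easter and my birthday, the 29th of July, 1928"
print(find_dates(s))
# → [datetime.date(2011, 4, 23), datetime.date(1928, 7, 29)]
```

The trade-off is explicit: every format needs its own pattern, in exchange for getting all matches back as a list instead of one parse that fails on the whole string.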
PS - I guess I could recursively split the input file in the middle and try, try again until it works, but it's a hell of a hack.
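The split-in-the-middle idea from the PS can be sketched as a recursive bisection over words: try the whole span, and if parsing fails, split it in half and recurse on each half. This is a hypothetical sketch; bisect_dates and the stand-in parser parse_exactly_one_iso are illustrative names, and the stand-in only understands ISO dates so the example runs with the standard library alone. With dateutil you would pass a small wrapper around parse(..., fuzzy=True) instead, assuming it raises ValueError for fragments with zero dates as well as several; whether it does so for date-free text depends on the dateutil version, so check that first.

```python
import re
from datetime import date
from typing import Callable, List, TypeVar

T = TypeVar("T")


def parse_exactly_one_iso(text: str) -> date:
    """Hypothetical stand-in for a single-date parser.

    Raises ValueError unless text holds exactly one YYYY-MM-DD date,
    mimicking a parser that rejects strings with zero or several dates.
    """
    matches = re.findall(r"\d{4}-\d{2}-\d{2}", text)
    if len(matches) != 1:
        raise ValueError(f"expected one date, found {len(matches)}")
    year, month, day = map(int, matches[0].split("-"))
    return date(year, month, day)


def bisect_dates(words: List[str], parse_one: Callable[[str], T]) -> List[T]:
    """Try the whole span; on failure split it in half and recurse."""
    if not words:
        return []
    try:
        return [parse_one(" ".join(words))]
    except ValueError:
        if len(words) == 1:
            return []  # a single word that still fails gives no usable date
        mid = len(words) // 2
        return bisect_dates(words[:mid], parse_one) + bisect_dates(words[mid:], parse_one)


text = "I like peas on 2011-04-23, and again on 2012-05-01 too"
print(bisect_dates(text.split(), parse_exactly_one_iso))
# → [datetime.date(2011, 4, 23), datetime.date(2012, 5, 1)]
```

The weakness that makes it a hack is structural: a split can land inside a multi-word date such as "the 29th of July, 1928", so that date may be lost or parsed from only half of its words.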