How to parse multiple dates from a block of text in Python (or another language)

Posted by mlissner on Stack Overflow
Published on 2011-08-11T15:33:48Z Indexed on 2012/11/09 11:02 UTC


I have a string that has several date values in it, and I want to parse them all out. The string is natural language, so the best thing I've found so far is dateutil.

Unfortunately, if a string has more than one date value in it, dateutil's parse() raises a ValueError, even with fuzzy=True:

>>> from dateutil.parser import parse
>>> s = "I like peas on 2011-04-23, and I also like them on easter and my birthday, the 29th of July, 1928"
>>> parse(s, fuzzy=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/pymodules/python2.7/dateutil/parser.py", line 697, in parse
    return DEFAULTPARSER.parse(timestr, **kwargs)
  File "/usr/lib/pymodules/python2.7/dateutil/parser.py", line 303, in parse
    raise ValueError, "unknown string format"
ValueError: unknown string format

Any thoughts on how to parse all dates from a long string? Ideally, a list would be created, but I can handle that myself if I need to.

I'm using Python, but at this point, other languages are probably OK, if they get the job done.

PS - I guess I could recursively split the input string in the middle and try, try again until each piece parses, but it's a hell of a hack.
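
For illustration, here is a minimal sketch of that split-and-retry hack. It is not a recommended solution. The names find_dates and parse_single are hypothetical, and parse_single is only a stdlib stand-in that accepts a chunk containing exactly one ISO-style date and raises ValueError otherwise. That mimics the failure in the traceback above, and a real single-date parser could presumably be passed in as parse_one instead.

```python
import re
from datetime import datetime

# Assumption for this sketch: only ISO-style dates (YYYY-MM-DD) are recognised.
ISO_DATE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


def parse_single(chunk):
    """Hypothetical stand-in for a fuzzy single-date parser.

    Returns a datetime when the chunk holds exactly one ISO date and
    raises ValueError otherwise (zero dates, or more than one).
    """
    found = ISO_DATE.findall(chunk)
    if len(found) != 1:
        raise ValueError("unknown string format")
    return datetime.strptime(found[0], "%Y-%m-%d")


def find_dates(text, parse_one=parse_single):
    """Halve the text on word boundaries until each piece parses or is one word."""
    words = text.split()
    if not words:
        return []
    try:
        return [parse_one(" ".join(words))]
    except ValueError:
        if len(words) == 1:
            return []  # a single word that is not a date
        mid = len(words) // 2
        left = " ".join(words[:mid])
        right = " ".join(words[mid:])
        return find_dates(left, parse_one) + find_dates(right, parse_one)


s = "I like peas on 2011-04-23, and I also like them on 2011-04-24"
dates = find_dates(s)
```

The weakness is easy to see: a date written as several words, like "the 29th of July, 1928", can straddle a split point and get cut in half, so neither piece parses.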

© Stack Overflow or respective owner
