A "smart" (forgiving) date parser?

Posted by jdmuys on Stack Overflow
Published on 2009-07-09T10:27:03Z Indexed on 2010/03/21 11:41 UTC

I have to migrate a very large dataset from one system to another. One of the "source" columns contains a date, but it is really an unconstrained string, while the destination system mandates a date in the format yyyy-mm-dd.

Many, but not all, of the source dates are formatted as yyyymmdd. So to coerce them to the expected format, I do (in Perl):

return "$1-$2-$3" if ($val =~ /(\d{4})[-\/]*(\d{2})[-\/]*(\d{2})/);
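
As written, the pattern above is unanchored and does not check that the captured month and day are real, so a value such as 19981399 would pass through as 1998-13-99. A minimal sketch of the same rule with both checks added, written in Python since any language is acceptable here; the helper name coerce_yyyymmdd is hypothetical, and allowing at most one separator between fields is an assumption (the Perl pattern allows any number):

```python
import re
from datetime import date

# Same idea as the Perl rule: four digits, two digits, two digits,
# with an optional "-" or "/" between the fields.
# Assumption: at most one separator, and nothing else in the string
# apart from surrounding whitespace.
_YMD = re.compile(r"\s*(\d{4})[-/]?(\d{2})[-/]?(\d{2})\s*")


def coerce_yyyymmdd(val):
    """Return val as 'yyyy-mm-dd', or None if it is not a valid date."""
    m = _YMD.fullmatch(val)
    if not m:
        return None
    year, month, day = map(int, m.groups())
    try:
        # date() rejects impossible values such as month 13 or day 99.
        return date(year, month, day).isoformat()
    except ValueError:
        return None
```

Returning None instead of a string lets the caller fall through to the next, more forgiving rule.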

The problem arises when the source dates move away from the "generic" yyyymmdd. The goal is to salvage as many dates as possible before giving up. Example source strings include:

21/3/1998, March 2004, 2001, 3/4/97

I can try to match as many of the examples as I can find with a succession of regular expressions such as the one above.
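
For illustration, the "succession of patterns" approach might look like the sketch below, using Python's standard datetime.strptime in place of hand-written regular expressions. The function name salvage_date and the list of formats are hypothetical. Two assumptions matter: slash-separated dates are read day-first (so 3/4/97 is 3 April, not 4 March), and a missing day or month is filled in as 1, which may or may not be acceptable for the destination system. Month names are matched in the current locale (English by default).

```python
from datetime import datetime

# Candidate formats, tried in order; the first one that parses wins.
# Assumption: day-first for slash-separated dates.
_FORMATS = [
    "%Y%m%d",    # 19980321
    "%Y-%m-%d",  # 1998-03-21
    "%Y/%m/%d",  # 1998/03/21
    "%d/%m/%Y",  # 21/3/1998
    "%d/%m/%y",  # 3/4/97 (two-digit year, pivot chosen by strptime)
    "%B %Y",     # March 2004
    "%b %Y",     # Mar 2004
    "%Y",        # 2001
]


def salvage_date(val):
    """Return val as 'yyyy-mm-dd', or None if no known format matches."""
    text = val.strip()
    for fmt in _FORMATS:
        try:
            # strptime fills missing month/day with 1 and raises
            # ValueError on a mismatch or on leftover characters.
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return None


for sample in ["21/3/1998", "March 2004", "2001", "3/4/97"]:
    print(sample, "->", salvage_date(sample))
```

The order of the list is the whole policy: putting a month-first format ahead of the day-first one would silently change how ambiguous strings are read, so rows that only parse under a guessed format are probably worth logging for review.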

But is there something smarter to do? Am I reinventing the wheel? Is there a library somewhere that does something similar? I couldn't find anything relevant by googling "forgiving date parser". (Any language is OK.)

© Stack Overflow or respective owner
