A "smart" (forgiving) date parser?
- by jdmuys
I have to migrate a very large dataset from one system to another. One of the "source" columns contains a date, but it is really an unconstrained string, while the destination system mandates a date in the format yyyy-mm-dd.
Many, but not all, of the source dates are formatted as yyyymmdd. So to coerce them to the expected format, I do (in Perl):
return "$1-$2-$3" if ($val =~ /(\d{4})[-\/]*(\d{2})[-\/]*(\d{2})/);
The problem arises when the source dates move away from the "generic" yyyymmdd. The goal is to salvage as many dates as possible before giving up. Example source strings include:
21/3/1998,
March 2004,
2001,
3/4/97
I can try to match as many of the examples as I can find with a succession of regular expressions such as the one above.
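To make that concrete, here is a minimal sketch of what such a succession of regular expressions might look like. Everything in it beyond the first pattern is an assumption for illustration: the names salvage_date, full_year, @RULES and %MONTH are hypothetical, slash-separated dates are read as day/month/year, a missing month or day defaults to 1, and two-digit years below 50 are taken as 20xx. Different data would call for different choices.

```perl
use strict;
use warnings;

# Hypothetical lookup for month names, keyed by their first three letters.
my %MONTH = (jan => 1, feb => 2, mar => 3, apr => 4,  may => 5,  jun => 6,
             jul => 7, aug => 8, sep => 9, oct => 10, nov => 11, dec => 12);

# Assumption: two-digit years below 50 mean 20xx, all others 19xx.
sub full_year {
    my ($y) = @_;
    return $y if $y > 99;
    return $y < 50 ? 2000 + $y : 1900 + $y;
}

# Each rule pairs a pattern with a callback that turns the captures into
# (year, month, day). Rules are tried in order; the first valid one wins.
my @RULES = (
    # yyyymmdd, yyyy-mm-dd, yyyy/mm/dd
    [ qr/^(\d{4})[-\/]?(\d{2})[-\/]?(\d{2})$/,
      sub { ($_[0], $_[1], $_[2]) } ],
    # d/m/yyyy or d/m/yy (assumed day first)
    [ qr/^(\d{1,2})\/(\d{1,2})\/(\d{4}|\d{2})$/,
      sub { (full_year($_[2]), $_[1], $_[0]) } ],
    # "March 2004": month name plus year, day defaults to 1
    [ qr/^([A-Za-z]{3})[A-Za-z]*\.?\s+(\d{4})$/,
      sub { my $m = $MONTH{lc $_[0]}; $m ? ($_[1], $m, 1) : () } ],
    # bare year: month and day default to 1
    [ qr/^(\d{4})$/,
      sub { ($_[0], 1, 1) } ],
);

# Returns "yyyy-mm-dd", or undef when no rule yields a plausible date.
sub salvage_date {
    my ($val) = @_;
    return undef unless defined $val;
    $val =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
    for my $rule (@RULES) {
        my ($re, $build) = @$rule;
        my @caps = $val =~ $re or next;
        my ($y, $m, $d) = $build->(@caps) or next;
        # Range check only; this does not reject e.g. February 30.
        next unless $m >= 1 && $m <= 12 && $d >= 1 && $d <= 31;
        return sprintf "%04d-%02d-%02d", $y, $m, $d;
    }
    return undef;
}
```

With these assumptions, salvage_date("21/3/1998") gives 1998-03-21, "March 2004" gives 2004-03-01, "2001" gives 2001-01-01, and "3/4/97" gives 1997-04-03. The last one shows the weakness of the approach: nothing in the string says whether it is 3 April or 4 March, so the rule order silently decides.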
But is there something smarter to do? Am I not reinventing the wheel? Is there a library somewhere doing something similar? I couldn't find anything relevant by googling "forgiving date parser". (Any language is OK.)