General Address Parser for Freeform Text
- by Daemonic
We have a program that displays map data (think Google Maps, but with much more interactivity and custom layers for our clients).
We allow navigation via a set of combo boxes that prefill certain fields with a bunch of data (ie: Country: Canada, the Province field is filled in. Select Ontario, and a list of Counties/Regions is filled in. Select a county/region, and a city is filled in, etc...).
While this guarantees accurate addresses, it's a pain for the users if they don't know where a street address or a city are located (ie, which county/region is kitchener in?).
So we are looking at trying to do an address parser with a freeform text field.
The user could enter something like this (similar to Google Maps, Bing Maps, etc...):
22 Main St, Kitchener, On
And we could compartmentalize it into sections and do lookups on the data and get to the point they are looking for (or suggest alternatives).
The problem with this is that how do we properly compartmentalize information? How do we break up the sections and find possible matches? I'm guessing we wouldn't be guaranteed that the user would enter data in a format we always expected (obviously). A follow up to this would be how to present the data if we don't find an exact match (or find multiple exact matches... two cities with the same street name in different counties, for example).
We have a ton of data available in the mapping data (mapinfo tab format mostly). So we can do quick scans of street names, cities, states, etc. But I'm not sure about the best way to go about approaching this problem. Sure, using Google Maps would be nice, bue most of our clients are in closed in networks where outside access is not usually allowed and most aren't willing to rely on google maps (since it doesn't contain as much information as they need, such as custom map layers). They could, obviously, go to google and get the proper location then move to our software, but this would time consuming and speed of the process can be quite important.