Data cleanup: are there libraries of common permutations that we can use? Or is there a better approach?

Posted by anyaelena on Stack Overflow, 2010-03-17 04:54 UTC

We are cleaning up and analyzing a large amount of human-entered customer data. We need to decide programmatically whether two addresses (for example) are the same, even when they were entered with slight variations.

Right now we run each address through fairly simplistic string replacement (for example, replacing "avenue" with "ave"), concatenate the fields, and compare the results. We do something similar with names.
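A minimal sketch of the normalize, concatenate, and compare approach described above, written in Python only for illustration (the question names no language). The `ABBREVIATIONS` table is a small hypothetical sample, not an existing standard list, and `normalize_field`, `address_key`, and `same_address` are illustrative names.

```python
import re
from typing import Sequence

# Hypothetical, hand-written subset of a search-replace table.
# A real table would be much larger and come from a maintained source.
ABBREVIATIONS = {
    "avenue": "ave",
    "street": "st",
    "road": "rd",
    "boulevard": "blvd",
    "north": "n",
    "south": "s",
    "east": "e",
    "west": "w",
    "apartment": "apt",
}


def normalize_field(text: str) -> str:
    """Lower-case, turn punctuation into spaces, and abbreviate whole words."""
    cleaned = re.sub(r"[^\w\s]", " ", text.lower())
    tokens = cleaned.split()  # also collapses repeated whitespace
    return " ".join(ABBREVIATIONS.get(tok, tok) for tok in tokens)


def address_key(fields: Sequence[str]) -> str:
    """Normalize each non-empty field, then concatenate into one comparison key."""
    return " ".join(normalize_field(f) for f in fields if f)


def same_address(a: Sequence[str], b: Sequence[str]) -> bool:
    """Exact match on the normalized, concatenated keys."""
    return address_key(a) == address_key(b)


print(same_address(["123 North Main Street", "Apt. 4"], ["123 N. Main St", "apt 4"]))
```

Replacing whole tokens rather than calling `str.replace` on the full string avoids mangling words that merely contain a key: after lower-casing, `"northampton".replace("north", "n")` yields `"nampton"`, while the token lookup leaves it untouched.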

At the very least, it seems our list of search-and-replace values should already exist somewhere.

Or perhaps you can suggest a totally different and superior way to detect matches?
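One commonly used alternative, named here as a swapped-in technique rather than anything from the question, is fuzzy matching: score how close two strings are and accept pairs above a threshold instead of requiring an exact match. The sketch below assumes Levenshtein edit distance, scaled by the longer string's length; the `0.9` threshold and all function names are hypothetical choices for illustration.

```python
import re


def simple_key(text: str) -> str:
    """Lower-case, drop punctuation, and collapse whitespace (a minimal normalizer)."""
    return " ".join(re.sub(r"[^\w\s]", " ", text.lower()).split())


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    if len(a) < len(b):
        a, b = b, a  # iterate over the longer string; keep each row short
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """1.0 for identical keys, falling toward 0.0 as the edit distance grows."""
    ka, kb = simple_key(a), simple_key(b)
    longest = max(len(ka), len(kb))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(ka, kb) / longest


def probably_same(a: str, b: str, threshold: float = 0.9) -> bool:
    """Hypothetical decision rule: similar enough counts as a match."""
    return similarity(a, b) >= threshold


print(probably_same("123 Main Street", "123 Main Stret"))
```

This catches typos that no replacement table anticipates, but it treats every character alike: "123 Main St" and "124 Main St" also score above the threshold. In practice it would likely be combined with the normalization step and an exact match on numeric tokens such as house numbers.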

© Stack Overflow or respective owner