Data clean up: are there libraries of common permutations that we can use? Or is there a better appr
Posted by anyaelena on Stack Overflow
Published on 2010-03-17T04:54:43Z
Indexed on 2010/03/17 5:01 UTC
We are cleaning up and analyzing a large volume of human-entered customer data. We need to decide programmatically whether two addresses (for example) are the same, even though they were entered with slight variations.
Right now we run each address through fairly simplistic string replacement (replacing "avenue" with "ave", for example), concatenate the fields, and compare the results. We do something similar with names.
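Roughly, the approach described above looks like the following sketch. Python is used only for illustration; the replacement table, the field names, and the function names are all made up for this example and are not our actual code.

```python
import re

# Hypothetical abbreviation table; a real list would be much longer.
REPLACEMENTS = {
    "avenue": "ave",
    "street": "st",
    "road": "rd",
    "suite": "ste",
}

def normalize(value: str) -> str:
    """Lowercase, replace punctuation with spaces, and abbreviate whole words."""
    words = re.sub(r"[^a-z0-9\s]", " ", value.lower()).split()
    return " ".join(REPLACEMENTS.get(word, word) for word in words)

def address_key(street: str, city: str, state: str, postal_code: str) -> str:
    """Concatenate the normalized fields into a single comparison key."""
    fields = (street, city, state, postal_code)
    return "|".join(normalize(field) for field in fields)

def same_address(a: tuple, b: tuple) -> bool:
    """Treat two addresses as equal when their keys match exactly."""
    return address_key(*a) == address_key(*b)
```

With this, ("123 Main Avenue", "Springfield", "IL", "62701") and ("123 main ave.", "springfield", "il", "62701") compare as equal, but any variation not covered by the table (a typo, a missing suite number) still produces a mismatch.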
At the very least, it seems like our list of search-replace values should already exist somewhere.
Or perhaps you can suggest a totally different and superior way to detect matches?
© Stack Overflow or respective owner