Data clean up: are there libraries of common permutations that we can use? Or is there a better appr
- by anyaelena
We are working on the clean-up and analysis of a large amount of human-entered customer data. We need to decide programmatically whether two addresses (for example) are the same, even though they were entered with slight variations.
Right now we run each address through fairly simplistic string replacement (replacing "avenue" with "ave", for example), concatenate the fields, and compare the results. We are doing something similar with names.
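To make the current approach concrete, here is a minimal sketch of what it roughly looks like. The replacement table, the function names (`normalize_field`, `address_key`, `same_address`), and the punctuation and lowercasing steps are illustrative assumptions, not our actual code or our full list.

```python
import re

# Hypothetical replacement table for illustration only; the real
# search-replace list is longer and is not reproduced here.
ABBREVIATIONS = {
    "avenue": "ave",
    "street": "st",
    "road": "rd",
}


def normalize_field(value: str) -> str:
    """Lowercase, drop punctuation, and apply whole-word replacements."""
    cleaned = re.sub(r"[^\w\s]", "", value.lower())
    words = [ABBREVIATIONS.get(word, word) for word in cleaned.split()]
    return " ".join(words)


def address_key(*fields: str) -> str:
    """Concatenate the normalized, non-empty fields into one comparison key."""
    return " ".join(normalize_field(f) for f in fields if f)


def same_address(a: tuple, b: tuple) -> bool:
    """Treat two addresses as equal when their normalized keys match exactly."""
    return address_key(*a) == address_key(*b)
```

For example, `same_address(("123 Main Avenue", "Springfield"), ("123 main ave.", "springfield"))` is true under this scheme, but any variation that is not in the table (a typo, a missing unit number, a reordered field) makes the exact comparison fail.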
At the very least, it seems like our list of search-replace values should already exist somewhere.
Or perhaps you can suggest a totally different and superior way to detect matches?