algorithm for checking addresses for matches?

Posted by user151841 on Stack Overflow
Published on 2010-05-20T16:23:41Z

I'm working on a survey program where people will be given promotional considerations the first time they fill out a survey. In many cases, the only way we can stop people from cheating the system and getting a promotion they don't deserve is to compare street address strings against each other.

I was looking at using Levenshtein distance to measure similarity, treating any pair of addresses whose distance falls below a certain threshold as duplicates.
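
A minimal sketch of the threshold idea, written in Python purely for illustration (PHP ships a built-in `levenshtein()` function, so the distance routine would not need to be hand-written there). The names `is_probable_duplicate` and `DUPLICATE_THRESHOLD`, and the threshold value itself, are assumptions rather than anything prescribed; a real cutoff would have to be tuned against actual address data.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


# Hypothetical cutoff: distances below this count as duplicates.
DUPLICATE_THRESHOLD = 3


def is_probable_duplicate(addr1: str, addr2: str,
                          threshold: int = DUPLICATE_THRESHOLD) -> bool:
    """Case-insensitive comparison of two raw address strings."""
    return levenshtein(addr1.lower(), addr2.lower()) < threshold
```

With these assumptions, `is_probable_duplicate("123 Main St", "123 Main St.")` is true, since the strings differ by a single character.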

However, someone looking to game the system could easily write "S 5th St" instead of "South Fifth Street", and Levenshtein would consider those strings to be very different. So then I thought about first converting all strings to a 'standard address form', e.g. 'South' becomes 's', 'Fifth' becomes '5th', etc.
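
The 'standard address form' idea could be sketched roughly as follows, again in Python for illustration only. The `ABBREVIATIONS` table and the `normalize_address` helper are hypothetical, and the table is deliberately tiny; a robust version would need far more entries and would still miss misspellings, unit numbers, and the like.

```python
import re

# Hypothetical, intentionally incomplete mapping of long forms to short forms.
ABBREVIATIONS = {
    "north": "n", "south": "s", "east": "e", "west": "w",
    "street": "st", "avenue": "ave", "road": "rd",
    "first": "1st", "second": "2nd", "third": "3rd",
    "fourth": "4th", "fifth": "5th",
}


def normalize_address(address: str) -> str:
    """Lowercase, drop punctuation, and abbreviate known words token by token."""
    cleaned = re.sub(r"[^a-z0-9\s]", " ", address.lower())
    tokens = cleaned.split()
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)
```

Under these assumptions, "South Fifth Street" and "S. 5th St" both normalize to the same string, so they could be compared for equality or passed to an edit-distance check afterwards.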

Then I started thinking that this is hopeless and would take too much effort to get working robustly. Is it?

I'm working with PHP/MySQL, so I have the limitations inherent in that stack.

© Stack Overflow or respective owner