Verify uniqueness of new content
- by rogerkk
I'm working on a review site, where there is a minor issue with almost duplicate reviews across items. Just a few words are changed. It would be very nice to be able to uncover these duplicates before they are approved by a moderator, and I'm hoping someone could chime in on the best strategy to get there.
The site is running Ruby on Rails on a Postgres database and using Thinking Sphinx for search (all on Heroku), and so far the best option I see is to be pulling all the reviews out of the db and using a module like amatch to compare the strings. Not very efficient, so in this case I guess I'll have to limit the number/age of reviews to scan for dupes.
Anyone got a better idea?