Verify uniqueness of new content
Posted by rogerkk on Programmers
Published on 2012-11-15T09:37:50Z
Indexed on 2012/11/15 11:22 UTC
I'm working on a review site that has a minor issue with near-duplicate reviews across items: the same review gets posted again with just a few words changed. I'd like to catch these duplicates before a moderator approves them, and I'm hoping someone can chime in on the best strategy to get there.
The site runs Ruby on Rails on a Postgres database and uses Thinking Sphinx for search (all on Heroku). So far, the best option I see is to pull all the reviews out of the db and compare the strings with a module like amatch. That means comparing each new review against every stored one, which is not very efficient, so I guess I'll have to limit the number or age of the reviews scanned for dupes.
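To make the brute-force idea concrete, here is a rough sketch of the comparison step. It is only an illustration under some assumptions: it uses plain Ruby instead of amatch, it swaps in Jaccard similarity over three-word shingles as the string measure, and the helper names (shingles, similarity, near_duplicates) and the 0.5 threshold are made up for this example and would need tuning on real reviews.

```ruby
require "set"

# Hypothetical helper: split text into overlapping word n-grams (shingles).
# Texts shorter than `size` words become a single shingle.
def shingles(text, size = 3)
  words = text.downcase.scan(/[a-z0-9']+/)
  return Set.new if words.empty?
  return Set.new([words.join(" ")]) if words.length <= size
  Set.new(words.each_cons(size).map { |gram| gram.join(" ") })
end

# Jaccard similarity: shared shingles divided by all distinct shingles.
# Returns a Float between 0.0 (nothing in common) and 1.0 (identical sets).
def similarity(a, b)
  sa = shingles(a)
  sb = shingles(b)
  union = (sa | sb).size
  union.zero? ? 0.0 : (sa & sb).size.to_f / union
end

# Return the candidate texts that look like near duplicates of new_review.
# The 0.5 threshold is an assumption, not a recommended value.
def near_duplicates(new_review, candidates, threshold = 0.5)
  candidates.select { |text| similarity(new_review, text) >= threshold }
end
```

In the Rails app, the candidates array would presumably come from a scoped query over recent reviews (for example, only the last few weeks), which is where the limit on number or age would apply. The model and column names for that query depend on the app, so they are left out here.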
Does anyone have a better idea?
© Programmers or respective owner