What algorithms can I use to detect if articles or posts are duplicates?
- by michael
I'm trying to detect whether an article or forum post is a duplicate entry within the database. I've given this some thought and concluded that someone who duplicates content will do so in one of three ways (in descending order of difficulty to detect):
simply copy and paste the whole text
copy and paste parts of the text, merging them with their own
copy an…