Methods for content uniqueness calculation in C#

Posted by sashaeve on Stack Overflow
Published on 2010-03-29T19:42:15Z Indexed on 2010/03/29 19:43 UTC

I have a set of text files. I want to calculate content uniqueness for different subsets of them.

For example, given 10 documents (A1 to A10), I want to calculate the uniqueness of the subset consisting of documents A1 and A2. The result should be a value from 0 to 1, where 1 means completely unique content and 0 means completely duplicated content.
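
To make the expected 0 to 1 scale concrete, here is a minimal sketch of one possible score. It is only an illustration under stated assumptions, not a recommended or definitive method. It assumes documents are compared by overlapping word sequences (shingles) and that uniqueness is the fraction of distinct shingles that occur in exactly one document of the subset. For two documents this equals 1 minus their Jaccard similarity. The names ContentUniqueness, Shingles and Score, the separator list and the default shingle size are all hypothetical choices made for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ContentUniqueness
{
    // Assumption: words are separated by whitespace and basic punctuation.
    private static readonly char[] Separators =
        { ' ', '\t', '\r', '\n', '.', ',', ';', ':', '!', '?' };

    // Returns the set of distinct k-word shingles in the text (case-insensitive).
    public static HashSet<string> Shingles(string text, int k)
    {
        string[] words = text.ToLowerInvariant()
            .Split(Separators, StringSplitOptions.RemoveEmptyEntries);

        var result = new HashSet<string>();
        for (int i = 0; i + k <= words.Length; i++)
        {
            result.Add(string.Join(" ", words, i, k));
        }
        return result;
    }

    // Fraction of distinct shingles that appear in exactly one document.
    // 1.0 = no shingle is shared, 0.0 = every shingle is shared.
    public static double Score(IEnumerable<string> documents, int k = 2)
    {
        // Count in how many documents each distinct shingle appears.
        var counts = new Dictionary<string, int>();
        foreach (string doc in documents)
        {
            foreach (string shingle in Shingles(doc, k))
            {
                int n;
                counts.TryGetValue(shingle, out n); // n is 0 if not found
                counts[shingle] = n + 1;
            }
        }

        // Assumption: with no shingles at all there is nothing duplicated,
        // so the score is arbitrarily defined as 1.0.
        if (counts.Count == 0)
        {
            return 1.0;
        }

        int unique = counts.Values.Count(c => c == 1);
        return (double)unique / counts.Count;
    }
}
```

With this sketch, passing the texts of A1 and A2 to ContentUniqueness.Score gives 0 when the two texts are identical and 1 when they share no two-word sequence. Larger shingle sizes make the score less sensitive to short common phrases.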

What methods exist for calculating content uniqueness? Please suggest methods that have .NET implementations.

Thanks.

© Stack Overflow or respective owner
