Methods for content uniqueness calculation in C#
Posted by sashaeve on Stack Overflow
Published on 2010-03-29T19:42:15Z
Indexed on 2010/03/29 19:43 UTC
.NET | content-analysis
I have a set of text files and want to calculate content uniqueness for different subsets of them.
For example, given 10 documents (A1 to A10), I want to calculate the uniqueness of the subset consisting of documents A1 and A2. The result should be a value from 0 to 1, where 1 means completely unique content and 0 means completely duplicated content.
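To make the 0-to-1 scale concrete, here is a minimal sketch of one possible interpretation. It is an assumption, not an established definition: the score is the fraction of distinct word shingles in the subset that occur in only one document. For two documents this reduces to the Jaccard distance between their shingle sets. The names `ContentUniqueness`, `Shingles`, and `Score`, the shingle size `k = 3`, and the simple tokenizer are all hypothetical choices made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ContentUniqueness
{
    // Characters treated as word separators (a deliberately crude tokenizer).
    private static readonly char[] Separators =
        { ' ', '\t', '\r', '\n', '.', ',', ';', ':', '!', '?' };

    // Returns the set of k-word shingles (overlapping word sequences) in the text.
    public static HashSet<string> Shingles(string text, int k = 3)
    {
        string[] words = text.ToLowerInvariant()
            .Split(Separators, StringSplitOptions.RemoveEmptyEntries);
        var result = new HashSet<string>();
        if (words.Length == 0) return result;

        // Texts shorter than k words contribute a single shingle.
        if (words.Length < k)
        {
            result.Add(string.Join(" ", words));
            return result;
        }

        for (int i = 0; i + k <= words.Length; i++)
            result.Add(string.Join(" ", words, i, k));
        return result;
    }

    // Fraction of distinct shingles that appear in exactly one document:
    // 1.0 = no shingle is shared, 0.0 = every shingle is shared.
    public static double Score(IEnumerable<string> documents, int k = 3)
    {
        var counts = new Dictionary<string, int>();
        foreach (string doc in documents)
        {
            foreach (string shingle in Shingles(doc, k))
            {
                int seen;
                counts.TryGetValue(shingle, out seen); // seen stays 0 if absent
                counts[shingle] = seen + 1;
            }
        }

        // Assumed convention: an empty subset has nothing duplicated.
        if (counts.Count == 0) return 1.0;

        return (double)counts.Values.Count(c => c == 1) / counts.Count;
    }
}
```

Under this definition, `ContentUniqueness.Score(new[] { textOfA1, textOfA2 })` gives 0 for two identical documents and 1 for two documents with no shingle in common. A subset containing a single document always scores 1, which may or may not be the behaviour wanted.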
What methods for calculating content uniqueness do you know? Please suggest methods that have .NET implementations.
Thanks.
© Stack Overflow or respective owner