Simple implementation of N-Gram, tf-idf and Cosine similarity in Python

Posted by seanieb on Stack Overflow
Published on 2010-03-04T15:22:30Z

I need to compare documents stored in a DB and come up with a similarity score between 0 and 1.

The method needs to be very simple: a vanilla implementation of n-grams (where it is possible to define how many grams to use), along with a simple implementation of tf-idf and cosine similarity.
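To make the three pieces concrete, here is a minimal sketch using only the Python standard library. It is one possible reading of the requirements, not a reference implementation: the function names (`ngrams`, `tfidf_vectors`, `cosine_similarity`) are made up for illustration, tokens are assumed to be lowercased whitespace-separated words, and the idf uses a smoothed variant (other formulas are common). Documents are assumed to be already loaded from the DB as plain strings.

```python
import math
from collections import Counter


def ngrams(text, n=2):
    """Return the word n-grams of text (lowercased, split on whitespace)."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf_vectors(documents, n=2):
    """Build one sparse tf-idf vector (dict: n-gram -> weight) per document."""
    counts = [Counter(ngrams(doc, n)) for doc in documents]
    num_docs = len(documents)

    # Document frequency: the number of documents each n-gram appears in.
    doc_freq = Counter()
    for c in counts:
        doc_freq.update(c.keys())

    vectors = []
    for c in counts:
        total = sum(c.values())
        vec = {}
        for gram, count in c.items():
            tf = count / total
            # Smoothed idf (an assumption): always positive, so n-grams
            # shared by every document still carry some weight.
            idf = math.log((1 + num_docs) / (1 + doc_freq[gram])) + 1
            vec[gram] = tf * idf
        vectors.append(vec)
    return vectors


def cosine_similarity(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b[gram] for gram, weight in a.items() if gram in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    # Clamp to guard against floating-point results slightly above 1.
    return min(1.0, dot / (norm_a * norm_b))


# Hypothetical sample documents standing in for rows read from the DB.
docs = [
    "the quick brown fox",
    "the quick brown dog",
    "completely different text here",
]
vectors = tfidf_vectors(docs, n=2)
score = cosine_similarity(vectors[0], vectors[1])
```

Because all tf-idf weights here are non-negative, the score always falls between 0 and 1. Changing `n` switches between unigrams, bigrams, trigrams and so on; a document with fewer than `n` words yields an empty vector and scores 0.0 against everything.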

Is there any program that can do this, or should I write it from scratch?
