Simple implementation of N-Gram, tf-idf and Cosine similarity in Python
Posted by seanieb on Stack Overflow
Published on 2010-03-04T15:22:30Z
I need to compare documents stored in a DB and come up with a similarity score between 0 and 1.
The method has to be very simple: a vanilla implementation of n-grams (where it is possible to define how many grams to use), along with a simple implementation of tf-idf and cosine similarity.
Is there any program that can do this? Or should I start writing this from scratch?
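For reference, here is a rough idea of what a from-scratch version could look like, using only the standard library. This is a minimal sketch, not a definitive implementation. The function names (`word_ngrams`, `tfidf_vectors`, `cosine_similarity`) are made up for illustration, the n-grams are assumed to be word-level rather than character-level, and the smoothed idf formula is one common variant chosen as an assumption.

```python
import math
from collections import Counter


def word_ngrams(text, n=2):
    """Split text into lowercase word n-grams; n is configurable."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf_vectors(docs, n=2):
    """Return one sparse {ngram: weight} dict per document."""
    grams_per_doc = [word_ngrams(doc, n) for doc in docs]

    # Document frequency: in how many documents each n-gram appears.
    df = Counter()
    for grams in grams_per_doc:
        df.update(set(grams))

    total_docs = len(docs)
    vectors = []
    for grams in grams_per_doc:
        counts = Counter(grams)
        length = len(grams) or 1  # avoid division by zero for short docs
        # Assumption: smoothed idf = log((1 + N) / (1 + df)) + 1,
        # which keeps every weight strictly positive.
        vectors.append({
            g: (c / length) * (math.log((1 + total_docs) / (1 + df[g])) + 1.0)
            for g, c in counts.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine similarity of two sparse vectors, clamped to at most 1.0."""
    dot = sum(w * b[g] for g, w in a.items() if g in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return min(1.0, dot / (norm_a * norm_b))


if __name__ == "__main__":
    # Hypothetical documents standing in for rows fetched from the DB.
    docs = [
        "the quick brown fox",
        "the quick brown dog",
        "completely different text here",
    ]
    vecs = tfidf_vectors(docs, n=2)
    print(cosine_similarity(vecs[0], vecs[1]))
    print(cosine_similarity(vecs[0], vecs[2]))
```

Because all tf-idf weights are non-negative, the score always falls between 0 and 1, which matches the requirement above. The idf values depend on the whole collection, so the vectors would need to be recomputed when documents are added to the DB.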