Create a dataset: extract features from text documents (TF-IDF)

Posted by BigG on Stack Overflow
Published on 2010-05-27T13:27:49Z Indexed on 2010/05/27 13:31 UTC

I have to create a dataset from some text files, writing each document as a vector of features.

Something like this:

doc1: 1,0.45 6,0.001 94,0.1 ...

doc2: 3,0.5 98,0.2 ...

...

Each position in the vector represents a word, and the score is given by something like TF-IDF.

Do you know of a library or tool for this? (Java is preferred.)
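To make the target format concrete, here is a minimal plain-Java sketch of the sparse "index,score" output described above. It is only an illustration under stated assumptions, not a library recommendation: the class name `TfIdfSketch`, the naive tokenizer, the 1-based word indices, and the weighting (term count divided by document length, times ln(N / document frequency)) are all choices made for this example, and the documents are in-memory strings rather than files. Other TF-IDF variants exist and give different scores.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical helper class for illustration; not part of any library.
public class TfIdfSketch {

    // Naive tokenizer (assumption): lowercase, split on anything that is not a-z or 0-9.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Returns one sparse vector per document: word index -> TF-IDF score.
    static List<SortedMap<Integer, Double>> vectorize(List<String> docs) {
        Map<String, Integer> vocab = new LinkedHashMap<>(); // word -> 1-based index
        Map<String, Integer> docFreq = new HashMap<>();     // word -> number of docs containing it
        List<Map<String, Integer>> counts = new ArrayList<>();

        // First pass: count words per document and build the vocabulary.
        for (String doc : docs) {
            Map<String, Integer> tf = new HashMap<>();
            for (String word : tokenize(doc)) {
                tf.merge(word, 1, Integer::sum);
                vocab.putIfAbsent(word, vocab.size() + 1);
            }
            for (String word : tf.keySet()) {
                docFreq.merge(word, 1, Integer::sum);
            }
            counts.add(tf);
        }

        // Second pass: score each word as (count / doc length) * ln(N / docFreq).
        int n = docs.size();
        List<SortedMap<Integer, Double>> vectors = new ArrayList<>();
        for (Map<String, Integer> tf : counts) {
            int total = tf.values().stream().mapToInt(Integer::intValue).sum();
            SortedMap<Integer, Double> vec = new TreeMap<>();
            for (Map.Entry<String, Integer> e : tf.entrySet()) {
                double termFreq = (double) e.getValue() / total;
                double idf = Math.log((double) n / docFreq.get(e.getKey()));
                double score = termFreq * idf;
                if (score > 0) { // words found in every document score 0 and are left out
                    vec.put(vocab.get(e.getKey()), score);
                }
            }
            vectors.add(vec);
        }
        return vectors;
    }

    // Formats a vector as "name: index,score index,score ...".
    static String format(String name, SortedMap<Integer, Double> vec) {
        StringBuilder sb = new StringBuilder(name).append(":");
        for (Map.Entry<Integer, Double> e : vec.entrySet()) {
            sb.append(' ').append(e.getKey()).append(',')
              .append(String.format(Locale.ROOT, "%.3f", e.getValue()));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("the cat sat", "the dog sat down");
        List<SortedMap<Integer, Double>> vectors = vectorize(docs);
        for (int i = 0; i < vectors.size(); i++) {
            System.out.println(format("doc" + (i + 1), vectors.get(i)));
        }
    }
}
```

With this weighting, a word that occurs in every document gets a score of zero and is dropped from the sparse output. For real text files the strings would be read from disk first, and a proper tokenizer with stop-word handling would likely be needed.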

© Stack Overflow or respective owner
