vectorization of a text file
- by Fox
I am trying to implement vectorization of a text file. I have created a dictionary (the unique words across all the documents). What is the best way to implement this in Java?
For example:
My dictionary has the following words: {w1, w2, w3, w4}
I have 2 documents, each containing a subset of the words in the vocabulary. I need to write the matrix to a text file in the form:
1,3,4,0
0,0,2,1
Here each row represents a document, and each value is the number of times the corresponding dictionary word occurs in that document.
Can you suggest the most efficient way to implement this in Java?
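
To make the desired output concrete, here is a minimal sketch of one possible approach, not necessarily the most efficient one. It assumes the documents are already loaded as strings, that tokens are separated by whitespace, and that words missing from the dictionary are simply ignored. The class name `Vectorizer`, its method names, the sample documents, and the output file name `matrix.txt` are all made up for illustration. The idea is to map each dictionary word to a column index in a `HashMap`, so that counting a document takes a single pass over its tokens.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class; names are illustrative only.
class Vectorizer {

    // Map each dictionary word to its column index for constant-time lookup.
    static Map<String, Integer> indexOf(List<String> dictionary) {
        Map<String, Integer> index = new HashMap<>();
        for (int i = 0; i < dictionary.size(); i++) {
            index.put(dictionary.get(i), i);
        }
        return index;
    }

    // Count how often each dictionary word occurs in one document.
    // Assumption: whitespace-separated tokens; unknown words are skipped.
    static int[] countVector(String document, Map<String, Integer> index) {
        int[] counts = new int[index.size()];
        for (String token : document.trim().split("\\s+")) {
            Integer column = index.get(token);
            if (column != null) {
                counts[column]++;
            }
        }
        return counts;
    }

    // Join the counts of one document into a comma-separated row.
    static String toCsvRow(int[] counts) {
        StringBuilder row = new StringBuilder();
        for (int i = 0; i < counts.length; i++) {
            if (i > 0) {
                row.append(',');
            }
            row.append(counts[i]);
        }
        return row.toString();
    }

    public static void main(String[] args) throws IOException {
        List<String> dictionary = Arrays.asList("w1", "w2", "w3", "w4");
        Map<String, Integer> index = indexOf(dictionary);

        // Sample documents chosen to reproduce the two rows in the example.
        List<String> documents = Arrays.asList(
                "w1 w2 w2 w2 w3 w3 w3 w3",
                "w3 w3 w4");

        List<String> rows = new ArrayList<>();
        for (String document : documents) {
            rows.add(toCsvRow(countVector(document, index)));
        }

        // Writes one row per document to the (assumed) output file.
        Files.write(Paths.get("matrix.txt"), rows, StandardCharsets.UTF_8);
    }
}
```

With the sample documents above, `matrix.txt` would contain the rows `1,3,4,0` and `0,0,2,1`. For a large vocabulary where most counts are zero, a sparse representation (for example a map from column index to count per document) might be worth considering instead of a dense `int[]`.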