Create a term-document matrix from files
Posted by Joe on Super User
Published on 2012-12-01T14:45:27Z
Indexed on 2013/10/31 16:01 UTC
I have a set of files from example001.txt to example100.txt. Each file contains a list of keywords from a superset (the superset is available if we want it).
So example001.txt might contain:
apple
banana
...
otherfruit
I'd like to process these files and produce something akin to a matrix, with the list of example* files along the top row, the fruit down the side, and a '1' in a column if the fruit is in that file (a '0' otherwise).
An example might be:
x        example1  example2  example3
Apple    1         1         0
Banana   0         1         0
Coconut  0         1         1
Any idea how I might build some sort of command-line magic to put this together? I'm on OS X and happy with Perl or Python.
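For illustration, here is a minimal Python sketch of one way this could be approached; it is not the only option and makes several assumptions. It assumes one keyword per line, compares keywords exactly as written (so "apple" and "Apple" would count as different terms), derives the rows from the keywords actually found in the files rather than from the superset, and writes tab-separated output. The function names `read_keywords`, `build_matrix`, and `format_matrix`, and the default glob pattern `example*.txt`, are hypothetical choices made for this example.

```python
import glob
import os
import sys


def read_keywords(path):
    """Return the set of non-empty, whitespace-stripped lines in one file."""
    with open(path, encoding="utf-8") as handle:
        return {line.strip() for line in handle if line.strip()}


def build_matrix(paths):
    """Return (sorted terms, {file name without extension: keyword set})."""
    columns = {}
    for path in sorted(paths):
        name = os.path.splitext(os.path.basename(path))[0]
        columns[name] = read_keywords(path)
    # Rows are the union of all keywords seen (assumption: the separate
    # superset file is not consulted).
    terms = sorted(set().union(*columns.values()))
    return terms, columns


def format_matrix(terms, columns):
    """Render tab-separated lines: a header row, then one row per term."""
    names = list(columns)
    lines = ["\t".join(["x"] + names)]
    for term in terms:
        cells = ["1" if term in columns[name] else "0" for name in names]
        lines.append("\t".join([term] + cells))
    return lines


if __name__ == "__main__":
    # Assumed default pattern; another glob can be given as the first argument.
    pattern = sys.argv[1] if len(sys.argv) > 1 else "example*.txt"
    terms, columns = build_matrix(glob.glob(pattern))
    print("\n".join(format_matrix(terms, columns)))
```

Saved under a hypothetical name such as tdm.py, this could be run from the directory holding the files as `python3 tdm.py 'example*.txt' > matrix.tsv`, and the resulting file should open in a spreadsheet. Using one set per file keeps each membership check cheap, which matters little for 100 files but keeps the code short.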