Tracking/Counting Word Frequency
- by Joel Martinez
I'd like to get some community consensus on a good design for storing and querying word frequency counts. I'm building an application that parses text inputs and stores how many times each word has appeared (cumulatively, over time). So, given the following inputs:
"To Kill a Mocking Bird"
"Mocking a piano player"
I would store the following values:
Word      Count
---------------
To        1
Kill      1
A         2
Mocking   2
Bird      1
Piano     1
Player    1
Later, I need to be able to quickly query the count for any arbitrary word.
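To make the requirement concrete, here is a minimal in-memory sketch of the store-and-query behaviour described above, using a hash map (Python's `collections.Counter`). It assumes that words are runs of letters or apostrophes and that counting is case-insensitive (so "A" and "a" are the same word, as in the table). The names `tokenize` and `WordCounter` are hypothetical, chosen for illustration. It only works if the set of distinct words fits in memory, and it does not persist anything.

```python
from __future__ import annotations

import re
from collections import Counter

# Assumption: a "word" is a run of letters/apostrophes, compared case-insensitively.
WORD_RE = re.compile(r"[a-z']+")


def tokenize(text: str) -> list[str]:
    """Split one text input into lower-cased words (hypothetical helper)."""
    return WORD_RE.findall(text.lower())


class WordCounter:
    """Hypothetical in-memory word-frequency store backed by a hash map."""

    def __init__(self) -> None:
        self._counts: Counter[str] = Counter()

    def add_text(self, text: str) -> None:
        # Counter.update adds to existing counts, so totals accumulate over time.
        self._counts.update(tokenize(text))

    def count(self, word: str) -> int:
        # Average O(1) lookup; an unseen word returns 0 without being inserted.
        return self._counts[word.lower()]


wc = WordCounter()
wc.add_text("To Kill a Mocking Bird")
wc.add_text("Mocking a piano player")
print(wc.count("a"))        # 2
print(wc.count("Mocking"))  # 2
print(wc.count("elephant")) # 0
```

A hash map gives constant average-time lookups for arbitrary words, which is the query pattern in question; the open issue is persistence and size, which is where a database comes in.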
My current plan is to store the words and their counts in a database and rely on caching the count values, but I suspect I won't get enough cache hits for this to be a viable solution in the long term.
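The database-plus-cache plan could look roughly like the sketch below, which uses Python's built-in `sqlite3` module purely as a stand-in for "a database". The table name `word_counts`, the class `SqliteWordStore`, and the default cache size are hypothetical. It assumes words arrive already tokenized and normalized (e.g. lower-cased). Declaring `word` as the `PRIMARY KEY` gives the column an index, so a cache miss costs one indexed lookup rather than a table scan; the LRU cache is invalidated on every write so reads never return a stale count.

```python
from __future__ import annotations

import sqlite3
from collections import Counter, OrderedDict


class SqliteWordStore:
    """Hypothetical word-count store: SQLite table plus a small LRU read cache."""

    def __init__(self, path: str = ":memory:", cache_size: int = 10_000) -> None:
        self._db = sqlite3.connect(path)
        # PRIMARY KEY creates an index on `word`, so lookups avoid a full scan.
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS word_counts ("
            " word TEXT PRIMARY KEY, count INTEGER NOT NULL)"
        )
        self._cache: OrderedDict[str, int] = OrderedDict()
        self._cache_size = cache_size

    def add_words(self, words: list[str]) -> None:
        """Add one batch of (already normalized) words in a single transaction."""
        with self._db:  # commits on success, rolls back on error
            # Aggregate the batch first so each distinct word is written once.
            for word, n in Counter(words).items():
                self._db.execute(
                    "INSERT OR IGNORE INTO word_counts (word, count) VALUES (?, 0)",
                    (word,),
                )
                self._db.execute(
                    "UPDATE word_counts SET count = count + ? WHERE word = ?",
                    (n, word),
                )
                # Invalidate the cached value so the next read sees the new count.
                self._cache.pop(word, None)

    def count(self, word: str) -> int:
        if word in self._cache:
            self._cache.move_to_end(word)  # mark as most recently used
            return self._cache[word]
        row = self._db.execute(
            "SELECT count FROM word_counts WHERE word = ?", (word,)
        ).fetchone()
        value = row[0] if row else 0
        self._cache[word] = value
        if len(self._cache) > self._cache_size:
            self._cache.popitem(last=False)  # evict the least recently used entry
        return value


store = SqliteWordStore()
store.add_words(["to", "kill", "a", "mocking", "bird"])
store.add_words(["mocking", "a", "piano", "player"])
print(store.count("a"))  # 2
```

With the index in place, a low cache hit rate mainly costs an extra indexed read per query rather than a scan, so whether the cache is worth keeping is something to measure rather than assume.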
Can anyone suggest algorithms, data structures, or other ideas that would make this a well-performing solution?