How can I parse free text (Twitter tweets) against a large database of values?
- by user136416
Hi there
Suppose I have a database containing 500,000 records, each representing, say, an animal. What would be the best approach for parsing 140-character tweets to identify matching records by animal name? For instance, in this string...
"I went down to the woods today and couldn't believe my eyes: I saw a bear having a picnic with a squirrel."
... I would like to flag up the words "bear" and "squirrel", as they appear in my database.
This strikes me as a problem that has probably been solved many times, but from where I'm sitting it looks prohibitively expensive: iterating over every database record and checking each one for a match in the string is surely a crazy way to do it.
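To make the alternative concrete, here is a rough sketch of the reverse approach: load the names into an in-memory set once, then look up each word of the tweet instead of scanning every record. It assumes the names are single words and that all 500,000 fit in memory. The names `TweetMatcher` and `FindMatches` are made up for illustration, and I haven't tried this against real data.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TweetMatcher
{
    // Hypothetical helper: returns the distinct words of the tweet that are
    // present in knownNames, lower-cased, in order of first appearance.
    // knownNames is assumed to have been built with a case-insensitive comparer.
    public static List<string> FindMatches(string tweet, HashSet<string> knownNames)
    {
        var matches = new List<string>();
        var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);

        // Treat any run of letters or apostrophes as a word; punctuation is dropped.
        foreach (Match m in Regex.Matches(tweet, @"[\p{L}']+"))
        {
            string word = m.Value;

            // One hash lookup per word, regardless of how many names exist.
            if (knownNames.Contains(word) && seen.Add(word))
            {
                matches.Add(word.ToLowerInvariant());
            }
        }
        return matches;
    }
}
```

With a set built as `new HashSet<string>(new[] { "bear", "squirrel", "otter" }, StringComparer.OrdinalIgnoreCase)`, the example tweet above should give back "bear" and "squirrel". The work per tweet then depends on the number of words in the tweet rather than on the number of records. Multi-word names such as "polar bear" would not be caught by this and would presumably need something more.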
Can anyone with a comp sci degree put me out of my misery? I'm working in C# if that makes any difference. Cheers!