How can I parse free text (Twitter tweets) against a large database of values?
Posted
by user136416
on Stack Overflow
Published on 2010-05-16T12:30:38Z
Hi there
Suppose I have a database containing 500,000 records, each representing, say, an animal. What would be the best approach for parsing 140-character tweets to identify matching records by animal name? For instance, in this string...
"I went down to the woods today and couldn't believe my eyes: I saw a bear having a picnic with a squirrel."
... I would like to flag up the words "bear" and "squirrel", as they appear in my database.
This strikes me as a problem that has probably been solved many times, but from where I'm sitting it looks prohibitively expensive: iterating over every database record and checking for a match in the string is surely a crazy way to do it.
Can anyone with a comp sci degree put me out of my misery? I'm working in C# if that makes any difference. Cheers!
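To make the concern concrete, here is a minimal C# sketch of the opposite direction: instead of scanning all 500,000 records per tweet, split the tweet into words and look each one up in an in-memory set of names. It assumes the animal names are single words and fit in memory as a HashSet<string> built with StringComparer.OrdinalIgnoreCase. The class and method names (TweetMatcher, FindMatches) are made up for illustration, and this is only one possible approach, not a definitive answer.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TweetMatcher
{
    // Hypothetical helper: returns the distinct words in the tweet that are
    // present in animalNames, in order of first appearance.
    // Assumption: animalNames was created with a case-insensitive comparer.
    public static List<string> FindMatches(string tweet, HashSet<string> animalNames)
    {
        var matches = new List<string>();
        var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);

        // Treat runs of letters (and apostrophes) as words; punctuation is skipped.
        foreach (Match m in Regex.Matches(tweet, @"[A-Za-z']+"))
        {
            string word = m.Value;

            // One hash lookup per word, regardless of how many names are stored.
            if (animalNames.Contains(word) && seen.Add(word))
            {
                matches.Add(word);
            }
        }
        return matches;
    }
}
```

With a set containing "bear", "squirrel" and "otter", calling TweetMatcher.FindMatches on the example tweet above would return "bear" and "squirrel". A tweet has only a couple of dozen words, so the work per tweet is a couple of dozen lookups rather than 500,000 comparisons. Multi-word names (such as "polar bear") and plurals are not handled by this sketch and would need extra work, for example checking adjacent word pairs as well.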
© Stack Overflow or respective owner