What's the best way to match a query to a set of keywords?
- by Ryan Detzel
Pretty much what you would assume Google does. Advertisers come in and big on keywords, lets say "ipod", "ipod nano", "ipod 60GB", "used ipod", etc. Then we have a query, "I want to buy an ipod nano" or "best place to buy used ipods" what kind of algorithms and systems are used to match those queries to the keyword set. I would imagine that some of those keyword sets are huge, 100k keywords made up of one or more actual words. on top of that queries can be 1-n words as well. Any thoughts, links to wikipedia I can start reading?
From what I know already I would use some stemmed hash in disk(CDB?) and a bloom filter to check to see if I should even go to disk.