HBase schema design: how to make sorting easy?
Posted by chen on Stack Overflow
Published on 2010-03-25T15:28:20Z
Indexed on 2010/06/13 20:32 UTC
I have 1M words in my dictionary. Whenever a user issues a query on my website, I check whether the query contains any words from my dictionary and increment the counter for each of them individually. For example, if a user types in "Obama is a president" and both "Obama" and "president" are in my dictionary, then I increment the counters for "Obama" and "president" by 1 each.
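The counting step described above can be sketched in a few lines. This is a minimal illustration, not the poster's code: the function name `count_query_words` is hypothetical, an in-memory `Counter` stands in for the HBase table (in HBase each word would be a row with an atomically incremented counter column), and it assumes whitespace tokenization, exact case-sensitive matching, and counting each distinct word at most once per query.

```python
from collections import Counter


def count_query_words(query: str, dictionary: set, counters: Counter) -> list:
    """Increment the counter of every dictionary word found in `query`.

    Hypothetical helper. Assumptions: whitespace tokenization, exact
    case-sensitive matching, and each distinct word counted once per query.
    """
    matched = []
    seen = set()
    for token in query.split():
        if token in dictionary and token not in seen:
            seen.add(token)
            counters[token] += 1  # stands in for an HBase counter increment
            matched.append(token)
    return matched


dictionary = {"Obama", "president"}
counters = Counter()
print(count_query_words("Obama is a president", dictionary, counters))  # ['Obama', 'president']
print(counters["Obama"], counters["president"])  # 1 1
```

Words outside the dictionary ("is", "a") are simply ignored, so the counter store only ever holds the 1M dictionary words.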
From time to time, I also want to see the top 100 words (the most frequently queried words). If I use HBase to store the counters, what schema should I use? I have not come up with an efficient one yet.
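To make the "top 100" requirement concrete, here is what the query has to compute once all counters are available in memory. It is a sketch under assumptions: `top_words` is a hypothetical name, and `counters` is assumed to be a full snapshot of every (word, count) pair, for example collected by scanning the whole table. `heapq.nlargest` keeps only `n` items at a time, but it still has to visit all of the counters.

```python
import heapq


def top_words(counters: dict, n: int = 100) -> list:
    """Return the n most frequent (word, count) pairs, highest count first.

    Hypothetical helper. Runs in O(N log n) for N words, but it still has to
    read every counter, which is the cost the question is trying to avoid.
    """
    return heapq.nlargest(n, counters.items(), key=lambda kv: kv[1])


# Made-up counts for illustration only.
counts = {"Obama": 5, "president": 3, "senate": 9, "vote": 1}
print(top_words(counts, 2))  # [('senate', 9), ('Obama', 5)]
```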
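If I use each dictionary word as the row key and "counter" as the column, then updating a counter (an increment) is very efficient. But it is very hard to sort by count and return the top 100, because HBase keeps rows sorted only by row key, so finding the most frequent words means reading all 1M rows.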
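The sorting problem above comes from how HBase orders rows: by the raw bytes of the row key. The sketch below simulates that ordering with plain Python byte strings (no HBase client involved). It also shows one commonly used idea, offered here only as a hypothetical direction: a separate "by count" table whose row key is a fixed-width, big-endian encoding of `MAX_COUNT - count` followed by the word, so that byte order equals descending count order and the first 100 rows of a scan are the top 100. The names `MAX_COUNT` and `count_index_key` are assumptions for illustration.

```python
import struct

# Simulated word-keyed table: HBase sorts rows by the raw bytes of the row key,
# so a scan returns the words alphabetically, not by count.
rows = {b"president": 3, b"Obama": 5, b"senate": 9}
print([k.decode() for k in sorted(rows)])  # ['Obama', 'president', 'senate']

# Hypothetical upper bound so that (MAX_COUNT - count) fits in 8 unsigned bytes.
MAX_COUNT = 2**64 - 1


def count_index_key(word: str, count: int) -> bytes:
    """Hypothetical row key for a secondary 'by count' table.

    The inverted count is packed as a big-endian unsigned 64-bit integer, so
    lexicographic byte order equals descending count order. The word suffix
    keeps keys unique when two words have the same count.
    """
    return struct.pack(">Q", MAX_COUNT - count) + word.encode("utf-8")


# Sorting the index keys simulates scanning the secondary table from the start.
index = sorted(count_index_key(w.decode(), c) for w, c in rows.items())
top = [k[8:].decode("utf-8") for k in index[:2]]  # strip the 8-byte count prefix
print(top)  # ['senate', 'Obama']
```

The trade-off is on the write side: every increment would also have to delete the word's old index row and write a new one, and HBase does not do that atomically across rows. One option is to rebuild such an index periodically from the word-keyed table instead of on every query.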
Can anyone give me some good advice? Thanks.
© Stack Overflow or respective owner