Sharding / indexing strategy for multi-faceted search
Posted by Graham on Programmers
Published on 2013-11-07T11:06:35Z; indexed on 2013/11/07 16:11 UTC
I'm currently thinking about our database structure and how to modify it to scale. Specifically, we're considering using ElasticSearch to provide our search functionality.
One common pattern with ElasticSearch seems to be the 'user-routing' pattern: using custom routing to ensure that all of a given user's data resides on the same shard. This works well for per-user search, e.g. Gmail.
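A minimal sketch of why routing keeps one user's documents together. Conceptually, Elasticsearch picks a shard as `hash(routing) % number_of_primary_shards`, and by default the routing value is the document `_id`, which would scatter one user's documents across shards. Here `zlib.crc32` is only a stand-in (Elasticsearch uses its own internal hash), and the `(doc_id, user_id)` pairs are hypothetical sample data.

```python
import zlib
from collections import defaultdict


def shard_for(routing: str, num_primary_shards: int) -> int:
    # Conceptual shard selection: hash(routing) % number_of_primary_shards.
    # crc32 is a stand-in; Elasticsearch's real hash function differs.
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


def place(docs, num_primary_shards: int) -> dict:
    """Group (doc_id, user_id) pairs by the shard their user ID routes to."""
    shards = defaultdict(list)
    for doc_id, user_id in docs:
        shards[shard_for(user_id, num_primary_shards)].append(doc_id)
    return dict(shards)


docs = [("d1", "alice"), ("d2", "bob"), ("d3", "alice")]
layout = place(docs, 5)
# d1 and d3 both belong to alice, so they always share a shard.
```

Because the shard depends only on the routing value, a search that passes the same user ID as its routing value can skip every other shard in that index.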
Our application has a constraint such that no user will have more than a few thousand documents, so this pattern seems like a good candidate. However, our search needs to work across all users as well as target a specific user (so I might search my content, Alice's content, or all content). We also need to provide full-text search across any timeframe, from recent months back to several years ago.
I'm thinking of combining the 'user-routing' and 'index-per-time-interval' patterns:
- I create an index for each month
- By default, searches run against an alias covering the most recent X months
- If no results are found, we can search the previous X months
- As we grow, we can reduce the interval X
- Each document is routed by the user ID
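The monthly-index scheme above could be driven by a small helper like the following. It is a minimal sketch, assuming hypothetical index names of the form `content-YYYY.MM` and a hypothetical alias `content-recent`. `alias_actions` builds a body for the `POST /_aliases` endpoint, which applies its `add`/`remove` actions atomically, and `window_indices(..., skip=x)` yields the previous X months for the fallback search.

```python
from datetime import date

PREFIX = "content"          # hypothetical index-name prefix
ALIAS = "content-recent"    # hypothetical alias for the default search window


def month_index(d: date) -> str:
    """Name of the monthly index holding documents from d's month."""
    return f"{PREFIX}-{d.year:04d}.{d.month:02d}"


def window_indices(today: date, x: int, skip: int = 0) -> list:
    """Index names for x consecutive months, newest first.

    skip=0 gives the default window; skip=x gives the previous window,
    used as a fallback when the default search returns nothing.
    """
    names = []
    year, month = today.year, today.month
    for i in range(skip + x):
        if i >= skip:
            names.append(month_index(date(year, month, 1)))
        # Step back one calendar month, wrapping into the previous year.
        month -= 1
        if month == 0:
            year, month = year - 1, 12
    return names


def alias_actions(today: date, x: int, current_members) -> dict:
    """Body for POST /_aliases that moves ALIAS onto the newest x months."""
    wanted = set(window_indices(today, x))
    current = set(current_members)
    actions = [{"remove": {"index": i, "alias": ALIAS}} for i in sorted(current - wanted)]
    actions += [{"add": {"index": i, "alias": ALIAS}} for i in sorted(wanted - current)]
    return {"actions": actions}
```

For example, with X = 2 on 2013-11-07, `window_indices(date(2013, 11, 7), 2)` gives the November and October indices, and `skip=2` gives September and August. Reducing X as the data grows only changes the argument passed when the alias is rolled.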
So, this should let us do the following:
- search by user. This will search all indices, but only one shard in each
- search by time. This will search ~2 indices (by default) across all shards
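The two query types could be expressed as REST requests roughly like this. It is a sketch only, assuming the hypothetical `content-*` index names and `content-recent` alias, a full-text field `body`, and a keyword field `user_id`. The query uses the `bool`/`filter` syntax of later Elasticsearch versions (2013-era releases used the `filtered` query instead). Note that routing only narrows the shards searched; the `user_id` filter is still needed because other users' documents live on the same shard.

```python
from urllib.parse import urlencode

ALL_MONTHS = "content-*"         # hypothetical: matches every monthly index
RECENT_ALIAS = "content-recent"  # hypothetical alias over the newest X months


def user_search(user_id: str, text: str):
    """(path, body) for a full-text search restricted to one user."""
    # routing=<user_id> limits each matched index to the one shard this
    # user routes to; the term filter removes other users' documents.
    path = f"/{ALL_MONTHS}/_search?" + urlencode({"routing": user_id})
    body = {
        "query": {
            "bool": {
                "must": {"match": {"body": text}},
                "filter": {"term": {"user_id": user_id}},
            }
        }
    }
    return path, body


def time_search(text: str, indices=(RECENT_ALIAS,)):
    """(path, body) for a search across all users in a time window."""
    # No routing parameter, so every shard of each listed index is queried.
    path = "/" + ",".join(indices) + "/_search"
    body = {"query": {"match": {"body": text}}}
    return path, body
```

The fallback search would call `time_search` with an explicit list of older monthly indices instead of the alias, e.g. `["content-2013.09", "content-2013.08"]`.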
Is this a reasonable approach, considering we may scale to millions of documents? Or should I be denormalizing the data somehow, so that user searches are performed on a totally separate index from date searches?
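One way to weigh the two options is to count how many shards each search touches. This is back-of-the-envelope arithmetic, not a benchmark, assuming one routing shard per index, a single separate user-routed index in the second option, and illustrative numbers (36 months of data, X = 2, 5 primary shards per index).

```python
def fan_out(months_retained: int, x_months: int, shards_per_index: int) -> dict:
    """Shards queried per search under the two layouts in the question."""
    # Without routing, every shard of every index in the window is queried.
    time_search = x_months * shards_per_index
    return {
        # Option 1: monthly indices, all routed by user ID. Routing narrows
        # each index to one shard, but a user search still spans every
        # monthly index, i.e. one shard per retained month.
        "monthly_routed": {"user_search": months_retained, "time_search": time_search},
        # Option 2: monthly indices for time search plus one separate
        # user-routed index for user search (each document stored twice).
        "separate_user_index": {"user_search": 1, "time_search": time_search},
    }


print(fan_out(36, 2, 5))
```

Under these assumptions the user search costs 36 shard queries versus 1, while the time search is 10 in both layouts; the price of the second layout is indexing and storing every document twice.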
Thanks for any pros and cons of the above scenario.
© Programmers or respective owner