Sharding / indexing strategy for multi-faceted search

Posted by Graham on Programmers
Published on 2013-11-07T11:06:35Z Indexed on 2013/11/07 16:11 UTC

I'm currently thinking about our database structure and how to modify it so that it scales. Specifically, we're thinking about using ElasticSearch to provide our search functionality.

One common pattern with ElasticSearch seems to be the 'user-routing' pattern; that is, using routing to ensure that any one user's data resides on the same shard. This is great for client-specific search e.g. Gmail.

Our application has a constraint such that any user will have at most a few thousand documents, so this pattern seems like a good candidate. However, our search needs to work across all users as well as target a specific user (so I might search my content, Alice's content, or all content). Similarly, we need to provide full-text search across any timeframe, from recent months back to several years ago.

I'm thinking of combining the 'user-routing' and 'index-per-time-interval' patterns:

  • I create an index for each month
  • By default, searches go through an alias that covers the most recent X months
  • If no results are found, we can search the X months before that
  • As we grow, we can reduce the interval X
  • Each document is routed by the user ID
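
A minimal sketch of the naming and routing side of this plan, under some assumptions: the `docs-YYYY-MM` index naming scheme, the `docs` prefix, and the helper names are all hypothetical, and the functions only build plain dictionaries describing a request rather than calling any client library.

```python
from datetime import date

INDEX_PREFIX = "docs"  # hypothetical prefix, not from the original post


def index_for(d):
    """Name of the monthly index that a document created on date d goes into."""
    return f"{INDEX_PREFIX}-{d.year:04d}-{d.month:02d}"


def recent_indices(today, months, skip=0):
    """Names of `months` consecutive monthly indices, newest first.

    skip=0 starts at the current month; skip=months gives the
    previous window, used when the first search finds nothing.
    """
    total = today.year * 12 + (today.month - 1)  # months since year 0
    names = []
    for i in range(skip, skip + months):
        year, month0 = divmod(total - i, 12)
        names.append(f"{INDEX_PREFIX}-{year:04d}-{month0 + 1:02d}")
    return names


def index_request(doc_id, user_id, created, body):
    """Describe an index operation routed by user ID (assumed request shape)."""
    return {
        "index": index_for(created),
        "id": doc_id,
        "routing": user_id,  # every document of one user lands on the same shard of this index
        "body": body,
    }
```

With X = 2, the default alias would cover `recent_indices(today, 2)` and the fallback search would use `recent_indices(today, 2, skip=2)`. Reducing X later only changes which names the alias points at; the documents themselves do not move.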

So, this should let us do the following:

  • search by user. This will search all indices, but only 1 shard in each
  • search by time. This will search ~2 indices (by default) across all shards
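
The two search shapes above can be sketched as follows. This is only an illustration under assumptions: the field names `text` and `user_id`, the `docs-*` index pattern, and the figure of 5 shards per index are hypothetical, and the query body uses the `bool`/`filter` form of the Query DSL, which may differ from what a given ElasticSearch version expects. Note that routing only selects the shard; other users' documents can live on the same shard, so the user search still needs a filter on the user ID.

```python
def user_search(user_id, text):
    """Search one user's content across every monthly index, one shard per index."""
    return {
        "index": "docs-*",   # all monthly indices (hypothetical naming)
        "routing": user_id,  # restricts the search to the shard this user was routed to
        "body": {
            "query": {
                "bool": {
                    "must": [{"match": {"text": text}}],
                    # routing alone does not exclude other users on the same shard
                    "filter": [{"term": {"user_id": user_id}}],
                }
            }
        },
    }


def time_search(indices, text):
    """Search all users' content in the given monthly indices, all shards."""
    return {
        "index": ",".join(indices),
        "body": {"query": {"match": {"text": text}}},
    }


def shards_touched(num_indices, shards_per_index, routed):
    """Number of shards a search hits: one per index if routed, otherwise all."""
    return num_indices * (1 if routed else shards_per_index)


# Example: three years of monthly indices, 5 shards each (assumed numbers).
user_cost = shards_touched(36, 5, routed=True)   # one shard in each of 36 indices
time_cost = shards_touched(2, 5, routed=False)   # every shard in 2 indices
```

The arithmetic shows the trade-off in this layout: the routed user search touches fewer shards per index, but its cost grows with the number of monthly indices kept, while the default time search stays bounded by X times the shard count.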

Is this a reasonable approach, considering we may scale to many millions of documents? Or should I be denormalizing the data somehow, so that user searches are performed on a totally separate index from date searches?

Thanks for any pros-cons of the above scenario.
