How to setup Lucene search for a B2B web app?
Posted
by Bill Paetzke
on Stack Overflow
See other posts from Stack Overflow
or by Bill Paetzke
Published on 2010-04-25T03:34:23Z
Indexed on
2010/04/25
3:43 UTC
Read the original article
Hit count: 357
Given:
- 5000 databases (spread out over 5 servers)
- 1 database per client (so you can infer there are 1000 clients)
- 2 to 2000 users per client (let's say avg is 100 users per client)
- Clients (databases) come and go every day (let's assume most remain for at least one year)
- Let's stay agnostic of language or sql brand, since Lucene (and Solr) have a breadth of support
The Question:
How would you setup Lucene search so that each client can only search within its database?
How would you setup the index(es)?
Would you need to add a filter to all search queries?
If a client cancelled, how would you delete their (part of the) index? (this may be trivial--not sure yet)
Possible Solutions:
Make an index for each client (database)
- Pro: Search is faster (than one-index-for-all method). Indices are relative to the size of the client's data.
- Con: I'm not sure what this entails, nor do I know if this is beyond Lucene's scope.
Have a single, gigantic index with a database_name field. Always include database_name as a filter.
- Pro: Not sure. Maybe good for tech support or billing dept to search all databases for info.
- Con: Search is slower (than index-per-client method). Flawed security if query filter removed.
For Example:
Joel Spolsky said in Podcast #11 that his hosted web app product, FogBugz On-Demand, uses Lucene. He has thousands of on-demand clients. And each client gets their own database.
His situation is quite similar to mine. Although, he didn't elaborate on the setup (particularly indices); hence, the need for this question.
One last thing:
I would also accept an answer that uses Solr (the extension of Lucene). Perhaps it's better suited for this problem. Not sure.
© Stack Overflow or respective owner