Hello everyone,
I've been asked to evaluate Solr as an alternative to a commercial search engine.
The application currently has a very particular way of sorting results using something called "buckets".
I'll try to explain in a bit more detail:
In the interface they have 2 fields: "what" and "where".
Both fields are actually sets of fields (what = category, name, contact info... and where = country, state, region, city...), so the copyField feature of Solr immediately comes to mind.
Now, based on the field that generated the actual match, the result should end up in a specific bucket. The first bucket contains all result documents that have an exact match on the category field, the second bucket all exact matches on name, the third partial matches on category, the fourth partial matches on name, the fifth matches on contact info, etc...
Then, within each of those first tier buckets, all results are placed in second tier buckets depending on which location was matched: city, then region, then province and so on.
To complicate things even more, there is also a third tier bucket where results are placed according to the value of a ranking field: all documents with the value 1 in the ranking field go in bucket 1 and so on. And finally, results should be randomized within each third tier bucket...
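To make the required ordering concrete, here is a minimal Python sketch of how I understand it. This is not Solr code, only the sort order expressed as a composite key. The names (Hit, WHAT_BUCKETS, WHERE_BUCKETS, bucket_sort) and the seed parameter are made up for illustration, and it assumes each result already carries the kind of "what" match, the kind of "where" match and its ranking value.

```python
import random
from dataclasses import dataclass

# Hypothetical bucket orders, copied from the description above (truncated at "etc").
WHAT_BUCKETS = ["category_exact", "name_exact", "category_partial",
                "name_partial", "contact_info"]
WHERE_BUCKETS = ["city", "region", "province"]


@dataclass
class Hit:
    doc_id: str
    what_match: str   # which "what" field matched, and how
    where_match: str  # which "where" field matched
    ranking: int      # value of the ranking field (1 = first bucket)


def bucket_sort(hits, seed=None):
    """Order hits by tier 1 (what), tier 2 (where), tier 3 (ranking),
    then randomly inside each third tier bucket."""
    rng = random.Random(seed)  # assumption: a fixed seed keeps the order repeatable

    def key(hit):
        return (
            WHAT_BUCKETS.index(hit.what_match),
            WHERE_BUCKETS.index(hit.where_match),
            hit.ranking,
            rng.random(),  # random tie-breaker within the same bucket
        )

    # sorted() computes the key once per element, so the random part is stable
    # for the duration of this sort.
    return sorted(hits, key=key)


if __name__ == "__main__":
    hits = [
        Hit("a", "name_exact", "city", 1),
        Hit("b", "category_exact", "region", 2),
        Hit("c", "category_exact", "city", 3),
        Hit("d", "category_exact", "city", 1),
    ]
    for hit in bucket_sort(hits, seed=42):
        print(hit.doc_id, hit.what_match, hit.where_match, hit.ranking)
```

The fixed seed is my own assumption: if the random order changed on every request, paging would presumably show duplicates or skip documents, so the same seed would have to be reused for all pages of one search.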
On top of this they obviously want support for facets and paging.
My apologies for the long mail but I would greatly appreciate feedback and/or suggestions.
I'm aware that this is a very particular problem, but anything that points me in the right direction is helpful.
Cheers,
Tom