How important is index size when searching?
- by Michael K
My company has recently begun using Apache Solr to search its data. As we learn how to use it, we have gone down the path of indexing multiple fields to get the results we need. Most of these are either N-Grammed or Edge-N-Grammed.
Gramming by its nature takes up a lot of space, and a larger index takes more time to search. Space is cheap, but time is less so. Index time is not too important, since a delta-import (which fetches only the changes since the last index) is extremely quick, so you only pay a penalty on the first full import. What we have not been able to determine is what effect index size has on query times. Obviously a larger index takes longer to search, but the time added by n-gramming a field is difficult to predict.
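To make the space cost concrete, here is a rough sketch in plain Python of how many grams a single token expands into. It is only a model of the idea, not Solr's own code: the function names and the gram sizes 2 and 5 are assumptions chosen for illustration, and it counts emitted grams rather than actual bytes on disk.

```python
def ngrams(token, min_gram, max_gram):
    """Return every contiguous substring of token whose length is in [min_gram, max_gram]."""
    grams = []
    # Gram sizes longer than the token itself produce nothing, so cap at len(token).
    for n in range(min_gram, min(max_gram, len(token)) + 1):
        for start in range(len(token) - n + 1):
            grams.append(token[start:start + n])
    return grams


def edge_ngrams(token, min_gram, max_gram):
    """Return only the prefixes of token whose length is in [min_gram, max_gram]."""
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]


if __name__ == "__main__":
    # Hypothetical gram sizes; substitute the values from your own field type.
    for word in ("solr", "indexing", "configuration"):
        print(word, len(ngrams(word, 2, 5)), len(edge_ngrams(word, 2, 5)))
```

Under these assumptions, full n-gramming grows with both the token length and the width of the gram range, while edge n-gramming is capped by the width of the gram range alone, which is one reason the two can affect index size very differently.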
How do you determine whether a field is worth gramming? Can you predict how much longer a query will take when you gram a field?