Is there a way to create an index on only part of a field in MongoDB, for example on the first 10 characters? I couldn't find it documented (or asked about on here).
The MySQL equivalent would be CREATE INDEX part_of_name ON customer (name(10));.
Reason: I have a collection with a single field that varies in length from a few characters up to over 1000 characters, average 50 characters. As there are a hundred million or so documents it's going to be hard to fit the full index in memory (testing with 8% of the data the index is already 400MB, according to stats). Indexing just the first part of the field would reduce the index size by about 75%. In most cases the search term is quite short, it's not a full-text search.
A work-around would be to add a second field of 10 (lowercased) characters for each item, index that, then add logic to filter the results if the search term is over ten characters (and that extra field is probably needed anyway for case-insensitive searches, unless anybody has a better way). Seems like an ugly way to do it though.
[added later]
I tried adding the second field, containing the first 12 characters from the main field, lowercased. It wasn't a big success.
Previously, the average object size was 50 bytes, but I forgot that includes the _id and other overheads, so my main field length (there was only one) averaged nearer to 30 bytes than 50. Then, the second field index contains the _id and other overheads.
Net result (for my 8% sample) is the index on the main field is 415MB and on the 12 byte field is 330MB - only a 20% saving in space, not worthwhile. I could duplicate the entire field (to work around the case insensitive search problem) but realistically it looks like I should reconsider whether MongoDB is the right tool for the job (or just buy more memory and use twice as much disk space).
[added even later]
This is a typical document, with the source field, and the short lowercased field:
{ "_id" : ObjectId("505d0e89f56588f20f000041"), "q" : "Continental Airlines", "f" : "continental " }
Indexes:
db.test.ensureIndex({q:1});
db.test.ensureIndex({f:1});
The 'f" index, working on a shorter field, is 80% of the size of the "q" index. I didn't mean to imply I included the _id in the index, just that it needs to use that somewhere to show where the index will point to, so it's an overhead that probably helps explain why a shorter key makes so little difference.
Access to the index will be essentially random, no part of it is more likely to be accessed than any other. Total index size for the full file will likely be 5GB, so it's not extreme for that one index. Adding some other fields for other search cases, and their associated indexes, and copies of data for lower case, does start to add up, which I why I started looking into a more concise index.