Full Text Search like Google
- by Eduardo
I would like to implement full-text search in my offline (Android) application to search the user-generated list of notes.
I would like it to behave just like Google, since most people are already used to querying Google.
My initial requirements are:
Fast: like Google, or as fast as possible, with 100,000 documents of 200 words each.
Searching for two words should only return documents that contain both words, not just one of them (unless the OR operator is used).
Case insensitivity (a.k.a. normalization): If a document contains the word 'Hello', a search for 'hello' should match.
Diacritical-mark insensitivity: If a document contains the word 'así', a search for 'asi' should match. In Spanish, many people either omit diacritical marks or place them incorrectly.
Stop-word elimination: To keep the index small, meaningless words such as 'and', 'the' or 'for' should not be indexed at all.
Dictionary substitution (a.k.a. stemming): Similar words should be indexed as one. For example, both 'hungrily' and 'hungry' should be indexed as 'hunger'.
Phrase search: If a document contains the text 'Hello world!', a search for '"hello world"' should match it, but a search for '"world hello"' should not.
Search all fields: In multi-field documents, if no field is specified, every field should be searched (not just a default field).
Auto-completion while typing a search, suggesting popular searches (just like Google Suggest).
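The analysis requirements above (case, diacritics, stop words, dictionary substitution) can be illustrated with a minimal, JDK-only sketch. It is not Lucene's API, just the intended behavior: split text into tokens, lowercase them, strip diacritics via Unicode NFD decomposition, drop stop words, and map word variants to one term. The class name `NoteAnalyzer`, the stop-word list and the substitution dictionary are hypothetical examples.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hand-rolled analysis chain: tokenize, lowercase, strip diacritics, drop stop words, substitute. */
final class NoteAnalyzer {
    // Hypothetical stop-word list (English + Spanish); a real list would be much longer.
    private static final Set<String> STOP_WORDS =
            Set.of("and", "the", "for", "a", "of", "y", "el", "la", "de");

    // Hypothetical substitution dictionary: every variant maps to one indexed term.
    private static final Map<String, String> DICTIONARY =
            Map.of("hungry", "hunger", "hungrily", "hunger");

    private NoteAnalyzer() {}

    /** Lowercases a token and removes combining marks, so "Así" and "asi" both become "asi". */
    static String normalize(String token) {
        String lower = token.toLowerCase(Locale.ROOT);
        // NFD splits "í" into "i" followed by U+0301 (combining acute accent) ...
        String decomposed = Normalizer.normalize(lower, Normalizer.Form.NFD);
        // ... so deleting every combining mark (\p{M}) leaves the bare letter.
        return decomposed.replaceAll("\\p{M}+", "");
    }

    /** Turns raw text into the ordered list of terms that would be indexed (or searched for). */
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        // Split on runs of characters that are not letters, combining marks or digits.
        for (String token : text.split("[^\\p{L}\\p{M}\\p{N}]+")) {
            if (token.isEmpty()) continue; // a leading delimiter produces one empty token
            String term = normalize(token);
            if (STOP_WORDS.contains(term)) continue;
            terms.add(DICTIONARY.getOrDefault(term, term));
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(analyze("Así es: the hungry cat and the dog"));
    }
}
```

Running `main` prints `[asi, es, hunger, cat, dog]`. The same chain has to be applied both when indexing and when parsing the query; otherwise 'asi' would never meet 'así'.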
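The query-side requirements (implicit AND, phrase search, searching every field) can be sketched with a small in-memory inverted index. This is again a JDK-only illustration of the behavior, not how Lucene implements it: the tokenizer is deliberately minimal (no diacritic folding or stop words), and phrases are verified by rescanning the candidate notes rather than by using stored term positions. `NoteIndex` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Tiny in-memory inverted index over multi-field notes (illustration only). */
final class NoteIndex {
    // term -> ids of the notes that contain it in any field
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    // note id -> field name -> that field's terms, in order (used to verify phrases)
    private final List<Map<String, List<String>>> notes = new ArrayList<>();

    // Deliberately minimal tokenizer: lowercase and split on non-letters/digits.
    static List<String> tokenize(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    /** Indexes a note given as field name -> text and returns its id. */
    int add(Map<String, String> fields) {
        int id = notes.size();
        Map<String, List<String>> analyzed = new HashMap<>();
        for (Map.Entry<String, String> field : fields.entrySet()) {
            List<String> terms = tokenize(field.getValue());
            analyzed.put(field.getKey(), terms);
            for (String term : terms) {
                postings.computeIfAbsent(term, k -> new HashSet<>()).add(id);
            }
        }
        notes.add(analyzed);
        return id;
    }

    /** Notes containing every query term, each in any field (implicit AND, no default field). */
    Set<Integer> searchAll(String query) {
        List<String> terms = tokenize(query);
        Set<Integer> result = new TreeSet<>();
        if (terms.isEmpty()) return result;
        result.addAll(postings.getOrDefault(terms.get(0), Set.of()));
        for (String term : terms.subList(1, terms.size())) {
            // Intersecting posting sets drops notes that lack this term.
            result.retainAll(postings.getOrDefault(term, Set.of()));
        }
        return result;
    }

    /** Notes in which the phrase's terms appear consecutively and in order within one field. */
    Set<Integer> searchPhrase(String phrase) {
        List<String> terms = tokenize(phrase);
        Set<Integer> result = new TreeSet<>();
        for (int id : searchAll(phrase)) { // the AND query narrows the candidates first
            for (List<String> fieldTerms : notes.get(id).values()) {
                if (Collections.indexOfSubList(fieldTerms, terms) >= 0) {
                    result.add(id);
                    break;
                }
            }
        }
        return result;
    }
}
```

With a note whose title is "Greeting" and whose body is "Hello world!", `searchPhrase("hello world")` finds it while `searchPhrase("world hello")` does not, and `searchAll("greeting hello")` finds it even though the two words sit in different fields.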
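For the Google Suggest-style requirement, one simple approach (an assumption, not necessarily what Google does) is to count the user's past queries and offer the most frequent ones that start with what has been typed so far. The sketch keeps the counts in a `TreeMap`, so a prefix lookup is a single `subMap` range; `NoteSuggester` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Suggest-as-you-type from past queries, most popular first (illustration only). */
final class NoteSuggester {
    // Sorted by query text, so all queries sharing a prefix form one contiguous key range.
    private final TreeMap<String, Integer> counts = new TreeMap<>();

    /** Records one execution of a query (normalized to lowercase). */
    void recordSearch(String query) {
        String q = query.trim().toLowerCase(Locale.ROOT);
        if (!q.isEmpty()) counts.merge(q, 1, Integer::sum);
    }

    /** Returns up to {@code limit} past queries that start with {@code prefix}. */
    List<String> suggest(String prefix, int limit) {
        String p = prefix.toLowerCase(Locale.ROOT);
        // Keys starting with p sort from p (inclusive) up to p + U+FFFF (exclusive),
        // ignoring the edge case of queries that themselves contain U+FFFF.
        NavigableMap<String, Integer> range =
                counts.subMap(p, true, p + Character.MAX_VALUE, false);
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(range.entrySet());
        // Most frequent first; ties broken alphabetically so the order is stable.
        entries.sort(Map.Entry.<String, Integer>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Integer> e : entries) {
            if (out.size() >= limit) break;
            out.add(e.getKey());
        }
        return out;
    }
}
```

After recording "hello world" twice plus "help" and "Hello kitty" once each, `suggest("hel", 10)` returns `[hello world, hello kitty, help]`. Counting whole queries keeps suggestions phrased the way the user actually searches; a term-level approach would need extra logic to assemble multi-word suggestions.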
How can I configure a full-text search engine to behave as much like Google as possible?
(I am mostly interested in open-source Java solutions, in particular Lucene.)