Full Text Search like Google
Posted by Eduardo on Stack Overflow
Published on 2009-12-30T00:35:32Z
I would like to implement full-text search in my offline (Android) application to search a user-generated list of notes.
I would like it to behave just like Google, since most people are already used to querying Google.
My initial requirements are:
- Fast: as fast as Google, or as fast as possible, with 100,000 documents of 200 words each.
- Searching for two words should return only documents that contain both words, not just one of them (unless the OR operator is used).
- Case insensitivity (normalization): if I have the word 'Hello' and I search for 'hello', it should match.
- Diacritical-mark insensitivity: if I have the word 'así', a search for 'asi' should match. In Spanish, many people either omit diacritical marks or place them incorrectly.
- Stop-word elimination: to keep the index small, meaningless words such as 'and', 'the' or 'for' should not be indexed at all.
- Dictionary substitution (stemming): similar words should be indexed as one term. For example, 'hungrily' and 'hungry' should both be indexed as 'hunger'.
- Phrase search: if I have the text 'Hello world!', a search for '"world hello"' should not match it, but a search for '"hello world"' should.
- Search all fields (in multi-field documents) if no field is specified, not just a default field.
- Auto-completion while typing that suggests popular searches (just like Google Suggest).
How can I configure a full-text search engine to behave as much like Google as possible?
(I am mostly interested in open-source Java solutions, and in particular Lucene.)
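The normalization requirements (case, diacritics, stop words, word variants) all belong to the analysis step, which a search engine runs on both the documents and the queries. In Lucene they roughly correspond to token filters such as LowerCaseFilter, ASCIIFoldingFilter, StopFilter and a stemming filter (for Spanish, for example SpanishLightStemFilter or SnowballFilter). The dependency-free sketch below only illustrates the idea in plain Java (9+). The class name TextAnalyzer, the tiny stop-word list and the substitution table are hypothetical examples, not Lucene's API. An algorithmic stemmer produces truncated stems and may not merge every variant, so this sketch uses an explicit lookup table to show the dictionary-substitution variant described above.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/**
 * Minimal analysis chain: tokenize, lowercase, strip diacritics,
 * drop stop words and map known variants to a canonical term.
 * The stop words and the substitution table are hypothetical examples.
 */
final class TextAnalyzer {
    // Hypothetical stop-word list; a real one would be chosen per language.
    private static final Set<String> STOP_WORDS = Set.of("a", "and", "for", "of", "the");

    // Hypothetical dictionary substitution: variant -> canonical term.
    private static final Map<String, String> CANONICAL = Map.of(
            "hungry", "hunger",
            "hungrily", "hunger");

    private TextAnalyzer() {}

    /** Decomposes accented letters (NFD) and removes the combining marks: "así" -> "asi". */
    static String foldDiacritics(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    /** Returns the normalized terms of {@code text} in their original order. */
    static List<String> analyze(String text) {
        String folded = foldDiacritics(text.toLowerCase(Locale.ROOT));
        List<String> terms = new ArrayList<>();
        // Split on any run of characters that is not a letter or digit.
        for (String token : folded.split("[^\\p{L}\\p{N}]+")) {
            if (token.isEmpty() || STOP_WORDS.contains(token)) {
                continue; // leading empty split or stop word: not indexed
            }
            terms.add(CANONICAL.getOrDefault(token, token));
        }
        return terms;
    }

    public static void main(String[] args) {
        // Prints [asi, hunger, cat, ate, hunger]
        System.out.println(analyze("Así, the hungry cat ate HUNGRILY!"));
    }
}
```

Because the same chain is applied to queries, 'Hello' and 'hello', or 'así' and 'asi', end up as identical terms. Note that this folding also turns 'ñ' into 'n'; whether that is acceptable for Spanish text is a design decision.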
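Implicit AND, phrase search and searching every field all depend on how the index is laid out. A common layout is a positional inverted index: each term maps to the documents that contain it and the positions where it occurs. An AND query intersects the document sets, and a phrase matches when consecutive query terms sit at consecutive positions. In Lucene the corresponding pieces would be QueryParser with setDefaultOperator(QueryParser.Operator.AND), quoted phrases parsed into a PhraseQuery, and MultiFieldQueryParser for several fields. The plain-Java class below (PositionalIndex, with hypothetical method names) is only a toy illustration under those assumptions; it uses a simple lowercase tokenizer instead of a full analysis chain.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/**
 * Toy positional inverted index: term -> (docId -> positions of the term).
 * All fields of a document share one position space, so an unqualified
 * query searches every field.
 */
final class PositionalIndex {
    private final Map<String, Map<Integer, List<Integer>>> postings = new HashMap<>();

    /** Simple lowercase tokenizer; a real engine would run the full analysis chain. */
    private static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
        return terms;
    }

    /** Indexes every field value of a document. */
    void add(int docId, Map<String, String> fields) {
        int offset = 0;
        for (String value : fields.values()) {
            List<String> terms = tokenize(value);
            for (int i = 0; i < terms.size(); i++) {
                postings.computeIfAbsent(terms.get(i), t -> new HashMap<>())
                        .computeIfAbsent(docId, d -> new ArrayList<>())
                        .add(offset + i);
            }
            // Skip one position so a phrase cannot span two fields.
            offset += terms.size() + 1;
        }
    }

    /** Documents that contain every query term (implicit AND). */
    Set<Integer> searchAll(String query) {
        Set<Integer> result = null;
        for (String term : tokenize(query)) {
            Set<Integer> docs = postings.getOrDefault(term, Map.of()).keySet();
            if (result == null) {
                result = new TreeSet<>(docs);
            } else {
                result.retainAll(docs);
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    /** Documents where the query terms occur consecutively and in order. */
    Set<Integer> searchPhrase(String phrase) {
        List<String> terms = tokenize(phrase);
        Set<Integer> result = new TreeSet<>();
        if (terms.isEmpty()) {
            return result;
        }
        // Only documents containing every term can contain the phrase.
        for (int doc : searchAll(phrase)) {
            for (int start : postings.get(terms.get(0)).get(doc)) {
                boolean match = true;
                for (int k = 1; k < terms.size() && match; k++) {
                    match = postings.get(terms.get(k)).get(doc).contains(start + k);
                }
                if (match) {
                    result.add(doc);
                    break;
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        PositionalIndex index = new PositionalIndex();
        index.add(1, Map.of("title", "Greeting", "body", "Hello world!"));
        index.add(2, Map.of("title", "Hello", "body", "Goodbye"));
        System.out.println(index.searchAll("hello world"));    // [1]
        System.out.println(index.searchAll("hello"));          // [1, 2]
        System.out.println(index.searchPhrase("hello world")); // [1]
        System.out.println(index.searchPhrase("world hello")); // []
    }
}
```

Putting all fields into one position space, with a one-position gap between fields, lets an unqualified query search every field while keeping a phrase from matching across a field boundary. A real engine would rather keep per-field postings so that field-qualified queries remain possible, and would expand an unqualified query over all fields as MultiFieldQueryParser does.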
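For Google Suggest-style completion, an offline app has no global query log, so one reasonable assumption is to rank suggestions by how often the user has issued each query before. Lucene's suggest module ships ready-made suggesters (for example AnalyzingInfixSuggester); the sketch below, with the hypothetical class QuerySuggester, only shows the underlying idea. It keeps query counts in a sorted map, so every past query with a given prefix forms one contiguous range, and returns the most frequent ones.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/**
 * Toy "suggest as you type": counts how often each query was issued and
 * returns the most frequent past queries that start with the typed prefix.
 */
final class QuerySuggester {
    // Sorted keys: all queries sharing a prefix form one contiguous range.
    private final TreeMap<String, Integer> counts = new TreeMap<>();

    private static String normalize(String s) {
        return s.trim().toLowerCase(Locale.ROOT);
    }

    /** Records that the user ran {@code query}. */
    void record(String query) {
        counts.merge(normalize(query), 1, Integer::sum);
    }

    /** Up to {@code limit} past queries starting with {@code prefix}, most frequent first. */
    List<String> suggest(String prefix, int limit) {
        String p = normalize(prefix);
        List<Map.Entry<String, Integer>> matches = new ArrayList<>();
        // Walk forward from the prefix until keys stop sharing it.
        for (Map.Entry<String, Integer> e : counts.tailMap(p, true).entrySet()) {
            if (!e.getKey().startsWith(p)) {
                break;
            }
            matches.add(e);
        }
        // Most popular first; ties broken alphabetically.
        matches.sort(Map.Entry.<String, Integer>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        List<String> out = new ArrayList<>();
        for (int i = 0; i < Math.min(limit, matches.size()); i++) {
            out.add(matches.get(i).getKey());
        }
        return out;
    }

    public static void main(String[] args) {
        QuerySuggester s = new QuerySuggester();
        for (String q : List.of("hello world", "hello world", "hello world",
                "hello kitty", "hello kitty", "help", "world")) {
            s.record(q);
        }
        System.out.println(s.suggest("hel", 2));  // [hello world, hello kitty]
        System.out.println(s.suggest("HELP", 5)); // [help]
    }
}
```

The prefix scan touches only the matching range of the sorted map, and the ranking step sorts just those matches, which stays cheap for a single user's query history.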