Can you recommend a full-text search engine? (Preferably open source)
I have a database of many (though relatively short) HTML documents. I want users to be able to search this database by entering one or more search words in my C++ desktop application. Hence, I’m looking for a fast full-text search solution to integrate with my app. Ideally, it should:
Skip common words (stop words) such as "the", "of" and "and".
Support stemming, e.g. a search for "run" also finds documents containing "runner", "running" and "ran".
Be able to update its index in the background as new documents are added to the database.
Be able to provide search word suggestions (like Google Suggest).
Have a well-documented API.
To illustrate, assume the database has just two documents:
Document 1: This is a test of text search.
Document 2: Testing is fun.
The following words should be in the index: fun, search, test, testing, text. If the user types "t" in the search box, I want the application to be able to suggest "test", "testing" and "text". Ideally, the application should be able to query the search engine for the 10 most common search words starting with "t". A search for "testing" should return both documents.
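To make the suggest requirement concrete, here is a minimal in-memory sketch of the behaviour described above. It is not the API of CLucene, Xapian or any other library: `TinyIndex`, `add`, `suggest` and `terms` are hypothetical names, the stop-word list is a tiny sample, and "most common" is assumed to mean "appears in the most documents". Stemming is left out, so this sketch does not cover the requirement that a search for "testing" returns both documents.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical toy index: term -> ids of the documents that contain it.
class TinyIndex {
public:
    // Split on non-letters, lowercase, drop stop words, record postings.
    void add(int docId, const std::string& text) {
        std::string word;
        for (std::size_t i = 0; i <= text.size(); ++i) {
            // Treat the end of the text as a separator so the last word is flushed.
            const unsigned char c =
                i < text.size() ? static_cast<unsigned char>(text[i]) : ' ';
            if (std::isalpha(c)) {
                word += static_cast<char>(std::tolower(c));
            } else if (!word.empty()) {
                if (stopWords().count(word) == 0) postings_[word].insert(docId);
                word.clear();
            }
        }
    }

    // Up to `limit` indexed terms starting with `prefix`, ordered by the
    // number of documents containing them (assumed ranking), ties alphabetical.
    std::vector<std::string> suggest(const std::string& prefix,
                                     std::size_t limit = 10) const {
        using Hit = std::pair<std::string, std::size_t>;
        std::vector<Hit> hits;
        // std::map is sorted, so all terms sharing a prefix are contiguous.
        for (auto it = postings_.lower_bound(prefix);
             it != postings_.end() &&
             it->first.compare(0, prefix.size(), prefix) == 0;
             ++it) {
            hits.emplace_back(it->first, it->second.size());
        }
        std::stable_sort(hits.begin(), hits.end(),
                         [](const Hit& a, const Hit& b) { return a.second > b.second; });
        std::vector<std::string> out;
        for (std::size_t i = 0; i < hits.size() && i < limit; ++i)
            out.push_back(hits[i].first);
        return out;
    }

    // All indexed terms in alphabetical order.
    std::vector<std::string> terms() const {
        std::vector<std::string> out;
        for (const auto& entry : postings_) out.push_back(entry.first);
        return out;
    }

private:
    // Sample stop-word list (assumption); a real engine ships a fuller one.
    static const std::unordered_set<std::string>& stopWords() {
        static const std::unordered_set<std::string> words = {
            "a", "and", "is", "of", "the", "this"};
        return words;
    }

    std::map<std::string, std::set<int>> postings_;
};
```

After adding the two example documents with `add(1, ...)` and `add(2, ...)`, `terms()` yields fun, search, test, testing, text, and `suggest("t")` yields test, testing, text. What I need from a real engine is an equivalent of `suggest`: a way to enumerate indexed terms by prefix together with their frequencies.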
Other points:
I don't need multi-user support
I don't need support for complex queries
The database resides on the user's computer, so the indexing should be performed locally.
Can you suggest a C or C++ based solution? (I’ve briefly reviewed CLucene and Xapian, but I’m not sure if either will address my needs, especially querying the search word indexes for the suggest feature).