Lucene Analyzer to Use With Special Characters and Punctuation?
Posted by Brandon on Stack Overflow
Published on 2010-04-29T02:19:15Z
I have a Lucene index that has several documents in it. Each document has multiple fields such as:
Id
Project
Name
Description
The Id field is a unique identifier such as a GUID. Project is a user's ProjectID, and a user can only view documents for their own project. Name and Description contain text that can include special characters.
When a user searches on the Name field, I want the match to be as forgiving as possible. For example, a search for:
First
should return both:
First.Last
and
First.Middle.Last
Name can also be something like:
Test (NameTest)
If a user types 'Test', 'Name', or '(NameTest)', the search should find that document.
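To make the desired Name matching concrete, here is a minimal sketch in plain Python. It does not use Lucene; it only illustrates the kind of token stream an analyzer would need to produce for the examples above to match. The functions `name_tokens` and `name_matches` are hypothetical names, and splitting on camel-case boundaries is an assumption made so that 'Name' can find 'NameTest'.

```python
import re


def name_tokens(text):
    """Return the set of lowercase tokens for a Name value or a query.

    Assumption: punctuation and whitespace separate words, and camel-case
    parts are indexed in addition to the whole word.
    """
    tokens = set()
    # Runs of letters and digits are the base words; everything else separates them.
    for word in re.findall(r"[A-Za-z0-9]+", text):
        tokens.add(word.lower())
        # Also add camel-case parts, e.g. "NameTest" -> "name", "test".
        for part in re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|[0-9]+", word):
            tokens.add(part.lower())
    return tokens


def name_matches(query, name):
    """True if every token of the query appears among the name's tokens."""
    query_tokens = name_tokens(query)
    return bool(query_tokens) and query_tokens <= name_tokens(name)


for name in ("First.Last", "First.Middle.Last"):
    print(name, name_matches("First", name))
for query in ("Test", "Name", "(NameTest)"):
    print(query, name_matches(query, "Test (NameTest)"))
```

An analyzer that only splits on whitespace would keep `First.Last` and `(NameTest)` as single tokens, which is why the examples need punctuation to act as a separator.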
Project, however, needs to be an exact, case-insensitive match: a search for 'ProjectA' should match only that project. The same goes for the Id field.
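By contrast, the Project and Id requirement amounts to treating the whole value as a single term and normalizing only its case. The sketch below is plain Python rather than Lucene code, and `exact_key` and `exact_match` are hypothetical helper names used for illustration.

```python
def exact_key(value):
    """Normalize a whole field value into one case-insensitive term."""
    # No splitting: the entire value is kept as a single token.
    return value.casefold()


def exact_match(query, stored):
    """True only if the full values are equal, ignoring case."""
    return exact_key(query) == exact_key(stored)


print(exact_match("projecta", "ProjectA"))
print(exact_match("Project", "ProjectA"))
```

Applying the same normalization at index time and at query time is what keeps the comparison case insensitive without allowing partial matches.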
Which fields should I set up as Tokenized and which as Untokenized? Also, is there a good Analyzer I should consider to make this happen?
I am stuck trying to decide the best route to implement the desired searching.
© Stack Overflow or respective owner