Hyphens in Lucene
Posted by user72185 on Stack Overflow
Published on 2010-04-08T14:56:18Z
lucene.net | lucene
Hi,
I'm playing around with Lucene and noticed that a hyphenated word (e.g. "semi-final") ends up as two words ("semi" and "final") in the index. How is this supposed to match if the user searches for "semifinal" as one word?
Edit: Actually, I'm just playing around with the StandardTokenizer class; maybe that is why? Am I missing a filter?
Thanks!
(Edit) My code looks like this:
StandardAnalyzer sa = new StandardAnalyzer();
TokenStream ts = sa.TokenStream("field", new StringReader("semi-final"));
while (ts.IncrementToken())
{
    string t = ts.ToString();
    Console.WriteLine("Token: " + t);
}
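To make the problem concrete without depending on Lucene, here is a minimal sketch of what I think is happening. Everything in it is an assumption for illustration: HyphenDemo, SplitOnPunctuation and SplitAndJoinHyphens are hypothetical names of my own, and the splitting rule only imitates the behaviour I see. It is not StandardTokenizer's actual logic. The second method shows one idea I can imagine (also indexing the joined form of a hyphenated word), which may or may not be how this is normally handled.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical stand-in for an analyzer. This is NOT Lucene's StandardTokenizer;
// it only imitates "split on the hyphen" so the mismatch is easy to see.
public static class HyphenDemo
{
    // Lower-cases the text and splits on every character that is not a
    // letter or digit, so "semi-final" becomes ["semi", "final"].
    public static List<string> SplitOnPunctuation(string text)
    {
        var tokens = new List<string>();
        var current = new StringBuilder();
        foreach (char c in text)
        {
            if (char.IsLetterOrDigit(c))
            {
                current.Append(char.ToLowerInvariant(c));
            }
            else if (current.Length > 0)
            {
                tokens.Add(current.ToString());
                current.Clear();
            }
        }
        if (current.Length > 0)
        {
            tokens.Add(current.ToString());
        }
        return tokens;
    }

    // Same split as above, but each whitespace-separated word containing a
    // hyphen also contributes its joined form ("semi-final" -> "semifinal").
    public static List<string> SplitAndJoinHyphens(string text)
    {
        List<string> tokens = SplitOnPunctuation(text);
        foreach (string word in text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
        {
            if (word.Contains("-"))
            {
                string joined = string.Concat(SplitOnPunctuation(word));
                if (joined.Length > 0)
                {
                    tokens.Add(joined);
                }
            }
        }
        return tokens;
    }

    public static void Main()
    {
        List<string> plain = SplitOnPunctuation("semi-final");
        Console.WriteLine(string.Join(", ", plain));       // semi, final
        Console.WriteLine(plain.Contains("semifinal"));    // False

        List<string> expanded = SplitAndJoinHyphens("semi-final");
        Console.WriteLine(string.Join(", ", expanded));    // semi, final, semifinal
        Console.WriteLine(expanded.Contains("semifinal")); // True
    }
}
```

With the plain split, a query for "semifinal" has no token to match. With the joined form added, it does, at the cost of extra tokens in the index.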