Hyphens in Lucene

Posted by user72185 on Stack Overflow
Published on 2010-04-08T14:56:18Z Indexed on 2010/04/08 15:23 UTC

Hi,

I'm playing around with Lucene and noticed that a hyphenated word (e.g. "semi-final") ends up as two words ("semi" and "final") in the index. How is this supposed to match if the user searches for "semifinal" as one word?

Edit: Actually, I'm only playing around with the StandardTokenizer class so far; maybe that is why? Am I missing a filter?

Thanks!

(Edit) My code looks like this:

    StandardAnalyzer sa = new StandardAnalyzer();
    TokenStream ts = sa.TokenStream("field", new StringReader("semi-final"));

    while (ts.IncrementToken())
    {
        string t = ts.ToString();
        Console.WriteLine("Token: " + t);
    }
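
To make the problem concrete, here is a minimal sketch of one possible workaround: besides the two parts, also emit the joined form of a hyphenated word, so that "semifinal" has something to match. This is only an illustration of the idea and does not use Lucene.NET at all; the class `HyphenSketch` and the method `ExpandHyphenated` are hypothetical names, and splitting on spaces and hyphens is an assumption that is much cruder than what StandardTokenizer really does.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, NOT part of Lucene.NET. It only illustrates the idea
// of indexing the joined form of a hyphenated word next to its parts.
public static class HyphenSketch
{
    public static List<string> ExpandHyphenated(string text)
    {
        var tokens = new List<string>();

        // Assumption: words are separated by single spaces.
        foreach (string word in text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
        {
            // Split on hyphens, which mirrors the behaviour described above
            // ("semi-final" becomes "semi" and "final").
            string[] parts = word.Split(new[] { '-' }, StringSplitOptions.RemoveEmptyEntries);

            foreach (string part in parts)
            {
                // Lowercase so that matching is case-insensitive.
                tokens.Add(part.ToLowerInvariant());
            }

            // For hyphenated words, also add the parts joined together,
            // so a query for "semifinal" can match.
            if (parts.Length > 1)
            {
                tokens.Add(string.Concat(parts).ToLowerInvariant());
            }
        }

        return tokens;
    }
}
```

Calling `HyphenSketch.ExpandHyphenated("semi-final")` gives the tokens "semi", "final" and "semifinal". In a real index, the same effect would presumably have to come from a token filter placed after the tokenizer, which is what the question about a missing filter is getting at.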

© Stack Overflow or respective owner
