How to use NGramTokenizerFactory or NGramFilterFactory?

Posted by user572485 on Stack Overflow See other posts from Stack Overflow or by user572485
Published on 2011-01-12T09:51:11Z Indexed on 2011/01/12 9:53 UTC
Read the original article Hit count: 261

Filed under:

lucene

|

solr

|

tokenizer

|

lucene-index

Hi,

Recently, I am studying how to store and index using Solr. I want to do facet.prefix search. With whitespace tokenizer, "Where are you" will be splited into three words and indexed. If I search facet.prefix="where are", no result will be returned.

I google and found NGramFilterFactory can help me. But when I apply this filter factory, I found the result is "w, h, e, ..., wh, ..", which split the sentence by character, not by token word.

I use the parameters maxGramSize and minGramSize, set to 1 and 3. Does the NGramFilterFactory work right? Should I add some other parameters? Is there some other filter factories which can help me?

Thanks!

© Stack Overflow or respective owner

Related posts about lucene

performance comparision between Zend Lucene and Java Lucene

as seen on Stack Overflow - Search for 'Stack Overflow'
Zend Lucene and Java Lucene are built in PHP and java repectively, and PHP language has a higher level than java. Just wondering How big the performance difference among these two, regarding to index building and data searching? Is it much more effective to let java create and rebuild index, and… >>> More
Why wasn't fast-vector-highlighter (lucene-contrib) made an official part of Lucene 3.0 core

as seen on Stack Overflow - Search for 'Stack Overflow'
I've read some Jira entries and they mentioned moving fast-vector-highlighter to core about a year ago but it never made it. Looking at the svn for contrib it seems incomplete. There are no tests for FastVectorHighlighter Documentation is lacking No samples anywhere on apache.org Anyone have… >>> More
pylucene: install error

as seen on Stack Overflow - Search for 'Stack Overflow'
I am trying to install Pylucene (pylucene-3.3-3-src.tar.gz) on my ubuntu linux 11.10. I have python 2.7.2. I was able to compile JCC (I think) because I didnt see any error when I installed it. When I tried to install Pylucene I get the following error. Can someone help? Thanks. ICU not installed /usr/bin/python… >>> More
Solr WordDelimiterFilter + Lucene Highlighter

as seen on Stack Overflow - Search for 'Stack Overflow'
I am trying to get the Highlighter class from Lucene to work properly with tokens coming from Solr's WordDelimiterFilter. It works 90% of the time, but if the matching text contains a ',' such as "1,500" the output is incorrect: Expected: 'test 1,500 this' Observed: 'test 11,500 this' I… >>> More
java AbstractMethodError

as seen on Stack Overflow - Search for 'Stack Overflow'
How to handle this error in lucene: java.lang.AbstractMethodError: org.apache.lucene.store.Directory.listAll()[Ljava/lang/String; at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:568) at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:69) … >>> More

Related posts about solr

Faceted search with Solr on Windows

as seen on ASP.net Weblogs - Search for 'ASP.net Weblogs'
With over 10 million hits a day, funda.nl is probably the largest ASP.NET website which uses Solr on a Windows platform. While all our data (i.e. real estate properties) is stored in SQL Server, we're using Solr 1.4.1 to return the faceted search results as fast as we can.And yes, Solr is very… >>> More
Severe errors in solr configuration: Error loading class 'solr.TrieDateField'

as seen on Server Fault - Search for 'Server Fault'
Installed Solr on my Windows XP PC. Tomcat seems to be working fine. Cannot get Solr to work. I noticed the TrieDateField is declared in a file called schema.xml in the SolrHome directory. Any thoughts? The Url http://localhost:8080/solr/ returns: HTTP Status 500 - Severe errors in solr configuration… >>> More
Solr error; Anybody know what this means?

as seen on Stack Overflow - Search for 'Stack Overflow'
I am installing solr on my VPS (Ubuntu 9.10) via PuTTY. First, I thought about installing Solr with Tomcat, but then after installing tomcat, I changed my mind and went for the Jetty which comes with Solr. Now that I have setup everything on my Server, and try to start the "start.jar" file, I get… >>> More
Solr DataImportHandler configuration

as seen on Stack Overflow - Search for 'Stack Overflow'
I want to get data from mysql database with the help of DataImportHandler so i can create indexes. Now I've configured my Solr instance so that it works on Tomcat (the example admin page), but if I try to change the sorlconfig.xml file i'll get the error message. I'm working with Solr 3.6 So my configuration… >>> More
Java exception when the traffic grow up

as seen on Server Fault - Search for 'Server Fault'
I have an error with java/solr when the traffic grows up. It seems Solr tries to cast a java.lang.Object to a org.apache.solr.common.util.ConcurrentLRUCache$CacheEntry SEVERE: java.lang.ClassCastException: [Ljava.lang.Object; cannot be cast to [Lorg.apache.solr.common.util.ConcurrentLRUCache$CacheEntry; … >>> More