Lucene (.NET) Document structure and performance suggestions.

Posted by Josh Handel on Stack Overflow
Published on 2010-05-15T13:39:42Z Indexed on 2010/05/15 13:44 UTC

Filed under: lucene | lucene.net | search | Performance

Hello, I am indexing about 100M documents that consist of a few string identifiers and a hundred or so numeric terms. I won't be doing range queries, so I haven't dug too deep into NumericField, but I don't think it's the right choice here.
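As a rough illustration of the document shape described above (a few string identifiers plus about a hundred numeric values), here is a toy Python model of an inverted index with one multi-valued DataField in which every number is indexed as an exact string term. This is not Lucene.NET code: the field names StringField and DataField come from the question, while the sample documents and the helpers build_index and posting_sizes are hypothetical. Because only exact-match lookups are needed, each value maps to a single term here, whereas Lucene's NumericField by default also indexes extra lower-precision terms per value (controlled by its precisionStep) to speed up range queries.

```python
from collections import defaultdict


def build_index(docs):
    """Map (field, term) -> sorted list of doc ids, a simplified inverted index.

    Each (hypothetical) doc has a single-valued "StringField" and a
    multi-valued "DataField" of integers; numbers become exact string terms.
    """
    postings = defaultdict(list)
    for doc_id, doc in enumerate(docs):  # ascending ids keep each list sorted
        postings[("StringField", doc["StringField"])].append(doc_id)
        # One exact-match term per distinct value; no range-query precision terms.
        for value in sorted(set(doc["DataField"])):
            postings[("DataField", str(value))].append(doc_id)
    return dict(postings)


def posting_sizes(index, field):
    """Return {term: number of docs containing it} for one field."""
    return {term: len(ids) for (f, term), ids in index.items() if f == field}


docs = [
    {"StringField": "a", "DataField": [1, 2, 75, 3]},
    {"StringField": "b", "DataField": [1, 3, 99, 88, 102]},
    {"StringField": "c", "DataField": [75, 5, 99]},
]
index = build_index(docs)
print(posting_sizes(index, "DataField"))
```

In this model the length of each term's posting list is what later drives query cost, so a value that appears in a large share of 100M documents produces a correspondingly long list.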

My problem is that query performance degrades quickly when I start adding OR criteria to my query. All my queries are on specific numeric terms. A document looks like StringField:[someString] plus N DataField:[someNumber] values, and I then query it with something like DataField:((+1 +(2 3)) (+75 +(3 5 52)) (+99 +88 +(102 155 199))).
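Assuming the query parser's default OR operator, the example query reads as an OR of clauses, where each clause requires its single terms (+1, +99, +88) and at least one term from each required bracketed group (+(2 3)). The hypothetical Python helper below rebuilds that query string from nested lists, which makes the nesting explicit and shows how such OR-heavy queries could be generated programmatically; the function names and list layout are illustrative assumptions, not part of Lucene.NET.

```python
def format_clause(clause):
    """Render one clause: every item is required (+); a list item is an OR group."""
    parts = []
    for item in clause:
        if isinstance(item, (list, tuple)):
            parts.append("+(" + " ".join(str(t) for t in item) + ")")
        else:
            parts.append("+" + str(item))
    return "(" + " ".join(parts) + ")"


def build_query(field, clauses):
    """Join clauses with spaces (default OR semantics) under one field."""
    return field + ":(" + " ".join(format_clause(c) for c in clauses) + ")"


clauses = [
    [1, [2, 3]],
    [75, [3, 5, 52]],
    [99, 88, [102, 155, 199]],
]
print(build_query("DataField", clauses))
# DataField:((+1 +(2 3)) (+75 +(3 5 52)) (+99 +88 +(102 155 199)))
```

Every extra inner list adds another group of terms whose matches must be combined before the required terms can narrow the result.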

Currently these queries take about 7 to 16 seconds to run on my laptop. I would like to make sure that's really the best they can do, and I am open to suggestions on both field structure and query structure :-).
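To reason about where that time might go, here is a toy Python evaluator over sorted posting lists that applies the same semantics (an OR group is a union, required items are intersected, top-level clauses are unioned) and counts how many posting entries it reads. This is a simplified cost model, not how Lucene's scorers actually work internally (real conjunctions can skip ahead within posting lists); the index contents and the names union, intersect and evaluate are hypothetical.

```python
def union(lists):
    """Combine several doc-id lists into one sorted list without duplicates."""
    return sorted(set().union(*lists))


def intersect(a, b):
    """Intersect two sorted doc-id lists with a linear merge."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def evaluate(postings, clauses):
    """Return (matching doc ids, posting entries read) for an OR of AND-clauses.

    postings: dict term -> sorted doc ids (a hypothetical toy index).
    Each clause item is a single term or a list of terms (an OR group).
    """
    read = 0
    matches = set()
    for clause in clauses:
        required = []
        for item in clause:
            terms = item if isinstance(item, list) else [item]
            lists = [postings.get(t, []) for t in terms]
            read += sum(len(l) for l in lists)  # every list is read in full
            required.append(union(lists))
        # Start from the shortest list so intermediate results stay small.
        required.sort(key=len)
        result = required[0]
        for other in required[1:]:
            result = intersect(result, other)
        matches.update(result)
    return sorted(matches), read


postings = {1: [0, 1, 4], 2: [0, 5], 3: [1, 2, 4], 75: [2, 3],
            5: [3], 99: [1, 4], 88: [4], 102: [4]}
clauses = [[1, [2, 3]], [75, [3, 5, 52]], [99, 88, [102, 155, 199]]]
print(evaluate(postings, clauses))  # ([0, 1, 2, 3, 4], 18)
```

In this model the work grows with the total length of the posting lists read, so frequent values inside OR groups dominate; that is consistent with, though not proof of, the slowdown seen when OR criteria are added.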

Thanks

Josh

PS: I have already read all the other Lucene performance discussions here, on the Lucene wiki, and at Lucid Imagination... I'm a bit further down the rabbit hole than that.

© Stack Overflow or respective owner

Related posts about lucene

Performance comparison between Zend Lucene and Java Lucene

as seen on Stack Overflow
Zend Lucene and Java Lucene are built in PHP and Java respectively, and PHP is a higher-level language than Java. Just wondering how big the performance difference between these two is, regarding index building and data searching? Is it much more effective to let Java create and rebuild the index, and…
Why wasn't fast-vector-highlighter (lucene-contrib) made an official part of Lucene 3.0 core

as seen on Stack Overflow
I've read some Jira entries and they mentioned moving fast-vector-highlighter to core about a year ago, but it never made it. Looking at the svn for contrib it seems incomplete: there are no tests for FastVectorHighlighter, documentation is lacking, and there are no samples anywhere on apache.org. Anyone have…
pylucene: install error

as seen on Stack Overflow
I am trying to install PyLucene (pylucene-3.3-3-src.tar.gz) on Ubuntu Linux 11.10. I have Python 2.7.2. I was able to compile JCC (I think), because I didn't see any error when I installed it. When I tried to install PyLucene I got the following error. Can someone help? Thanks. ICU not installed /usr/bin/python…
Solr WordDelimiterFilter + Lucene Highlighter

as seen on Stack Overflow
I am trying to get the Highlighter class from Lucene to work properly with tokens coming from Solr's WordDelimiterFilter. It works 90% of the time, but if the matching text contains a ',' such as "1,500" the output is incorrect: Expected: 'test 1,500 this' Observed: 'test 11,500 this' I…
java AbstractMethodError

as seen on Stack Overflow
How to handle this error in lucene: java.lang.AbstractMethodError: org.apache.lucene.store.Directory.listAll()[Ljava/lang/String; at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:568) at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:69) …

Related posts about lucene.net

Good Lucene .NET alternative for ASP.NET website

as seen on Stack Overflow
Are there any good alternatives to Lucene .NET to use in an ASP.NET website? I want to index XML, TXT, PDF and DOC files. Thanks!
Where does lucene .net cache the search results?

as seen on Stack Overflow
Hi, I'm trying to figure out where Lucene stores the cached query results, and how it's configured to do so - and how long it caches for. This is for an ASP.NET 3.5 solution. I'm getting this problem: If I run a search and sort the result by a particular product field, it seems to work the very…
Lucene .Net Searching with TermVector

as seen on Stack Overflow
In Lucene.Net, I am creating the document for searching a word and want to display the 10 words before and the 10 words after it. I have used TermVector. Lucene.Net.Documents.Field fldContent = new Lucene.Net.Documents.Field("content", content, Lucene.Net.Documents.Field.Store.YES, Lucene…
Lucene.Net support phrases?: What is best approach to tokenize comma-delimited data (atomically) in

as seen on Stack Overflow
I have a database with a column I wish to index that has comma-delimited names, e.g., User.FullNameList = "Helen Ready, Phil Collins, Brad Paisley" I prefer to tokenize each name atomically (name as a whole searchable entity). What is the best approach for this? Did I miss a simple option to…
Lucene.NET vs SQL Server Full-text – Generating a million records and a full-text index

as seen on Dot net Slackers
In this article we will take a look at how SQL Server performs with one million records in a table. We will create a quick data pumper program to fill up a table with a million dynamically created rows of data. From there we will take a look at querying the data without any special optimizations…