lucene - Page 7 - Developer IT

Inspecting Lucene.NET index with Luke: want to replicate NHibernate.Search view

- by Tim Peel

Hi, I am trying to put together an index using terms, which I specify as a comma-separated list. I want to replicate the display in Luke as seen here: http://ayende.com/Blog/archive/2009/05/03/nhibernate-search-again.aspx But my index value just shows as a single field with the comma-separated list value. For example:

    Tags    term,anotherterm

When I search my index, it returns results if I search for "term" but returns nothing if I search for "anotherterm". I thought the indexing process would break the comma-separated list apart into separate values, but this does not seem to be the case. Anyone got any ideas? Thanks
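To make the expected outcome concrete, here is a minimal Python sketch of a toy inverted index. It is not Lucene.NET or NHibernate.Search code and does not diagnose the analyzer in use; `build_index` and `split_tags` are hypothetical names. It only shows the difference between indexing the whole value as one term and splitting it on commas first.

```python
from collections import defaultdict


def build_index(docs, split_tags):
    """Map each indexed term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, tags in docs.items():
        # Either keep the whole field value as a single term,
        # or split it on commas so each tag becomes its own term.
        terms = [t.strip() for t in tags.split(",")] if split_tags else [tags]
        for term in terms:
            if term:
                index[term].add(doc_id)
    return index


docs = {1: "term,anotherterm"}
unsplit = build_index(docs, split_tags=False)
split = build_index(docs, split_tags=True)
```

With the unsplit index the only term is the full string "term,anotherterm"; with the split index both "term" and "anotherterm" point at document 1, which is the view the question is after.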


Why is my Lucene index getting locked?

- by Andrew Bullock

I had an issue with my search not returning the results I expect. I tried to run Luke on my index, but it said it was locked and I needed to Force Unlock it (I'm not a Jedi/Sith though). I tried to delete the index folder and run my recreate-indices application, but the folder was locked. Using Unlocker I've found that there are about 100 entries of w3wp.exe (same PID, different handle) with a lock on the index. What's going on? I'm doing this in my NHibernate configuration:

    c.SetListener(ListenerType.PostUpdate, new FullTextIndexEventListener());
    c.SetListener(ListenerType.PostInsert, new FullTextIndexEventListener());
    c.SetListener(ListenerType.PostDelete, new FullTextIndexEventListener());

And here is the only place I query the index:

    var fullTextSession = NHibernate.Search.Search.CreateFullTextSession(this.unitOfWork.Session);
    var fullTextQuery = fullTextSession.CreateFullTextQuery(query, typeof (Person));
    fullTextQuery.SetMaxResults(100);
    return fullTextQuery.List<Person>();

What's going on? What am I doing wrong? Thanks


lucene.net get starting and end index of a highlighted fragment in a searched field

- by user339995

"My search returns a highlighted fragment from a field. I want to know where that fragment starts and ends within that field of the particular matched document." For instance, consider that I am searching for "highlighted fragment" in the lines above (treat the above paragraph as a single document). I am setting my fragmenter as: SimpleFragmenter fragmenter = new SimpleFragmenter(30); Now the output of GetBestFragment is something like: "returns a highlighted fragment from". Is it possible to get the starting and ending index of this fragment in the text above (say the start is 10 and the end is 45)?
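One way to picture the offsets being asked for is a plain substring search, sketched below in Python. This is not the Lucene.NET highlighter API; `fragment_offsets` is a hypothetical helper. It assumes the fragment (with any highlight markup stripped) appears verbatim in the stored field text, and it reports only the first occurrence.

```python
def fragment_offsets(field_text, fragment):
    """Return (start, end) character offsets of fragment in field_text, or None."""
    start = field_text.find(fragment)
    if start == -1:
        return None
    # end is exclusive, so field_text[start:end] == fragment
    return start, start + len(fragment)


text = "My search returns a highlighted fragment from a field."
offsets = fragment_offsets(text, "returns a highlighted fragment from")
```

For this sample text the helper returns (10, 45), matching the numbers in the question.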


Lucene.NET faceted search.

- by Paul Knopf

I found a great tutorial on performing a faceted search. http://www.devatwork.nl/articles/lucenenet/faceted-search-and-drill-down-lucenenet/ This article does not explain how to retrieve the narrowed set of available attributes to filter on (for further drill-down). Let's say I am looking for planners that are red. When I perform the faceted search, I want to get back all attributes still available for filtering among the red planners. Then, when I add a "weekly format" filter, I want the attribute list to get even smaller, containing only the filters available for that narrowed group.
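The drill-down behaviour described here can be sketched in plain Python. This is an in-memory illustration under made-up data, not Lucene.NET code; `drill_down`, `facet_counts`, and the sample products are all hypothetical. The idea is to recompute the facet counts over whatever set of documents survives the current filters.

```python
from collections import Counter

# Hypothetical catalogue used only for illustration.
products = [
    {"type": "planner", "color": "red", "format": "weekly"},
    {"type": "planner", "color": "red", "format": "daily"},
    {"type": "planner", "color": "blue", "format": "weekly"},
]


def drill_down(items, filters):
    """Keep only the items that match every selected attribute value."""
    return [p for p in items if all(p.get(k) == v for k, v in filters.items())]


def facet_counts(items, selected):
    """Count the remaining values per attribute, skipping attributes already selected."""
    counts = {}
    for item in items:
        for key, value in item.items():
            if key in selected:
                continue
            counts.setdefault(key, Counter())[value] += 1
    return counts


red = drill_down(products, {"color": "red"})
red_facets = facet_counts(red, {"color"})

red_weekly = drill_down(products, {"color": "red", "format": "weekly"})
red_weekly_facets = facet_counts(red_weekly, {"color", "format"})
```

After filtering on red, both "weekly" and "daily" remain as format options; after adding the weekly filter, only the type facet is left to drill into.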


Which can handle a huge surge of queries: SQL Server 2008 Fulltext or Lucene

- by Luke101

I am creating a widget that will be installed on several websites and blogs. The widget will analyse the remote webpage title and content, then return relevant articles/links on my website. We expect a very large amount of traffic: roughly 500K queries a day, and up from there. I need the queries to be returned very quickly, so I need the candidate to be high performance, similar to Google AdSense. The remote title can be from 5 to 50 words, and from the description we will use no more than 3000 words. Which of these two do you think can handle the load?


Lucene neo4j sort with boolean fields

- by Daniele

I have indexed some documents (Neo4j nodes) with a boolean property which is not always present. E.g.:

    Node1: label: "label A"
    Node2: label: "label A" (note: same label as Node1), special: true

The goal is to rank Node2 higher than Node1 for the query "label A". Here is the code:

    Index<Node> fulltextLucene = graphDb.index().forNodes( "my-index" );
    Sort sort = new Sort(new SortField[] { SortField.FIELD_SCORE, new SortField("special", SortField.????, true) });
    IndexHits<Node> results = fulltextLucene.query( "label", new QueryContext( "label A" ).sort(sort) );

How can I accomplish that? Thanks
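The intended ordering (score first, then the optional boolean as a tie-breaker) can be sketched in Python. This is not the Neo4j or Lucene API and does not fill in the missing SortField type; the hit dictionaries are hypothetical, and a missing "special" property is assumed to count as false.

```python
# Hypothetical search hits; both nodes get the same relevance score.
hits = [
    {"name": "Node1", "score": 1.0},
    {"name": "Node2", "score": 1.0, "special": True},
]

# Sort by descending score, then put special=True before everything else.
# A node without the property is treated as special=False.
ranked = sorted(hits, key=lambda h: (-h["score"], not h.get("special", False)))
ranked_names = [h["name"] for h in ranked]
```

Note that the tie-breaker only matters when the scores are exactly equal, which is the case the question describes.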


Lucene.Net memory consumption and slow search when too many clauses used

- by Umer

I have a DB holding text file attributes and text file primary key IDs, and I have indexed around 1 million text files along with their IDs (the primary keys in the DB). Now, I am searching at two levels. First is a straightforward DB search, where I get primary keys as the result (roughly 2 or 3 million IDs). Then I build a Boolean query, for instance the following:

    +Text:"test*" +(pkID:1 pkID:4 pkID:100 pkID:115 pkID:1041 .... )

and search it in my index. The problem is that such a query (having 2 million clauses) takes far too much time to return results and consumes far too much memory. Is there any optimization for this problem?
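One alternative to a query with millions of ID clauses is to run only the text query and then intersect its hits with the ID set from the database. The Python sketch below shows that idea with plain lists and a set. It is not Lucene.Net code; `filter_hits` and the sample IDs are hypothetical, and whether this is faster in practice depends on how selective the text query is.

```python
def filter_hits(text_hits, allowed_ids):
    """Keep the text-query hits whose primary key is in the DB result set."""
    allowed = set(allowed_ids)  # O(1) membership checks instead of one clause per ID
    return [doc_id for doc_id in text_hits if doc_id in allowed]


# Hypothetical primary keys returned by the text query and by the DB search.
text_hits = [4, 7, 100, 2048]
db_ids = [1, 4, 100, 115, 1041]
result = filter_hits(text_hits, db_ids)
```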


Solr/Lucene Scorer

- by TFor

We are currently working on a proof-of-concept for a client using Solr and have been able to configure all the features they want except the scoring. The problem is that they want scores that make results fall into buckets:

    Bucket 1: exact match on category (score = 4)
    Bucket 2: exact match on name (score = 3)
    Bucket 3: partial match on category (score = 2)
    Bucket 4: partial match on name (score = 1)

The first thing we did was develop a custom similarity class that would return the correct score depending on the field and on an exact or partial match. The only problem now is that when a document matches on both the category and the name, the scores are added together. Example: searching for "restaurant" returns documents in the category restaurant that also have the word restaurant in their name and thus get a score of 5 (4+1), but they should only get 4. I assume for this to work we would need to develop a custom Scorer class, but we have no clue how to incorporate this in Solr. Another option is to create a custom SortField implementation similar to the RandomSortField already present in Solr. Maybe there is even a simpler solution that we don't know about. All suggestions welcome!
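The desired bucket behaviour amounts to taking the maximum of the per-field scores rather than their sum. The Python sketch below shows that rule on its own. It is not Solr code; `match_kind`, `bucket_score`, and the sample document are hypothetical, and "partial match" is simplified to a substring test.

```python
# Bucket scores taken from the question.
BUCKETS = {
    ("category", "exact"): 4,
    ("name", "exact"): 3,
    ("category", "partial"): 2,
    ("name", "partial"): 1,
}


def match_kind(value, term):
    """Classify a field match as 'exact', 'partial', or None (no match)."""
    value, term = value.lower(), term.lower()
    if value == term:
        return "exact"
    if term in value:
        return "partial"
    return None


def bucket_score(doc, term):
    """Score a document by its best-matching bucket instead of adding buckets up."""
    scores = [0]
    for field in ("category", "name"):
        kind = match_kind(doc.get(field, ""), term)
        if kind:
            scores.append(BUCKETS[(field, kind)])
    return max(scores)


doc = {"category": "restaurant", "name": "Joe's Restaurant"}
score = bucket_score(doc, "restaurant")
```

In Lucene itself, DisjunctionMaxQuery scores a document by its best-scoring sub-query rather than the sum, which may be worth checking against this requirement.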


Lucene DuplicateFilter question

- by chardex

Hi, why doesn't DuplicateFilter work together with other filters? For example, after a small modification of the DuplicateFilterTest test, I get the impression that the filter is not applied together with the other filters but trims the results first:

    public void testKeepsLastFilter() throws Throwable {
        DuplicateFilter df = new DuplicateFilter(KEY_FIELD);
        df.setKeepMode(DuplicateFilter.KM_USE_LAST_OCCURRENCE);
        Query q = new ConstantScoreQuery(new ChainedFilter(new Filter[]{
            new QueryWrapperFilter(tq),
            // new QueryWrapperFilter(new TermQuery(new Term("text", "out"))), // works right, it is the last document.
            new QueryWrapperFilter(new TermQuery(new Term("text", "now"))) // why it doesn't work? It is the third document.
        }, ChainedFilter.AND));
        ScoreDoc[] hits = searcher.search(q, df, 1000).scoreDocs;
        assertTrue("Filtered searching should have found some matches", hits.length > 0);
        for (int i = 0; i < hits.length; i++) {
            Document d = searcher.doc(hits[i].doc);
            String url = d.get(KEY_FIELD);
            TermDocs td = reader.termDocs(new Term(KEY_FIELD, url));
            int lastDoc = 0;
            while (td.next()) {
                lastDoc = td.doc();
            }
            assertEquals("Duplicate urls should return last doc", lastDoc, hits[i].doc);
        }
    }


A little off topic, but can anyone recommend examples where lucene is used on live websites

- by Phelan

I know wikipedia uses it but I am looking for more product based websites. Thanks.


Writing Lucene StandardAnalyzer results to text file with OutputStreamWriter

- by user3693192

I'm getting ONLY the last result written to "outputStreamFile.txt". I can't figure out how to revise the code so I can get ALL results written to the text file. Sample input text: "1st line of text\n" "2nd line of text \n" This results in only the 2nd line being written (and not the 1st line), as: "2nd line text\n"

    private static void analyze(String text) throws IOException {
        analyzer = new StandardAnalyzer(Version.LUCENE_30);
        Reader r = new StringReader(text);
        TokenStream ts = (TokenStream) analyzer.tokenStream("", r);
        TermAttribute term = ts.addAttribute(TermAttribute.class);
        File outfile = new File("C:\\Users\\Desktop\\outputStreamFile.txt");
        FileOutputStream fileOutputStream = new FileOutputStream(outfile);
        OutputStreamWriter outputStreamWriter = new OutputStreamWriter(fileOutputStream, "UTF8");
        while(ts.incrementToken()) {
            //System.out.print(term.term() + " ");
            outputStreamWriter.write(term.term().toString() + "\r\n");
        }
        outputStreamWriter.close();
    }
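A symptom like this is consistent with the file being reopened in truncate mode on every call. The Python sketch below reproduces that effect under one assumption that the question does not confirm: that analyze() is called once per input line. `write_tokens` and `run` are hypothetical helpers, and whitespace splitting stands in for the analyzer (so no stop words are removed).

```python
import os
import tempfile


def write_tokens(path, text, mode):
    """Open the file anew on every call, as the question's analyze() does."""
    with open(path, mode, encoding="utf-8") as out:
        for token in text.lower().split():
            out.write(token + "\n")


def run(mode):
    """Call write_tokens once per input line and return the tokens left in the file."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        for line in ["1st line of text", "2nd line of text"]:
            write_tokens(path, line, mode)
        with open(path, encoding="utf-8") as f:
            return f.read().split()
    finally:
        os.remove(path)


overwritten = run("w")  # each call truncates the file first
appended = run("a")     # each call adds to the end of the file
```

In Java, the equivalent choices would be opening the stream once outside the per-line call, or using the `FileOutputStream(File, boolean)` constructor with the append flag set to true.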


Best cross-language analyzer to use with lucene index

- by Halirob

Hello, I'm looking for feedback on which analyzer to use with an index that has documents in multiple languages. Currently I am using the SimpleAnalyzer, as it seems to handle the broadest range of languages. Most of the documents to be indexed will be English, but there will be the occasional double-byte language indexed as well. Are there any other suggestions, or should I just stick with the SimpleAnalyzer? Thanks


Lucene search on specific field name?

- by Rachel

I have been playing around with an installation of Solr that indexes some data from my database. I am able to index data and query it back, but I was wondering how field name queries work. For certain fields I am able to specify their name and the search text and get the expected results; for other fields, when I specify their name and search text, no results are returned.

    q=type:book                          //(this will work)
    q=type:book AND title:"The Title"    //(no results returned)

In this example, type is a required field and title is not. For the example where I search by title, I can see a document with the given title in the results of the first query, so I know that a document exists that matches this search. Is making a field 'required' the only way to be able to search by field name? [edit] I'm using the default installation and the 'example' folder inside of Solr, editing the XML files and using the interface available through start.jar to run, index and query.


Reverse search in Hibernate Search

- by Javi

Hello, I'm using Hibernate Search (which uses Lucene) for searching some data I have indexed in a directory. It works fine, but I need to do a reverse search. By reverse search I mean that I have a list of queries stored in my database, and each time a Data object is created I need to check which of these queries match it. I need this to alert the user when a Data object matches a query he has created. So I need to index the single Data object that has just been created and see which queries in my list return this object as a result. I've seen the Lucene MemoryIndex class for creating an index in memory, so I can do something like this example for every query in a list (though iterating over a Java list of queries would not be very efficient):

    //Iterating over my list<Query>
    MemoryIndex index = new MemoryIndex();
    //Add all fields
    index.addField("myField", "myFieldData", analyzer);
    ...
    QueryParser parser = new QueryParser("myField", analyzer);
    float score = index.search(query);
    if (score > 0.0f) {
        System.out.println("it's a match");
    } else {
        System.out.println("no match found");
    }

The problem here is that this Data class has several Hibernate Search annotations (@Field, @IndexedEmbedded, ...) which indicate how fields should be indexed, so when I invoke the index() method on the FullTextEntityManager instance it uses this information to index the object in the directory. Is there a similar way to index it in memory using this information? Is there a more efficient way of doing this reverse search? Thanks
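The reverse-search loop described here (one new document, many stored queries) can be sketched in Python. This is not Hibernate Search or MemoryIndex code; `tokenize`, `matching_queries`, and the stored queries are hypothetical, and each stored query is simplified to "all of these words must appear".

```python
def tokenize(text):
    """Lower-case and split on whitespace; a stand-in for a real analyzer."""
    return set(text.lower().split())


def matching_queries(stored_queries, document_text):
    """Return the names of stored queries whose terms all occur in the document."""
    doc_terms = tokenize(document_text)
    return [
        name
        for name, query_text in stored_queries.items()
        if tokenize(query_text) <= doc_terms  # subset test: every query term is present
    ]


# Hypothetical saved user queries, keyed by an alert name.
stored = {"alert-1": "lucene index", "alert-2": "solr facet"}
matches = matching_queries(stored, "How to rebuild a Lucene index quickly")
```

The document is tokenized once and then tested against each stored query, which mirrors building one in-memory index per new object and running the saved queries against it.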


FastVectorHighlighter.Net returning null on GetBestFragment

- by Midhat

Hi, I have a large index on which Highlighter.Net works fine, but FastVectorHighlighter returns null as the best fragment on some documents. The searcher works fine; it is just the highlighter. The field has been indexed in the same manner for all documents, so I fail to understand why it highlights some documents but not all. Using Lucene.Net 2.9.2, built from trunk rev942061


How to get a Token from a Lucene TokenStream?

- by FarmBoy

I'm trying to use Apache Lucene for tokenizing, and I am baffled at the process to obtain Tokens from a TokenStream. The worst part is that I'm looking at the comments in the JavaDocs that address my question. http://lucene.apache.org/java/3_0_1/api/core/org/apache/lucene/analysis/TokenStream.html#incrementToken%28%29 Somehow, an AttributeSource is supposed to be used, rather than Tokens. I'm totally at a loss. Can anyone explain how to get token-like information from a TokenStream?


How to setup Lucene search for a B2B web app?

- by Bill Paetzke

Given:

- 5000 databases (spread out over 5 servers)
- 1 database per client (so you can infer there are 1000 clients)
- 2 to 2000 users per client (let's say avg is 100 users per client)
- Clients (databases) come and go every day (let's assume most remain for at least one year)
- Let's stay agnostic of language or SQL brand, since Lucene (and Solr) have a breadth of support

The Question: How would you set up Lucene search so that each client can only search within its database?

- How would you set up the index(es)?
- Would you need to add a filter to all search queries?
- If a client cancelled, how would you delete their (part of the) index? (this may be trivial--not sure yet)

Possible Solutions:

1. Make an index for each client (database).
   Pro: Search is faster (than one-index-for-all method). Indices are relative to the size of the client's data.
   Con: I'm not sure what this entails, nor do I know if this is beyond Lucene's scope.
2. Have a single, gigantic index with a database_name field. Always include database_name as a filter.
   Pro: Not sure. Maybe good for tech support or billing dept to search all databases for info.
   Con: Search is slower (than index-per-client method). Flawed security if query filter removed.

For Example: Joel Spolsky said in Podcast #11 that his hosted web app product, FogBugz On-Demand, uses Lucene. He has thousands of on-demand clients, and each client gets their own database. His situation is quite similar to mine, although he didn't elaborate on the setup (particularly the indices); hence the need for this question.

One last thing: I would also accept an answer that uses Solr (the extension of Lucene). Perhaps it's better suited for this problem. Not sure.


What is the advantage of Lucene searching and indexing ?

- by Mehdi Amrollahi

I want to know: what is the advantage of Lucene searching and indexing? Is searching with Lucene as fast as other search algorithms like Quick Search? What about indexing? I want to know more about the advantages of Lucene over the others. Thanks.


How to setup Lucene/Solr for a B2B web app?

- by Bill Paetzke

Given:

- 1 database per client (business customer)
- 5000 clients
- Clients have between 2 and 2000 users (avg is ~100 users/client)
- 100k to 10 million records per database
- Users need to search those records often (it's the best way to navigate their data)

Possibly relevant info:

- Several new clients each week (any time during business hours)
- Multiple web servers and database servers (users can log in via any web server)
- Let's stay agnostic of language or SQL brand, since Lucene (and Solr) have a breadth of support

For Example: Joel Spolsky said in Podcast #11 that his hosted web app product, FogBugz On-Demand, uses Lucene. He has thousands of on-demand clients, and each client gets their own database. They use an index per client and store it in the client's database. I'm not sure of the details, and I'm not sure if this is a serious mod to Lucene.

The Question: How would you set up Lucene search so that each client can only search within its database?

- How would you set up the index(es)?
- Where do you store the index(es)?
- Would you need to add a filter to all search queries?
- If a client cancelled, how would you delete their (part of the) index? (this may be trivial--not sure yet)

Possible Solutions:

1. Make an index for each client (database).
   Pro: Search is faster (than one-index-for-all method). Indices are relative to the size of the client's data.
   Con: I'm not sure what this entails, nor do I know if this is beyond Lucene's scope.
2. Have a single, gigantic index with a database_name field. Always include database_name as a filter.
   Pro: Not sure. Maybe good for tech support or billing dept to search all databases for info.
   Con: Search is slower (than index-per-client method). Flawed security if query filter removed.

One last thing: I would also accept an answer that uses Solr (the extension of Lucene). Perhaps it's better suited for this problem. Not sure.
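The index-per-client option can be sketched with a small in-memory stand-in, shown below in Python. This is not Lucene or Solr code; the `PerTenantIndexes` class, its methods, and the sample records are hypothetical. It illustrates two properties from the list above: a search can only ever see one client's data, and removing a cancelled client means dropping one whole index.

```python
class PerTenantIndexes:
    """Toy model of one index per client, keyed by a tenant id."""

    def __init__(self):
        self._indexes = {}  # tenant id -> list of indexed records

    def add(self, tenant_id, record):
        self._indexes.setdefault(tenant_id, []).append(record)

    def search(self, tenant_id, term):
        # Only the caller's own index is consulted, so there is no
        # per-query filter that could be forgotten.
        term = term.lower()
        return [r for r in self._indexes.get(tenant_id, []) if term in r.lower()]

    def drop_tenant(self, tenant_id):
        # A cancelled client is removed by deleting its whole index.
        self._indexes.pop(tenant_id, None)


indexes = PerTenantIndexes()
indexes.add("client_a", "Invoice for red planner")
indexes.add("client_b", "Red planner order")

hits_a = indexes.search("client_a", "planner")
indexes.drop_tenant("client_b")
hits_b = indexes.search("client_b", "planner")
```

With a single shared index, the same isolation would depend on every query carrying the database_name filter, which is the security concern raised under option 2.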


Extending / changing how Zend_Search_Lucene searches

- by Grant Collins

Hi, I am currently using Zend_Search_Lucene to index and search a number of documents, currently around 1000 or so. What I would like to do is change how the engine scores hits on a document from the current default. Zend_Search_Lucene scores on the frequency of hits within a document, so a document that has 10 matches of the word PHP will score higher than a document with only 3 matches of PHP. What I am trying to do is pass a number of keywords and score depending on how many of those keywords are hit. E.g. I pass 5 keywords, say PHP, MySQL, Javascript, HTML and CSS, that I search against the index. One document has 3 matches to those keywords and one document has 4 matches; the one with 4 matches scores highest. The number of instances of those words in the document does not concern me. Now I've had a quick look at Zend_Search_Lucene_Search_Similarity; however, I have to confess that I am not sure (or bright enough) to know how to use it to achieve what I am after. Is what I want to do possible using Lucene, or is there a better solution out there?
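The scoring rule being asked for counts distinct keywords matched and ignores how often each one occurs. A minimal Python sketch of that rule follows. It is not Zend_Search_Lucene code; `keyword_coverage` and the sample documents are hypothetical, and tokenization is plain whitespace splitting.

```python
def keyword_coverage(text, keywords):
    """Count how many distinct keywords occur in the text, ignoring repetition."""
    words = set(text.lower().split())
    return sum(1 for keyword in keywords if keyword.lower() in words)


keywords = ["PHP", "MySQL", "Javascript", "HTML", "CSS"]

# Many repetitions of a single keyword, plus one other keyword.
repetitive = "php php php php php mysql"
# Fewer repetitions, but more distinct keywords.
broad = "php mysql javascript html"

repetitive_score = keyword_coverage(repetitive, keywords)
broad_score = keyword_coverage(broad, keywords)
```

Here the broad document scores 4 and the repetitive one scores 2, which is the ordering the question wants.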


Lucene best practice

- by Dragos

I am trying to understand how Lucene should be used. From what I have read, creating an IndexReader is costly, so using a SearcherManager should be the right choice. However, a SearcherManager should be produced by an NRTManager (which, by the way, should replace the IndexWriter for every add or delete operation performed). But in order to have an NRTManager, I should first have an IndexWriter, and here comes my problem. The documentation says:

- an IndexWriter is thread-safe
- the constructor of this class takes a Directory object, so it seems creating an instance should be costly (as in the case of an IndexReader)
- all changes are buffered and flushed periodically (so they seem to encourage using a single instance)

but:

- the changes, although flushed, will only be visible after commit or close
- after finishing making updates (add/delete), the instance should be closed

I also found this: http://stackoverflow.com/questions/5374419/forgot-to-close-the-lucene-indexwriter-after-adding-documents-to-the-index where it is said that not closing a writer might ruin everything. So what am I really supposed to do? Is having a single IndexWriter instance a good idea (only commit and never close it)? EDIT: What is more, if I use NRTManager, how can I make a commit? Is it even possible?


Pylucene in Python 2.6 + MacOs Snow Leopard

- by jbastos

Greetings, I'm trying to install PyLucene on my 32-bit Python running on Snow Leopard. I compiled JCC successfully, but I get warnings while making PyLucene:

    ld: warning: in build/temp.macosx-10.6-i386-2.6/build/_lucene/__init__.o, file is not of required architecture
    ld: warning: in build/temp.macosx-10.6-i386-2.6/build/_lucene/__wrap01__.o, file is not of required architecture
    ld: warning: in build/temp.macosx-10.6-i386-2.6/build/_lucene/__wrap02__.o, file is not of required architecture
    ld: warning: in build/temp.macosx-10.6-i386-2.6/build/_lucene/__wrap03__.o, file is not of required architecture
    ld: warning: in build/temp.macosx-10.6-i386-2.6/build/_lucene/functions.o, file is not of required architecture
    ld: warning: in build/temp.macosx-10.6-i386-2.6/build/_lucene/JArray.o, file is not of required architecture
    ld: warning: in build/temp.macosx-10.6-i386-2.6/build/_lucene/JObject.o, file is not of required architecture
    ld: warning: in build/temp.macosx-10.6-i386-2.6/build/_lucene/lucene.o, file is not of required architecture
    ld: warning: in build/temp.macosx-10.6-i386-2.6/build/_lucene/types.o, file is not of required architecture
    ld: warning: in /Developer/SDKs/MacOSX10.4u.sdk/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/JCC-2.3-py2.6-macosx-10.3-fat.egg/libjcc.dylib, file is not of required architecture
    ld: warning: in /Developer/SDKs/MacOSX10.4u.sdk/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/JCC-2.3-py2.6-macosx-10.3-fat.egg/libjcc.dylib, file is not of required architecture
    build of complete

Then I try to import lucene:

    MacBookPro:~/tmp/trunk python
    Python 2.6.3 (r263:75184, Oct 2 2009, 07:56:03)
    [GCC 4.0.1 (Apple Inc. build 5493)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import pylucene
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ImportError: No module named pylucene
    >>> import lucene
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/lucene-2.9.0-py2.6-macosx-10.6-i386.egg/lucene/__init__.py", line 7, in <module>
        import _lucene
    ImportError: dlopen(/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/lucene-2.9.0-py2.6-macosx-10.6-i386.egg/lucene/_lucene.so, 2): Symbol not found: __Z8getVMEnvP7_object
      Referenced from: /Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/lucene-2.9.0-py2.6-macosx-10.6-i386.egg/lucene/_lucene.so
      Expected in: flat namespace
     in /Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/lucene-2.9.0-py2.6-macosx-10.6-i386.egg/lucene/_lucene.so
    >>>

Any hints?


Zend_Search_Lucene and range search

- by ranza

I have a bunch of int key fields in my index and am trying to do a simple range search like this: `gender:1 AND height:[120 TO 180]` This should give me males in the height range 120 to 180. But for some reason I get this exception: `At least one range query boundary term must be non-empty term` How would I debug this? Is it just Zend_Search_Lucene being buggy?


Setting wildcard queries as default for QueryParser

- by user46703

When my users enter a term like "word", I would like it to be treated as a wildcard query "word*" so that all terms beginning with "word" are found. Is there a way to tell the QueryParser to automatically create wildcard queries, or do I have to parse the query myself? This shouldn't be a problem for simple queries, but it may become tricky for more complex queries.
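The "parse the query myself" route could start with a pre-processing step like the Python sketch below, which rewrites the query string before it reaches the parser. It is not a QueryParser feature; `add_prefix_wildcards` is a hypothetical helper, and it deliberately touches only bare alphanumeric words, leaving operators, quoted phrases, fielded terms, and anything already containing special characters alone.

```python
import re

# Words the query syntax treats as operators rather than search terms.
OPERATORS = {"AND", "OR", "NOT", "TO"}

# A token is either something ending in a quoted phrase (e.g. title:"a b") or a run of non-spaces.
TOKEN = re.compile(r'\S*"[^"]*"|\S+')


def add_prefix_wildcards(query):
    """Append '*' to bare words so they are parsed as prefix queries."""
    rewritten = []
    for token in TOKEN.findall(query):
        if token.isalnum() and token not in OPERATORS:
            rewritten.append(token + "*")
        else:
            rewritten.append(token)  # phrases, fields, ranges, existing wildcards
    return " ".join(rewritten)


simple = add_prefix_wildcards("word")
mixed = add_prefix_wildcards('foo AND title:"The Title" ba*')
```

As the question anticipates, this kind of string rewriting gets fragile for complex queries (ranges, nested groups, escaped characters), so it is best read as a starting point.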


How to use NGramTokenizerFactory or NGramFilterFactory?

- by user572485

Hi, recently I have been studying how to store and index data using Solr. I want to do a facet.prefix search. With the whitespace tokenizer, "Where are you" will be split into three words and indexed. If I search with facet.prefix="where are", no result is returned. I googled and found that NGramFilterFactory might help me. But when I apply this filter factory, the result is "w, h, e, ..., wh, ..", which splits the sentence by character, not by word token. I use the parameters maxGramSize and minGramSize, set to 3 and 1 respectively. Does NGramFilterFactory work correctly here? Should I add some other parameters? Are there other filter factories which can help me? Thanks!
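The output described is what character n-grams look like; word-level n-grams (often called shingles) are a different operation. The Python sketch below contrasts the two on plain strings. It is not Solr code; `char_ngrams` and `word_shingles` are hypothetical helpers that only mimic the general idea of each filter.

```python
def char_ngrams(text, min_n, max_n):
    """All character substrings of length min_n..max_n, shortest first."""
    return [
        text[i:i + n]
        for n in range(min_n, max_n + 1)
        for i in range(len(text) - n + 1)
    ]


def word_shingles(text, min_n, max_n):
    """All runs of min_n..max_n consecutive words, joined by a space."""
    words = text.split()
    return [
        " ".join(words[i:i + n])
        for n in range(min_n, max_n + 1)
        for i in range(len(words) - n + 1)
    ]


grams = char_ngrams("where", 1, 3)
shingles = word_shingles("where are you", 1, 3)
```

The shingle list contains "where are", which is the kind of term a prefix such as "where are" could match. Solr ships a ShingleFilterFactory for word-level n-grams, which may be closer to what is wanted here than NGramFilterFactory.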

Search Results

Search found 393 results on 16 pages for 'lucene'.
