Search Results

Search found 631 results on 26 pages for 'couchdb lucene'.

Page 5/26 | < Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12  | Next Page >

  • Lucene stop words not removed during searching: need a substitute for AnalyzingQueryParser

    - by iamrohitbanga
    I have created a Lucene index with the following analyzer:

        public class DocSpecAnalyzer extends Analyzer {
            private static CharArraySet stopSet; // = new HashSet<String>(Arrays.asList()); // STOP_WORDS_SET;

            static {
                stopSet = new CharArraySet(FDConstants.stopwords, true);
                // uncommenting this displays all the stop words
                // for (String s : FDConstants.stopwords) {
                //     System.out.println(s);
                // }
            }

            /**
             * Specifies whether deprecated acronyms should be replaced with HOST type.
             * See {@linkplain https://issues.apache.org/jira/browse/LUCENE-1068}
             */
            private final boolean enableStopPositionIncrements;
            private final Version matchVersion;

            public DocSpecAnalyzer(Version matchVersion) {
                this.matchVersion = matchVersion;
                enableStopPositionIncrements = StopFilter.getEnablePositionIncrementsVersionDefault(matchVersion);
            }

            public TokenStream tokenStream(String fieldName, Reader reader) {
                StandardTokenizer tokenStream = new StandardTokenizer(matchVersion, reader);
                tokenStream.setMaxTokenLength(DEFAULT_MAX_TOKEN_LENGTH);
                TokenStream result = new StandardFilter(tokenStream);
                result = new LowerCaseFilter(result);
                result = new StopFilter(enableStopPositionIncrements, result, stopSet);
                result = new PorterStemFilter(result);
                return result;
            }

            /** Default maximum allowed token length */
            public static final int DEFAULT_MAX_TOKEN_LENGTH = 255;
        }

    Now when I search for documents with a query containing stop words, I get hits for the stop words as well. It is because AnalyzingQueryParser (http://lucene.apache.org/java/2_9_2/api/contrib-misc/org/apache/lucene/queryParser/analyzing/AnalyzingQueryParser.html) does not handle stop words. Is there a substitute?

    Update: I forgot to mention that I need to do a fuzzy search; that is why I am using an AnalyzingQueryParser.

    Update: the portion of code that invokes AnalyzingQueryParser:

        AnalyzingQueryParser parser = new AnalyzingQueryParser(Version.LUCENE_CURRENT, "description", analyzer);
        // fuzzy matching preparation
        String fuzzyStr = TextQuery.prepareFuzzy(tq.text, fuzzyDist);
        Query query = parser.parse(fuzzyStr);
        TopScoreDocCollector collector = TopScoreDocCollector.create(numHits, true);
        searcher.search(query, collector);
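
    Editorial sketch (not part of the original question): one workaround, assuming the 2.9-era TokenStream API, is to run the raw query text through the same analyzer before it reaches the parser, so stop words are removed up front; the helper name stripStopWords is hypothetical. Note that the analyzer above also stems, so the cleaned query text comes out stemmed as well.

        import java.io.IOException;
        import java.io.StringReader;
        import org.apache.lucene.analysis.Analyzer;
        import org.apache.lucene.analysis.TokenStream;
        import org.apache.lucene.analysis.tokenattributes.TermAttribute;

        // Hypothetical helper: analyze the raw query text with the index-time
        // analyzer so stop words never reach AnalyzingQueryParser.
        static String stripStopWords(Analyzer analyzer, String field, String text) throws IOException {
            StringBuilder cleaned = new StringBuilder();
            TokenStream ts = analyzer.tokenStream(field, new StringReader(text));
            TermAttribute term = ts.addAttribute(TermAttribute.class);
            while (ts.incrementToken()) {
                if (cleaned.length() > 0) cleaned.append(' ');
                cleaned.append(term.term());
            }
            ts.close();
            return cleaned.toString();
        }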

    Read the article

  • Lucene stop words not removed during searching

    - by iamrohitbanga
    I have created a Lucene index with the following analyzer:

        public class DocSpecAnalyzer extends Analyzer {
            private static CharArraySet stopSet; // = new HashSet<String>(Arrays.asList()); // STOP_WORDS_SET;

            static {
                stopSet = new CharArraySet(FDConstants.stopwords, true);
                // uncommenting this displays all the stop words
                // for (String s : FDConstants.stopwords) {
                //     System.out.println(s);
                // }
            }

            /**
             * Specifies whether deprecated acronyms should be replaced with HOST type.
             * See {@linkplain https://issues.apache.org/jira/browse/LUCENE-1068}
             */
            private final boolean enableStopPositionIncrements;
            private final Version matchVersion;

            public DocSpecAnalyzer(Version matchVersion) {
                this.matchVersion = matchVersion;
                enableStopPositionIncrements = StopFilter.getEnablePositionIncrementsVersionDefault(matchVersion);
            }

            public TokenStream tokenStream(String fieldName, Reader reader) {
                StandardTokenizer tokenStream = new StandardTokenizer(matchVersion, reader);
                tokenStream.setMaxTokenLength(DEFAULT_MAX_TOKEN_LENGTH);
                TokenStream result = new StandardFilter(tokenStream);
                result = new LowerCaseFilter(result);
                result = new StopFilter(enableStopPositionIncrements, result, stopSet);
                result = new PorterStemFilter(result);
                return result;
            }

            /** Default maximum allowed token length */
            public static final int DEFAULT_MAX_TOKEN_LENGTH = 255;
        }

    Now when I search for documents with a query containing stop words, I get hits for the stop words as well. As I was posting this problem, I found the bug: it is because AnalyzingQueryParser (http://lucene.apache.org/java/2_9_2/api/contrib-misc/org/apache/lucene/queryParser/analyzing/AnalyzingQueryParser.html) does not handle stop words. Is there a substitute?

    Update: I forgot to mention that I need to do a fuzzy search; that is why I am using an AnalyzingQueryParser.

    Read the article

  • Are Vala and desktopcouch ready?

    - by pavolzetor
    Hi, I have started writing an RSS reader in Vala, but I don't know what database system I should use. I cannot connect to CouchDB; SQLite works fine, but I would like to use CouchDB because of Ubuntu One. I am on Natty with the latest updates.

        public CouchDB.Session session;
        public CouchDB.Database db;
        public string feed_table = "feed";
        public string item_table = "item";

        public struct field {
            string name;
            string val;
        }

        // constructor
        public Database() {
            try {
                this.session = new CouchDB.Session ();
            } catch (Error e) {
                stderr.printf ("%s a\n", e.message);
            }
            try {
                this.db = new CouchDB.Database (this.session, "test");
            } catch (Error e) {
                stderr.printf ("%s a\n", e.message);
            }
            try {
                this.session.get_database_info ("test");
            } catch (Error e) {
                stderr.printf ("%s aa\n", e.message);
            }
            try {
                var newdoc = new CouchDB.Document ();
                newdoc.set_boolean_field ("awesome", true);
                newdoc.set_string_field ("phone", "555-VALA");
                newdoc.set_double_field ("pi", 3.14159);
                newdoc.set_int_field ("meaning_of_life", 42);
                this.db.put_document (newdoc); // store document
            } catch (Error e) {
                stderr.printf ("%s aaa\n", e.message);
            }
        }

    Running it reports:

        $ ./xml_parser rss.xml
        Cannot connect to destination (127.0.0.1) aa
        Cannot connect to destination (127.0.0.1) aaa

    Read the article

  • String matching algorithms used by Lucene

    - by iamrohitbanga
    I want to know the string matching algorithms used by Apache Lucene. I have been going through the index file format used by Lucene, given here. It seems that Lucene stores all the words occurring in the text as-is, along with their frequency of occurrence in each document. But as far as I know, efficient string matching requires some preprocessing of the words occurring in the documents. Example: search for "iamrohitbanga is a user of stackoverflow" (using fuzzy matching) in some documents. It is possible that there is a document containing the string "rohit banga". To find that the substrings rohit and banga are present in the search string, it would need some efficient substring matching. I want to know which algorithm it is, and also, if it does some preprocessing, which function call in the Java API triggers it.
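
    Editorial note (not part of the original question): for fuzzy matching, 2.9-era Lucene does no substring preprocessing; FuzzyQuery enumerates the term dictionary (via FuzzyTermEnum, triggered when the query is rewritten during search) and scores each candidate term by Levenshtein edit distance. A minimal sketch of that distance computation:

        // Classic dynamic-programming Levenshtein distance, the measure that
        // 2.9-era FuzzyTermEnum applies to each term it enumerates.
        static int levenshtein(String a, String b) {
            int[][] d = new int[a.length() + 1][b.length() + 1];
            for (int i = 0; i <= a.length(); i++) d[i][0] = i;
            for (int j = 0; j <= b.length(); j++) d[0][j] = j;
            for (int i = 1; i <= a.length(); i++) {
                for (int j = 1; j <= b.length(); j++) {
                    int subst = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                    d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                                       d[i - 1][j - 1] + subst);
                }
            }
            return d[a.length()][b.length()];
        }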

    Read the article

  • Intersecting boundaries with lucene

    - by Silvio Donnini
    I'm using Lucene, and I'm trying to find a way to index and retrieve documents that have a ranged property. For example I have:

        Document 1: Price:[30 TO 50]
        Document 2: Price:[45 TO 60]
        Document 3: Price:[60 TO 70]

    And I would like to search for all the documents whose ranges intersect a specific interval; in the above example, if I search for Price in [55 TO 65], I should get Document 2 and Document 3 as results. I don't think NumericRangeQueries alone would do the trick; I need to work on the index with something similar to R-trees, but are they implemented in Lucene? Also, I suppose that what I need should be a subclass of MultiTermQuery, because the query Price in [55 TO 65] has two boundaries, but I don't see anything suitable among MultiTermQuery's subclasses. Any help is appreciated, thanks, Silvio

    P.S. I'm using Lucene 2.9.0, but I can update to the latest release if needed.
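
    Editorial sketch (not part of the original question): intervals [a, b] and [c, d] intersect exactly when a <= d and b >= c, so one standard approach is to index each range as two numeric fields and intersect with two NumericRangeQueries; the field names priceMin and priceMax are illustrative (2.9-era API):

        import org.apache.lucene.search.BooleanClause;
        import org.apache.lucene.search.BooleanQuery;
        import org.apache.lucene.search.NumericRangeQuery;

        // Matches every document whose [priceMin, priceMax] overlaps [55, 65]:
        // priceMin <= 65 AND priceMax >= 55. A null bound means open-ended.
        BooleanQuery intersects = new BooleanQuery();
        intersects.add(NumericRangeQuery.newIntRange("priceMin", null, 65, true, true),
                       BooleanClause.Occur.MUST);
        intersects.add(NumericRangeQuery.newIntRange("priceMax", 55, null, true, true),
                       BooleanClause.Occur.MUST);

    Checking against the example: Document 2 ([45, 60]) and Document 3 ([60, 70]) satisfy both clauses, while Document 1 ([30, 50]) fails the second, as desired.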

    Read the article

  • Rollback in Lucene

    - by Petrick Lim
    Is there a rollback in Lucene? I'm saving and updating a database repository and a Lucene repository simultaneously, so that the Lucene index and the database stay in sync, e.g.:

        CustomerRepository.add(customer);
        SupplierRepository.add(supplier);
        CustomerLuceneRepository.add(customer);
        SupplierLuceneRepository.add(supplier);
        // if this call fails I cannot roll back the customer added above
        DataContext.SubmitChanges();
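
    Editorial sketch (not part of the original question): IndexWriter does offer transaction-like behavior; added documents stay invisible until commit(), and rollback() discards everything since the last commit. A minimal sketch in Java, assuming a 2.9-era API (Lucene.NET mirrors these as Commit() and Rollback()); the Runnable stand-in for the database call is hypothetical:

        import org.apache.lucene.document.Document;
        import org.apache.lucene.index.IndexWriter;

        // Defer the index commit until the database write succeeds, so a DB
        // failure leaves the index unchanged.
        static void addBoth(IndexWriter writer, Document customerDoc, Document supplierDoc,
                            Runnable submitDatabaseChanges) throws Exception {
            try {
                writer.addDocument(customerDoc);  // buffered; invisible to readers until commit()
                writer.addDocument(supplierDoc);
                submitDatabaseChanges.run();      // stand-in for DataContext.SubmitChanges()
                writer.commit();                  // publish index changes only after the DB succeeded
            } catch (Exception e) {
                writer.rollback();                // drops all changes since the last commit (and closes the writer)
                throw e;
            }
        }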

    Read the article

  • Why are my Lucene Document results empty?

    - by vegashacker
    I'm running a simple test: trying to index something and then search for it. I index a simple document, but then when I search for a string in it, I get back what looks to be an empty document (it has no fields). Lucene seems to be doing something, because if I search for a word that's not in the document, it returns 0 results. Any reason why Lucene would reliably return a document when it finds one that matches the given query, and yet that document has nothing in it? Thanks! PS: I'm actually running Lucandra (Lucene + Cassandra). That certainly may be a relevant detail, but I'm not sure.
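
    Editorial note (not part of the original question): with stock Lucene, the usual cause of hits with no fields is indexing every field with Field.Store.NO, which makes the terms searchable but keeps no field values to return; whether Lucandra behaves the same is not confirmed here. A minimal sketch of a field that is both stored and searchable, assuming the 2.9-era API:

        import org.apache.lucene.document.Document;
        import org.apache.lucene.document.Field;

        Document doc = new Document();
        // Store.YES keeps the original value so searcher.doc(n) can return it;
        // Index.ANALYZED makes the text searchable. With Store.NO, matching
        // documents come back with no fields.
        doc.add(new Field("body", "some simple test text", Field.Store.YES, Field.Index.ANALYZED));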

    Read the article

  • Sorting CouchDB Views By Value

    - by Lee Theobald
    Hi all, I'm testing out CouchDB to see how it could handle logging some search results. What I'd like to do is produce a view where I can list the top queries from the results. At the moment I have something like this:

    Example document portion:

        {
          "query": "+dangerous +dogs",
          "hits": "123"
        }

    Map function (not exactly what I need/want, but it's good enough for testing):

        function(doc) {
          if (doc.query) {
            var split = doc.query.split(" ");
            for (var i in split) {
              emit(split[i], 1);
            }
          }
        }

    Reduce function:

        function (key, values, rereduce) {
          return sum(values);
        }

    Now this will get me results in a format where a query term is the key and the count for that term is on the right, which is great. But I'd like it ordered by the value, not the key. From the sounds of it, this is not yet possible with CouchDB. So does anyone have any ideas of how I can get a view where I have an ordered version of the query terms and their related counts? I'm very new to CouchDB and I just can't think of how I'd write the functions needed.

    Read the article

  • NLP - Queries using semantic wildcards in full text searching, maybe with Lucene?

    - by Zsolt
    Let's say I have a big corpus (for example in English or an arbitrary language), and I want to perform some semantic search on it. For example I have the query:

        "Be careful: [art] armada of [sg] is coming to [do sg]!"

    And the corpus contains the following sentence:

        "Be careful: an armada of alien ships is coming to destroy our planet!"

    As can be seen, my query string can contain "semantic placeholders", such as:

        [art] - a placeholder for articles (for example a / an in English)
        [sg], [do sg] - placeholders for NPs and VPs (subjects and predicates)

    I would like to develop a library capable of handling these queries efficiently. I suspect that some kind of POS-tagging would be necessary for parsing the text, but because I don't want to fully reimplement an already existing full-text search engine to make this work, I'm wondering how I could integrate this behaviour into a search engine like Lucene. I know there are SpanQueries which could behave similarly in some cases, but as far as I can see, Lucene doesn't do any semantic processing of stored texts. Is it possible to implement behaviour like this, or do I have to write my own search engine?
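
    Editorial sketch (not part of the original question): the closest stock-Lucene building block is a SpanNearQuery over terms injected at index time. Assuming a hypothetical custom analyzer that emits POS placeholder tokens such as _art_ alongside the words, an in-order span query could stand in for one placeholder; the field name "text" is also an assumption:

        import org.apache.lucene.index.Term;
        import org.apache.lucene.search.spans.SpanNearQuery;
        import org.apache.lucene.search.spans.SpanQuery;
        import org.apache.lucene.search.spans.SpanTermQuery;

        // Matches "... careful <article> armada ..." in order, with a little slop.
        SpanQuery[] clauses = new SpanQuery[] {
            new SpanTermQuery(new Term("text", "careful")),
            new SpanTermQuery(new Term("text", "_art_")),
            new SpanTermQuery(new Term("text", "armada")),
        };
        SpanNearQuery query = new SpanNearQuery(clauses, 1, true); // slop 1, in order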

    Read the article

  • How to make sure Solr/Lucene won't die with java.lang.OutOfMemoryError?

    - by taw
    I'm really puzzled why it keeps dying with java.lang.OutOfMemoryError during indexing even though it has a few GBs of memory. Is there a fundamental reason why it needs manual tweaking of config files / JVM parameters instead of just figuring out how much memory is available and limiting itself to that? No other program except Solr ever has this kind of problem. Yes, I can keep tweaking the JVM heap size every time such crashes happen, but this is all so backwards. Here's the stack trace of the latest such crash, in case it is relevant:

        SEVERE: java.lang.OutOfMemoryError: Java heap space
            at java.util.Arrays.copyOfRange(Arrays.java:3209)
            at java.lang.String.<init>(String.java:216)
            at org.apache.lucene.index.TermBuffer.toTerm(TermBuffer.java:122)
            at org.apache.lucene.index.SegmentTermEnum.term(SegmentTermEnum.java:169)
            at org.apache.lucene.search.FieldCacheImpl$StringIndexCache.createValue(FieldCacheImpl.java:701)
            at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:208)
            at org.apache.lucene.search.FieldCacheImpl.getStringIndex(FieldCacheImpl.java:676)
            at org.apache.lucene.search.FieldComparator$StringOrdValComparator.setNextReader(FieldComparator.java:667)
            at org.apache.lucene.search.TopFieldCollector$OneComparatorNonScoringCollector.setNextReader(TopFieldCollector.java:94)
            at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:245)
            at org.apache.lucene.search.Searcher.search(Searcher.java:171)
            at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:988)
            at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:884)
            at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:341)
            at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:182)
            at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
            at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
            at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
            at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
            at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
            at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
            at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
            at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
            at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
            at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
            at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
            at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
            at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
            at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
            at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
            at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
            at java.lang.Thread.run(Thread.java:619)

    Read the article

  • How do I get a document index so I can delete with Lucene?

    - by acidzombie24
    Basically I am doing this: I think I'll set the document id as the thread id on my site (even if some types of thread won't be searched). So I can search by thread id, but I am clueless about how to delete. I found pages that say to use the document index, and that I need to optimize or close before changes take effect, but I don't know how to get the document index. How do I? Also, I've seen one that said to use IndexWriter to delete, but I couldn't figure out how to do it with that either.
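
    Editorial sketch (not part of the original question): the internal document number isn't needed at all; if the thread id is indexed as a single untokenized term, IndexWriter can delete by that term directly. A minimal sketch assuming the 2.9-era Java API (Lucene.NET mirrors it):

        import org.apache.lucene.document.Document;
        import org.apache.lucene.document.Field;
        import org.apache.lucene.index.IndexWriter;
        import org.apache.lucene.index.Term;

        // At index time, keep the thread id as one exact, untokenized term:
        Document doc = new Document();
        doc.add(new Field("threadId", "12345", Field.Store.YES, Field.Index.NOT_ANALYZED));

        // Later ('writer' is an open IndexWriter on the same index), delete
        // every document carrying that id and publish the change:
        writer.deleteDocuments(new Term("threadId", "12345"));
        writer.commit();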

    Read the article

  • Lucene Fuzzy Match on Phrase instead of Single Word

    - by Koobz
    I'm trying to do a fuzzy match on the phrase "Grand Prarie" (deliberately misspelled) using Apache Lucene. Part of my issue is that the ~ operator only does fuzzy matches on single-word terms and behaves as a proximity match for phrases. Is there a way to do a fuzzy match on a phrase with Lucene?
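
    Editorial sketch (not part of the original question): from Lucene 3.1 on, SpanMultiTermQueryWrapper lets a FuzzyQuery participate in a span query, giving a fuzzy phrase; the field name "city" is illustrative:

        import org.apache.lucene.index.Term;
        import org.apache.lucene.search.FuzzyQuery;
        import org.apache.lucene.search.spans.SpanMultiTermQueryWrapper;
        import org.apache.lucene.search.spans.SpanNearQuery;
        import org.apache.lucene.search.spans.SpanQuery;

        // Fuzzy-match each word, then require them adjacent and in order.
        SpanQuery grand = new SpanMultiTermQueryWrapper<FuzzyQuery>(
                new FuzzyQuery(new Term("city", "grand")));
        SpanQuery prairie = new SpanMultiTermQueryWrapper<FuzzyQuery>(
                new FuzzyQuery(new Term("city", "prarie")));
        SpanNearQuery fuzzyPhrase = new SpanNearQuery(
                new SpanQuery[] { grand, prairie }, 0, true); // slop 0, in order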

    Read the article

  • Couple o' quick questions on Apache Lucene

    - by Doug
    -- I don't want to start any religious wars, but a quick Google search indicates that Apache Lucene is the preferred open-source tool for indexing and searching. Are there others?
    -- What file format does Lucene use to store its index file(s)?

    Thanks in advance. Doug

    Read the article

  • Lucene.NET performance

    - by Paul Knopf
    I have a website that runs off a third-party search provider that is expensive. I am going to roll my own. Is Lucene.NET capable of handling ~25,000 products (or documents), each with maybe ten attributes used for filtering? I am looking to do a "narrow/drill-down" or "faceted search". Does that sound like too much to ask from Lucene.NET?

    Read the article

  • Lucene index missing files

    - by Akhil
    I have the _0.cfs file of a Lucene index directory, but segments.gen and segments_2 are missing. Can I generate the segments.gen and segments_2 files without having to regenerate the _0.cfs file? Do these "segments" files contain any index-specific data that will force me to regenerate the entire index, or can I just produce the two "segments" files by copying them from another Lucene index directory generated with the same Lucene version?

    Read the article

  • Resources for getting started with Lucene.Net?

    - by Matt Dotson
    I'm building a simple site that allows users to post text content, and I want to add it to a search index as it gets posted, so my site search stays up to date. From what I can tell, Lucene.NET is a good full-text search framework. I've found very few examples of how to use it, though. Can anyone post some good references for learning about Lucene?

    Read the article

  • Apache Lucene or another Search in iPhone app

    - by lostInTransit
    Hi, I would like to implement search functionality within my iPhone app which can search for terms within all the documents in the application. I believe I cannot use Apache Lucene directly since it is in Java. Can I use Lucy, which is a C port of Lucene (though I'm not sure whether its Perl and Ruby bindings would work there)? Or is there any other open-source search engine which I can use in my iPhone app for search within the app? Thanks

    Read the article

  • C# Lucene get all the index

    - by ngc224
    Hello, I am working on a Windows application using Lucene. I want to get all the indexed keywords and use them as the source for an auto-suggest on the search field. How can I retrieve all the indexed keywords in Lucene? I am fairly new to C#, so code itself is appreciated. Thanks.
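
    Editorial sketch (not part of the original question), shown in Java against the 2.9-era API (Lucene.NET mirrors it with Pascal-cased names such as IndexReader.Terms()): walk the term dictionary and collect the terms of one field; the field name passed in is whatever field holds the keywords:

        import java.io.IOException;
        import java.util.ArrayList;
        import java.util.List;
        import org.apache.lucene.index.IndexReader;
        import org.apache.lucene.index.Term;
        import org.apache.lucene.index.TermEnum;
        import org.apache.lucene.store.Directory;

        // Enumerates every indexed term of the given field, e.g. to feed an
        // auto-suggest list.
        static List<String> allTerms(Directory dir, String field) throws IOException {
            List<String> result = new ArrayList<String>();
            IndexReader reader = IndexReader.open(dir, true); // read-only
            try {
                TermEnum terms = reader.terms();
                while (terms.next()) {
                    Term t = terms.term();
                    if (field.equals(t.field())) {
                        result.add(t.text());
                    }
                }
                terms.close();
            } finally {
                reader.close();
            }
            return result;
        }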

    Read the article

  • Solr/Lucene user click based ranking

    - by Danim
    I am facing the problem of sorting Lucene results based on a user click log. I would like the more-accessed results to come first. Does anyone know how to configure or implement such a property in Lucene or Solr? Thank you very much.
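
    Editorial sketch (not part of the original question): one simple approach is to fold the click count into an index-time document boost when a document is (re)indexed, so popularity lifts the score without replacing textual relevance; setBoost is 2.9-era API (removed in Lucene 4):

        import org.apache.lucene.document.Document;

        Document doc = new Document();
        int clickCount = 42; // stand-in for a value read from the click log
        // Damped boost: 0 clicks -> 1.0, then logarithmic growth, so very
        // popular documents don't drown out relevance entirely.
        doc.setBoost((float) (1.0 + Math.log(1 + clickCount)));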

    Read the article

< Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12  | Next Page >