couchdb lucene - Page 13

CF9's Apache Lucene vs SQL Server's full text search?

- by Henry

ColdFusion 9's full text search is now based on Apache Lucene. We also use SQL Server. Which one's better? Which one's easier? Thanks!

Hibernate/Lucene/HibernateSearch: find all words that start with specific prefix.

I want to get a list of all words in a database table that start with a specific prefix. I've been looking for a way to query the terms in a Lucene index (I need the terms, I don't care about the documents they are from) but without success. Any ideas?

Read the article

In Lucene, using a Standard Analyzer, I want to make fields with spaces and special characters searc

- by Matt

In Lucene, using a Standard Analyzer, I want to make fields with spaces and special characters(underscore,!,@,#,....) searchable. I set IndexField to NOT_ANALYZED_NO_NORMS and Field.Store.YES When I look at my index in LUKE, the fields are as I expected, a value such as: 'SKU Number', yet when I search for 'SKU' or 'SKU*' nothing comes up. What am I missing?

Read the article

Lucene query: bla~* (match words that start with something fuzzy), how?

- by Skipperkongen

In the Lucene query syntax I'd like to combine * and ~ in a valid query similar to: bla~* //invalid query Meaning: Please match words that begin with "bla" or something similar to "bla".

Read the article

How can I search on a list of values using Solr/Lucene?

- by Mike

Given the following query: (field:value1 OR field:value2 OR field:value3 OR ... OR field:value50) Can this be broken down into something less verbose? Basically I have hundreds of category IDs, and I need to search for items under large groups of category IDs (20-50 at a time). In MySQL, I'd just use field IN(value1, value2, value3) rather than (field = value1 OR field = value2 etc...). Is there a simpler way for Solr/Lucene?

Read the article

Does Lucene use fuzzy matching with Custom Analyzer containing stop word filter?

- by iamrohitbanga

I am noticing strange behavior with Lucene. Could you confirm that if I use fuzzy matching with QueryParser and a custom analyzer that applies standardfilter, stopfilter and porterstemfilter?

Read the article

Lucene Search for documents that have a particular field?

- by RP

Lucene.Net - Is there a way to query for documents that contain a particular field. Lets say some of my documents have a field 'foo' and some do not. I want to find all documents that have the field 'foo' - regardless of what the value of foo is. How do I do this? Is it some sort of TermQuery?

Read the article

OutOfMemoryError: Java heap space error when start solr

- by Hamid

Hi I start indexing DB articles with solr, but after add about 58 million article (and about 113 GB size of disk) , i get below error message on tomcat log error Note1: i already set Init memory pool to 256MB, and Max memory pool:1400MB to tomcat server. Note2: I can post or search article but must wait over 3 min for get response. 8-apr-2010 14:27:07 org.apache.solr.common.SolrException log SEVERE: java.lang.OutOfMemoryError: Java heap space at org.apache.lucene.util.PriorityQueue.initialize(PriorityQueue.java:89) at org.apache.lucene.search.HitQueue.<init>(HitQueue.java:67) at org.apache.lucene.search.TopScoreDocCollector.<init>(TopScoreDocCollector.java:113) at org.apache.lucene.search.TopScoreDocCollector.<init>(TopScoreDocCollector.java:37) at org.apache.lucene.search.TopScoreDocCollector$InOrderTopScoreDocCollector.<init>(TopScoreDocCollector.java:42) at org.apache.lucene.search.TopScoreDocCollector$InOrderTopScoreDocCollector.<init>(TopScoreDocCollector.java:40) at org.apache.lucene.search.TopScoreDocCollector.create(TopScoreDocCollector.java:100) at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:979) at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:884) at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:341) at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:182) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293) at org.apache.coyote.http11.Http11AprProcessor.process(Http11AprProcessor.java:859) at org.apache.coyote.http11.Http11AprProtocol$Http11ConnectionHandler.process(Http11AprProtocol.java:574) at org.apache.tomcat.util.net.AprEndpoint$Worker.run(AprEndpoint.java:1527) at java.lang.Thread.run(Unknown Source) What's problem ? Have any suggestion ? Thanks in advanced

Read the article

Strange Map Reduce Behavior in CouchDB. Rereduce?

- by Tony

I have a mapreduce issue with couchdb (both functions shown below): when I run it with grouplevel = 2 (exact) I get accurate output: {"rows":[ {"key":["2011-01-11","staff-1"],"value":{"total":895.72,"count":2,"services":6,"services_ignored":6,"services_liked":0,"services_disliked":0,"services_disliked_avg":0,"Revise":{"total":275.72,"count":1},"Review":{"total":620,"count":1}}}, {"key":["2011-01-11","staff-2"],"value":{"total":8461.689999999999,"count":2,"services":41,"services_ignored":37,"services_liked":4,"services_disliked":0,"services_disliked_avg":0,"Revise":{"total":4432.4,"count":1},"Review":{"total":4029.29,"count":1}}}, {"key":["2011-01-11","staff-3"],"value":{"total":2100.72,"count":1,"services":10,"services_ignored":4,"services_liked":3,"services_disliked":3,"services_disliked_avg":2.3333333333333335,"Revise":{"total":2100.72,"count":1}}}, However, changing to grouplevel=1 so the values for all the different staff keys should be all grouped by date no longer gives accurate output (notice the total is currect but all others are wrong): {"rows":[ {"key":["2011-01-11"],"value":{"total":11458.130000000001,"count":2,"services":0,"services_ignored":0,"services_liked":0,"services_disliked":0,"services_disliked_avg":0,"None":{"total":11458.130000000001,"count":2}}}, My only theory is this has something to do with rereduce, which I have not yet learned. Should I explore that option or am I missing something else here? This is the Map function: function(doc) { if(doc.doc_type == 'Feedback') { emit([doc.date.split('T')[0], doc.staff_id], doc); } } And this is the Reduce: function(keys, vals) { // sum all key points by status: total, count, services (liked, rejected, ignored) var ret = { 'total':0, 'count':0, 'services': 0, 'services_ignored': 0, 'services_liked': 0, 'services_disliked': 0, 'services_disliked_avg': 0, }; var total_disliked_score = 0; // handle status function handle_status(doc) { if(!doc.status || doc.status == '' || doc.status == undefined) { status = 'None'; } else if (doc.status == 'Declined') { status = 'Rejected'; } else { status = doc.status; } if(!ret[status]) ret[status] = {'total':0, 'count':0}; ret[status]['total'] += doc.total; ret[status]['count'] += 1; }; // handle likes / dislikes function handle_services(services) { ret.services += services.length; for(var a in services) { if (services[a].user_likes == 10) { ret.services_liked += 1; } else if (services[a].user_likes >= 1) { ret.services_disliked += 1; total_disliked_score += services[a].user_likes; if (total_disliked_score >= ret.services_disliked) { ret.services_disliked_avg = total_disliked_score / ret.services_disliked; } } else { ret.services_ignored += 1; } } } // loop thru docs for(var i in vals) { // increment the total $ ret.total += vals[i].total; ret.count += 1; // update totals and sums for the status of this route handle_status(vals[i]); // do the likes / dislikes stats if(vals[i].groups) { for(var ii in vals[i].groups) { if(vals[i].groups[ii].services) { handle_services(vals[i].groups[ii].services); } } } // handle deleted services if(vals[i].hidden_services) { if (vals[i].hidden_services) { handle_services(vals[i].hidden_services); } } } return ret; }

Read the article

Lucene.NET vs SQL Server Full-text – Generating a million records and a full-text index

In this article we will take a look at how SQL Server performs with one million records in a table. We will create a quick data pumper program to fill up a table with a million dynamically created rows of data. From there we will take a look at querying the data without any special optimizations. Then we will create a Full-text index and see how that helps our searching capability.

Read the article

NoSQL CouchDB Getting Stable with New Release

Popular open source NoSQL database gets a big release as it gears up for 1.0 milestone.

Read the article

CouchDB basics for PHP developers

<b>IBM Developerworks:</b> "If you're a typical PHP developer, it doesn't take a thorough review of past projects to pick out a telling pattern: In most (if not all) cases, you're probably getting PHP to talk to a database back end for all that dynamic data goodness; in 99 percent of those instances, you're using MySQL."

Read the article

Open Source CouchDB Heads to the Cloud

NoSQL database variant gears up with commercial support from cloud vendor and project sponsor Cloudant.

Read the article

Lucene.NET and searching on multiple fields with specific values...

- by Kieron

Hi, I've created an index with various bits of data for each document I've added, each document can differ in it field name. Later on, when I come to search the index I need to query it with exact field/ values - for example: FieldName1 = X AND FieldName2 = Y AND FieldName3 = Z What's the best way of constructing the following using Lucene .NET: What analyser is best to use for this exact match type? Upon retrieving a match, I only need one specific field to be returned (which I add to each document) - should this be the only one stored? Later on I'll need to support keyword searching (so a field can have a list of values and I'll need to do a partial match). The fields and values come from a Dictionary<string, string>. It's not user input, it's constructed from code. Thanks, Kieron

Read the article

How do you boost term relevance in Sql Server Full Text Search like you can in Lucene?

- by Snives

I'm doing a typical full text search using containstable using 'ISABOUT(term1,term2,term3)' and although it supports term weighting that's not what I need. I need the ability to boost the relevancy of terms contained in certain portions of text. For example, it is customary for metatags or page title to be weighted differently than body text when searching web pages. Although I'm not dealing with web pages I do seek the same functionality. In Lucene it's called Document Field Level Boosting. How would one natively do this in Sql Server Full Text Search?

Read the article

Lucene: Fastest way to return the document occurance of a phrase?

- by dont say the kid's name

Hi Guys, I am trying to use Lucene (actually PyLucene!) to find out how many documents contain my exact phrase. My code currently looks like this... but it runs rather slow. Does anyone know a faster way to return document counts? phraseList = ["some phrase 1", "some phrase 2"] #etc, a list of phrases... countsearcher = IndexSearcher(SimpleFSDirectory(File(STORE_DIR)), True) analyzer = StandardAnalyzer(Version.LUCENE_CURRENT) for phrase in phraseList: query = QueryParser(Version.LUCENE_CURRENT, "contents", analyzer).parse("\"" + phrase + "\"") scoreDocs = countsearcher.search(query, 200).scoreDocs print "count is: " + str(len(scoreDocs))

Read the article

How to keep Lucene index synchronized with Mysql database?

- by ?????

I am trying to utilize Lucene to develop full text search in my application, which need to build index based on my mysql database. I was wondering is how to keep these index synchronized with db? I came up with to ways: 1) add extra code in business logic tightly to update the search index . 2) running a separated task to rebuild the index periodically. do you have any other approaches? and what do you think is the best way? Any comments would be appreciate, thanks in advance!

Read the article

Caching Authentication Data

- by PartlyCloudy

Hi, I'm currently implementing a REST web service using CouchDB and RESTlet. The RESTlet layer is mainly for authentication and some minor filtering of the JSON data served by CouchDB: Clients <= HTTP = [ RESTlet <= HTTP = CouchDB ] I'm using CouchDB also to store user login data, because I don't want to add an additional database server for that purpose. Thus, each request to my service causes two CouchDB requests conducted by RESTlet (auth data + "real" request). In order to keep the service as efficent as possible, I want to reduce the number of requests, in this case redundant requests for login data. My idea now is to provide a cache (i.e.LRU-Cache via LinkedHashMap) within my RESTlet application that caches login data, because HTTP caching will probabily not be enough. But how do I invalidate the cache data, once a user changes the password, for instance. Thanks to REST, the application might run on several servers in parallel, and I don't want to create a central instance just to cache login data. Currently, I save requested auth data in the cache and try to auth new requests by using them. If a authentication fails or there is now entry available, I'll dispatch a GET request to my CouchDB storage in order to obtain the actual auth data. So in a worst case, users that have changed their data will perhaps still be able to login with their old credentials. How can I deal with that? Or what is a good strategy to keep the cache(s) up-to-date in general? Thanks in advance.

Read the article

Zend_Search_Lucene vs SOLR

- by spacemonkey

Hi, I have recenlty stumbled into Zend Lucene port of Lucene project. I have a little bit experience with SOLR so I would like to know what is the difference between two of them especially from performance and installation side. As much as I know SOLR requires Tomcat serverlet running in web hosting in order to work, what about Zend Lucene library? I am also a bit confused what means "being implemented on the top of Lucene"?

Read the article

How to structure an index for type ahead for extremely large dataset using Lucene or similar?

- by Pete

I have a dataset of 200million+ records and am looking to build a dedicated backend to power a type ahead solution. Lucene is of interest given its popularity and license type, but I'm open to other open source suggestions as well. I am looking for advice, tales from the trenches, or even better direct instruction on what I will need as far as amount of hardware and structure of software. Requirements: Must have: The ability to do starts with substring matching (I type in 'st' and it should match 'Stephen') The ability to return results very quickly, I'd say 500ms is an upper bound. Nice to have: The ability to feed relevance information into the indexing process, so that, for example, more popular terms would be returned ahead of others and not just alphabetical, aka Google style. In-word substring matching, so for example ('st' would match 'bestseller') Note: This index will purely be used for type ahead, and does not need to serve standard search queries. I am not worried about getting advice on how to set up the front end or AJAX, as long as the index can be queried as a service or directly via Java code. Up votes for any useful information that allows me to get closer to an enterprise level type ahead solution

Read the article

How to search Multiple Sites using Lucene Search engine API?

- by Wael Salman

Hope that someone can help me as soon as possible :-) I would like to know how can we search Multiple Sites using Lucene??! (All sites are in one index). I have succeeded to search one website , and to index multiple sites, however I am not able to search all websites. Consider this method that I have: private void PerformSearch() { DateTime start = DateTime.Now; //Create the Searcher object string strIndexDir = Server.MapPath("index") + @"\" + mstrURL; IndexSearcher objSearcher = new IndexSearcher(strIndexDir); //Parse the query, "text" is the default field to search Query objQuery = QueryParser.Parse(mstrQuery, "text", new StandardAnalyzer()); //Create the result DataTable mobjDTResults.Columns.Add("title", typeof(string)); mobjDTResults.Columns.Add("path", typeof(string)); mobjDTResults.Columns.Add("score", typeof(string)); mobjDTResults.Columns.Add("sample", typeof(string)); mobjDTResults.Columns.Add("explain", typeof(string)); //Perform search and get hit count Hits objHits = objSearcher.Search(objQuery); mintTotal = objHits.Length(); //Create Highlighter QueryHighlightExtractor highlighter = new QueryHighlightExtractor(objQuery, new StandardAnalyzer(), "<B>", "</B>"); //Initialize "Start At" variable mintStartAt = GetStartAt(); //How many items we should show? int intResultsCt = GetSmallerOf(mintTotal, mintMaxResults + mintStartAt); //Loop through results and display for (int intCt = mintStartAt; intCt < intResultsCt; intCt++) { //Get the document from resuls index Document doc = objHits.Doc(intCt); //Get the document's ID and set the cache location string strID = doc.Get("id"); string strLocation = ""; if (mstrURL.Substring(0,3) == "www") strLocation = Server.MapPath("cache") + @"\" + mstrURL + @"\" + strID + ".htm"; else strLocation = doc.Get("path") + doc.Get("filename"); //Load the HTML page from cache string strPlainText; using (StreamReader sr = new StreamReader(strLocation, System.Text.Encoding.Default)) { strPlainText = ParseHTML(sr.ReadToEnd()); } //Add result to results datagrid DataRow row = mobjDTResults.NewRow(); if (mstrURL.Substring(0,3) == "www") row["title"] = doc.Get("title"); else row["title"] = doc.Get("filename"); row["path"] = doc.Get("path"); row["score"] = String.Format("{0:f}", (objHits.Score(intCt) * 100)) + "%"; row["sample"] = highlighter.GetBestFragments(strPlainText, 200, 2, "..."); Explanation objExplain = objSearcher.Explain(objQuery, intCt); row["explain"] = objExplain.ToHtml(); mobjDTResults.Rows.Add(row); } objSearcher.Close(); //Finalize results information mTsDuration = DateTime.Now - start; mintFromItem = mintStartAt + 1; mintToItem = GetSmallerOf(mintStartAt + mintMaxResults, mintTotal); } as you can see that I use the site URL 'mstrURL' when I create the search object string strIndexDir = Server.MapPath("index") + @"\" + mstrURL; How can I do the same when I want to search multiple sites?? Actually I am using the code from http://www.keylimetie.com/blog/2005/8/4/lucenenet/

Read the article

bigtable vs cassandra vs simpledb vs dynamo vs couchdb vs hypertable vs riak vs hbase, what do they

- by Mickey Shine

Sorry if this question is somewhat subjective. I am new to 'could store', 'distributed store' or some concepts like this. I really wonder what do they have in common and want to get an overview on all of them. What do I need to prepare if I want to write a product similar to this?

Read the article

Order by Rand - How can I do in CouchDB?

- by Lucas Renan

I need a random query, but I don't know what's the best way to do this in the view.

Read the article

How order a view result by reduced value? CouchDB

- by Lucas Renan

Can I order the result of a view by the value, returned by reduced? { "rows": [ {"key":"bob","value":2}, {"key":"john","value":3}, {"key":"zztop","value":1} ] } I wanna a result like this: { "rows": [ {"key":"zztop","value":1}, {"key":"bob","value":2}, {"key":"john","value":3} ] }

Read the article

CouchDB query a list function with the results of a view with multiple keys?

- by fuzzy lollipop

Multiple key selection on the view works fine. I now want to process the results through a list function. This is what I am trying right now and I can't get it to work. curl -X POST -d '{"keys":[["cnnid","22222222"],["src","e"]]}' http://localhost:5984/test/_design/transfer/_list/search/search1?group=true&group_level=1

Search Results

Search found 631 results on 26 pages for 'couchdb lucene'.

Page 13/26 | < Previous Page | 9 10 11 12 13 14 15 16 17 18 19 20 | Next Page >

- by Henry

- by Giuseppe

- by Matt

- by Skipperkongen

- by Mike

- by iamrohitbanga

- by RP

- by Hamid

- by Tony

- by Kieron

- by Snives

- by dont say the kid's name

- by ?????

- by PartlyCloudy

- by spacemonkey

- by Pete

- by Wael Salman

- by Mickey Shine

- by Lucas Renan

- by Lucas Renan

- by fuzzy lollipop

< Previous Page | 9 10 11 12 13 14 15 16 17 18 19 20 | Next Page >