Search Results

Search found 631 results on 26 pages for 'couchdb lucene'.

Page 10/26 | < Previous Page | 6 7 8 9 10 11 12 13 14 15 16 17  | Next Page >

  • Map/Reduce on an array of hashes in CouchDB

    - by sebastiangeiger
    Hello everyone, I am looking for a map/reduce function to calculate the status in a Design Document. Below you can see an example document from my current database. { "_id": "0238f1414f2f95a47266ca43709a6591", "_rev": "22-24a741981b4de71f33cc70c7e5744442", "status": "retrieved image urls", "term": "Lucas Winter", "urls": [ { "status": "retrieved", "url": "http://...." }, { "status": "retrieved", "url": "http://..." } ], "search_depth": 1, "possible_labels": { "gender": "male" }, "couchrest-type": "SearchTerm" } I'd like to get rid of the status key and instead calculate it from the statuses of the urls. My current by_status view looks like the following: function(doc) { if (doc['status']) { emit(doc['status'], null); } } I tried some things but nothing actually works. Right now my Map Function looks like this: function(doc) { if(doc.urls){ emit(doc._id, doc.urls) } } And my Reduce Function: function(key, value, rereduce){ var reduced_status = "retrieved" for(var url in value){ if(url.status=="new"){ reduced_status = "new"; } } return reduced_status; } The result is that I get "retrieved" everywhere, which is definitely not right. I tried to narrow down the problem and it seems that value is not an array; when I use the following Reduce Function I get length 1 everywhere, which is impossible because I have 12 documents in my database, each containing between 20 and 200 urls: function(key, value, rereduce){ return value.length; } What am I doing wrong? (I know I want you to write code for me and I'm feeling guilty, but right now I do the calculation of the statuses in Ruby after getting the data from the database. It would be nice to already get the right data from the database.)
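
    One possible fix, as a minimal sketch (using the field names from the example document above, not the poster's original code): compute the per-document status in the map step so the reduce only combines ready-made status strings. Two things bite in the original: for(var url in value) iterates over array indices rather than the url objects, and because the map emits a single row per document keyed on _id, a reduce grouped by key only ever sees one value for that key, which would explain value.length coming back as 1.

        // Map: emit one status string per document
        function (doc) {
          if (doc.urls) {
            var status = "retrieved";
            for (var i = 0; i < doc.urls.length; i++) {
              if (doc.urls[i].status == "new") {
                status = "new";
                break;
              }
            }
            emit(doc._id, status);
          }
        }

        // Reduce: combine status strings; the same logic works on rereduce
        function (keys, values, rereduce) {
          for (var i = 0; i < values.length; i++) {
            if (values[i] == "new") return "new";
          }
          return "retrieved";
        }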

    Read the article

  • CouchDB Map/Reduce raises exception in reduce function?

    - by fuzzy lollipop
    my view generates keys in this format ["job_id:1234567890", 1271430291000] where the first key element is a unique key and the second is a timestamp in milliseconds. I run my view with elapsed_time?startkey=["123"]&endkey=["123",{}]&group=true&group_level=1 and here is my reduce function; the intention is to reduce the output to get the earliest and latest timestamps and return the difference between each of them and now: function(keys,values,rereduce) { var now = new Date().valueOf(); var first = Number.MIN_VALUE; var last = Number.MAX_VALUE; if (rereduce) { first = Math.max(first,values[0].first); last = Math.min(last,values[0].last); } else { first = keys[0][0][1]; last = keys[keys.length-1][0][1]; } return {first:now - first, last:now - last}; } When processing a query it constantly raises the following exception: function raised exception (new TypeError("keys has no properties", "", 1)) I am making sure not to reference keys inside my rereduce block. Why does this function constantly raise this exception?
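
    A defensive alternative, sketched below with assumed field names (doc.job_id, doc.timestamp): put the timestamp in the emitted value so the reduce never reads keys at all, since CouchDB passes keys as null during rereduce. The now - first / now - last subtraction can then be done by the caller on the reduced result.

        // Map: carry the timestamp in the value
        function (doc) {
          emit(["job_id:" + doc.job_id, doc.timestamp], doc.timestamp);
        }

        // Reduce: track min/max without touching keys. First-level values are
        // raw timestamps; on rereduce they are the {first, last} objects
        // returned by earlier reduce calls.
        function (keys, values, rereduce) {
          var first = Infinity, last = -Infinity;
          for (var i = 0; i < values.length; i++) {
            var lo = rereduce ? values[i].first : values[i];
            var hi = rereduce ? values[i].last : values[i];
            if (lo < first) first = lo;
            if (hi > last) last = hi;
          }
          return { first: first, last: last };
        }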

    Read the article

  • problem with two key ranges in couchdb

    - by Duasto
    I'm having a problem getting the right results in my coordinate system. To explain my system: I have this simple database that has x_axis, y_axis and name columns. I don't need to get all the data, I just need to display some part of it. For example, I have a coordinate system that has 10:10 (meaning from x_axis -10 to 10 and from y_axis -10 to 10) and I want to display only 49 coordinates. In an SQL query I could do it something like this: "select * from coordinate where x_axis >= -3 and x_axis <= 3 and y_axis >= -3 and y_axis <= 3" I tried this function but with no success: "by_range": { "map": "function(doc) { emit([doc.x_axis, doc.y_axis], doc) }" } by_range?startkey=[-3,-3]&endkey=[3,3] I got wrong results: -3x-3 -3x-2 -3x-1 -3x0 -3x1 -3x2 -3x3 <-- should not display this part -- -3x4 -3x5 -3x6 -3x7 -3x8 -3x9 -3x10 <-- end of should not display this part -- ..... up to 3x3 To give you a better understanding of my project, here is a screenshot of what I want to be made: Oops, they don't allow new posters to post an image: img96(dot)imageshack(dot)us/img96/5382/coordinates(dot)jpg <<< just change the "(dot)" to "."
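
    Composite keys in a CouchDB view sort lexicographically, so a single startkey/endkey pair such as [-3,-3] to [3,3] constrains y only at the two x endpoints; it cannot express a two-dimensional box. One common workaround is one range request per x value, sketched below against the by_range view from the question (the database and design document names are placeholders):

        // Build one view request per x value; each request constrains y properly.
        function boxQueryUrls(xMin, xMax, yMin, yMax) {
          var urls = [];
          for (var x = xMin; x <= xMax; x++) {
            urls.push('/mydb/_design/coords/_view/by_range' +
                      '?startkey=' + encodeURIComponent(JSON.stringify([x, yMin])) +
                      '&endkey=' + encodeURIComponent(JSON.stringify([x, yMax])));
          }
          return urls; // fetch each URL and concatenate the returned rows
        }

        // 7 requests covering the 7x7 grid of 49 coordinates
        var urls = boxQueryUrls(-3, 3, -3, 3);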

    Read the article

  • Best way to use Cradle with Express.js (CouchDB, Node.js)

    - by Costa
    I'm building my website ( http://tedxgramercy.jit.su ) with express.js and so far I've been using the http.request method in node to access couch, and that's been cool. I've learned lots about how http, couch, and node work, which is awesome. Anyways, I'm thinking of moving over to cradle now (let me know if you have a strong opinion about this) and I'd like to know the "right" way to set this up. Should I (a) require() cradle and make a new connection to my db in each separate route, or (b) create my database connection once and then share that connection by require()ing it in each route? (If so, how do I do that?) Thanks!!!
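
    For option (b), a minimal sketch of the usual pattern (module, database and view names here are placeholders, not from the site): Node caches module exports, so a file that creates the Cradle connection once can be require()d from every route and they all share the same object.

        // db.js - create the connection once; require() caching shares it everywhere
        var cradle = require('cradle');
        module.exports = new (cradle.Connection)('127.0.0.1', 5984).database('tedx');

        // routes/events.js - any route just requires the shared handle
        var db = require('../db');

        exports.list = function (req, res, next) {
          db.view('events/all', function (err, rows) {   // design/view name assumed
            if (err) return next(err);
            res.render('events', { events: rows });
          });
        };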

    Read the article

  • Different analyzers for each field

    - by user72185
    Hi, how can I use different analyzers for different fields in a document I'm indexing with Lucene? Example: RAMDirectory dir = new RAMDirectory(); IndexWriter iw = new IndexWriter(dir, new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_CURRENT), true, IndexWriter.MaxFieldLength.UNLIMITED); Document doc = new Document(); Field field1 = new Field("field1", someText1, Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS); Field field2 = new Field("field2", someText2, Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS); doc.Add(field1); doc.Add(field2); iw.AddDocument(doc); iw.Commit(); The analyzer is an argument to the IndexWriter, but I want to use StandardAnalyzer for field1 and SimpleAnalyzer for field2. How can I do that? The same applies when searching, of course; the correct analyzer must be applied for each field.
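
    One way to do this, as a sketch (not tested against a particular Lucene.Net release), is PerFieldAnalyzerWrapper, which wraps a default analyzer and lets individual fields override it; the same wrapper is then handed to the query parser so search-time analysis matches index-time analysis.

        // StandardAnalyzer for everything, SimpleAnalyzer just for field2
        var analyzer = new PerFieldAnalyzerWrapper(
            new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_CURRENT));
        analyzer.AddAnalyzer("field2", new SimpleAnalyzer());

        var iw = new IndexWriter(dir, analyzer, true, IndexWriter.MaxFieldLength.UNLIMITED);
        // ... add the document exactly as above ...

        // Reuse the same wrapper when parsing queries so each field is analyzed consistently
        var parser = new QueryParser(Lucene.Net.Util.Version.LUCENE_CURRENT, "field1", analyzer);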

    Read the article

  • Which PHP library should I choose to work with CouchDB?

    - by Guss
    I want to try playing with CouchDB for a new project I'm writing (as a hobby, not part of my job). I'm well versed in PHP, but I haven't programmed with CouchDB at all, and I also have little experience with non-SQL databases. From looking at CouchDB's "Getting Started with PHP" document, they recommend using a third-party library or writing your own client using their RESTful HTTP API. I think I'd rather not mess with writing protocol implementations myself at this point, but what is your experience with writing PHP to work with CouchDB? I haven't tested any of the alternatives yet, but I looked at: PHPillow: I'm interested in the way they implement ORM. I wasn't planning to do ORM, but my problem domain probably maps well to that method. PHP Object Freezer: seems like a poor man's ORM - I can use it to implement an actual ORM, or just as an easy store/retrieve document API, but it seems too primitive. PHP-on-Couch: also a bit simple, but they have an interesting API for views, and from the documentation it looks usable enough. PHP CouchDB Extension: from the listed options this looks like it has the best chance of making it into the PHP mainline itself, and it also has the most complete API. Any opinion you wish to share on each library is welcome.

    Read the article

  • Should I go for Arrays or Objects in PHP in a CouchDB/Ajax app?

    - by karlthorwald
    I find myself converting between array and object all the time in a PHP application that uses CouchDB and Ajax. Of course I am also converting objects to JSON and back (sometimes for CouchDB but mostly for Ajax), but this is not disturbing my workflow so much. At present I have PHP objects that are returned by the CouchDB modules I use, and on the other hand I have the old habit of returning arrays like array("error" => "not found", "data" => $dataObj) from my functions. This leads to a mixed occurrence of real PHP objects and nested arrays, and I cast with (object) or (array) if necessary. The worst thing is that I know more or less by heart what a function returns, but not what type (array or object), so I often run into type errors. My plan is now to always cast arrays to objects before returning from a function. Of course this implies a lot of refactoring. Is this the right way to go? What about the conversion overhead? Other ideas or tips? Edit: Kenaniah's answer suggests I should go the other way, which would mean I'd cast everything to arrays. And for all the Ajax / JSON stuff and also for CouchDB I would use $myarray = json_decode($json_data,$assoc = false) Even more work to change all the CouchDB and Ajax functions, but in the end I have better code.

    Read the article

  • Is it really wrong to version documents using CouchDB's default behaviour?

    - by Tomas Sedovic
    This is one of those "I know I shouldn't do this but it's oh so convenient." questions. Sorry about that. I plan to use CouchDB for storing a bunch of documents and keeping their entire revision history. CouchDB does the versioning automatically, but relying on it is strongly discouraged: "You cannot rely on document revisions for any other purpose than concurrency control." From what I've found on the CouchDB wiki, the versions can get deleted either during compaction or during replication. As far as I can tell, compaction must always be triggered manually and replication occurs only when there's more than one database server. The question is: if I don't run compaction and use only a single database instance for my documents, can I just use CouchDB's document versioning and expect it to work? What other problems might I run into? E.g., does not running compaction hurt performance or consume significantly more disk space (compared to handling the versioning manually)?

    Read the article

  • Should I go for Arrays or Objects in PHP in a CouchDB/Ajax app?

    - by karlthorwald
    I find myself converting between array and object all the time in a PHP application that uses CouchDB and Ajax. Of course I am also converting objects to JSON and back (sometimes for CouchDB but mostly for Ajax), but this is not disturbing my workflow so much. At present I have PHP objects that are returned by the CouchDB modules I use, and on the other hand I have the old habit of returning arrays like array("error" => "not found", "data" => $dataObj) from my functions. This leads to a mixed occurrence of real PHP objects and nested arrays, and I cast with (object) or (array) if necessary. The worst thing is that I know more or less by heart what a function returns, but not what type (array or object), so I often run into type errors. My plan is now to always cast arrays to objects before returning from a function. Of course this implies a lot of refactoring. Is this the right way to go? What about the conversion overhead? Other ideas or tips? Edit: Kenaniah's answer suggests I should go the other way, which would mean I'd cast everything to arrays. And for all the Ajax / JSON stuff and also for CouchDB I would use $myarray = json_decode($json_data, $assoc = true); //EDIT: changed to true, which is what I really meant. Even more work to change all the CouchDB and Ajax functions, but in the end I have better code.
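
    A small sketch of the all-arrays direction described in the edit (illustrative only, not the poster's code): decoding with the $assoc flag set to true yields nested associative arrays for both the Ajax payloads and the CouchDB responses, and anything a wrapper still hands back as objects can be normalised once at the boundary.

        <?php
        // Nested associative arrays instead of stdClass objects
        $doc = json_decode($json_data, true);
        $result = array('error' => null, 'data' => $doc);

        // Normalise objects coming back from a CouchDB wrapper once, at the boundary
        function to_array($value) {
            return json_decode(json_encode($value), true);
        }
        $row = to_array($objectFromCouch);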

    Read the article

  • Using ember-resource with couchdb - how can I save my documents?

    - by Thomas Herrmann
    I am implementing an application using ember.js and CouchDB. I chose ember-resource as the database access layer because it nicely supports nested JSON documents. Since CouchDB uses the attribute _rev for optimistic locking in every document, this attribute has to be updated in my application after saving the data to CouchDB. My idea to implement this is to reload the data right after saving to the database and get the new _rev back with the rest of the document. Here is my code for this: // Since we use CouchDB, we have to make sure that we invalidate and re-fetch // every document right after saving it. CouchDB uses an optimistic locking // scheme based on the attribute "_rev" in the documents, so we reload it in // order to have the correct _rev value. didSave: function() { this._super.apply(this, arguments); this.forceReload(); }, // reload resource after save is done, expire to make reload really do something forceReload: function() { this.expire(); // Everything OK up to this location Ember.run.next(this, function() { this.fetch() // Sub-Document is reset here, and *not* refetched! .fail(function(error) { App.displayError(error); }) .done(function() { App.log("App.Resource.forceReload fetch done, got revision " + self.get('_rev')); }); }); } This works for most cases, but if I have a nested model, the sub-model is replaced with the old version of the data just before the fetch is executed! Interestingly enough, the correct (updated) data is stored in the database and the wrong (old) data is in the memory model after the fetch, although the _rev attribute is correct (as well as all attributes of the main object). Here is a part of my object definition: App.TaskDefinition = App.Resource.define({ url: App.dbPrefix + 'courseware', schema: { id: String, _rev: String, type: String, name: String, comment: String, task: { type: 'App.Task', nested: true } } }); App.Task = App.Resource.define({ schema: { id: String, title: String, description: String, startImmediate: Boolean, holdOnComment: Boolean, ..... // other attributes and sub-objects } }); Any ideas where the problem might be? Thanks a lot for any suggestion! Kind regards, Thomas

    Read the article

  • What data is actually stored in a B-tree database in CouchDB?

    - by Andrey Vlasovskikh
    I'm wondering what is actually stored in a CouchDB database B-tree? CouchDB: The Definitive Guide says that a database B-tree is used for append-only operations and that a database is stored in a single B-tree (besides per-view B-trees). So I guess the data items that are appended to the database file are revisions of documents, not the whole documents:

        +---------|### ... |
        |
        +------|###|------+ ... ---+
        |      |          |        |
        +------+ +------+ +------+     +------+
        | doc1 | | doc2 | | doc1 | ... | doc1 |
        | rev1 | | rev1 | | rev2 |     | rev7 |
        +------+ +------+ +------+     +------+

    Is that true? If it is true, then how is the current revision of a document determined based on such a B-tree? Doesn't that mean that CouchDB needs a separate "view" database for indexing current revisions of documents to preserve O(log n) access? Wouldn't it lead to race conditions while building such an index? (As far as I know, CouchDB uses no write locks.)

    Read the article

  • Any reason why I shouldn't use couchdb for message passing or realtime activity streams?

    - by Up
    While using AMQP or XMPP (RabbitMQ or ejabberd, which could have CouchDB as a backend) seems like a good fit to deliver real-time updates about friend state in a social gaming platform where updates are small but frequent, I can't help but wonder why CouchDB wouldn't be a good platform to deliver such updates itself. The main advantage I can think of is its ability to filter updates based on friends and the availability of the changes API, which makes developing such an application and managing it (including replication) quite easy compared to AMQP or XMPP, where you have to think about how to manage the pubsub nodes and who is subscribed to them at any point in time. However, I can't help but think this is too good to be true; I can't find information on what CouchDB's shortcomings are. Somehow, it feels like using MySQL for message passing, which is why I am hesitant to use it. Anyone have any experience in using CouchDB for such applications? Would you recommend another platform to use?
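
    For reference, the changes-API approach mentioned above usually looks something like the sketch below (the field names such as doc.friends, the design document name and the database name are assumptions): a filter function in a design document plus one continuous or long-polling _changes request per connected user.

        // _design/social, filter "by_friend" - evaluated server-side for each changed doc
        function (doc, req) {
          // pass through only documents that list the requesting user as a friend
          return doc.friends && doc.friends.indexOf(req.query.user) !== -1;
        }

        // Client side: a continuous feed filtered per user, e.g.
        // GET /gamedb/_changes?feed=continuous&filter=social/by_friend&user=alice&since=42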

    Read the article

  • Inspecting Lucene.NET index with Luke; want to replicate NHibernate.Search view

    - by Tim Peel
    Hi, I am trying to put together an index using terms, which I specify as a comma-separated list. I want to replicate the display in Luke as seen here: http://ayende.com/Blog/archive/2009/05/03/nhibernate-search-again.aspx But my index value just shows as a single field with the comma-separated list value. For example: Tags term,anotherterm When I search my index, it will return results if I search with "term" but will not return anything if I search with "anotherterm". I thought the indexing process would break the comma-separated list apart into separate values, but this does not seem to be the case. Anyone got any ideas? Thanks
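
    If the goal is for each tag to end up as its own searchable term, one sketch (not the original indexing code; variable names are assumptions) is to stop relying on the analyzer to split the string and instead add one field instance per tag - Lucene accepts repeated fields with the same name, and Luke then shows them as separate values as well.

        // Split the comma-separated tags and index each one as its own field value
        var doc = new Document();
        foreach (var tag in "term,anotherterm".Split(','))
        {
            doc.Add(new Field("Tags", tag.Trim(),
                              Field.Store.YES, Field.Index.NOT_ANALYZED));
        }
        writer.AddDocument(doc);   // writer: an already-open IndexWriter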

    Read the article

  • Why is my Lucene index getting locked?

    - by Andrew Bullock
    I had an issue with my search not returning the results I expect. I tried to run Luke on my index, but it said it was locked and I needed to Force Unlock it (I'm not a Jedi/Sith though). I tried to delete the index folder and run my recreate-indices application, but the folder was locked. Using Unlocker I've found that there are about 100 entries of w3wp.exe (same PID, different Handle) with a lock on the index. What's going on? I'm doing this in my NHibernate configuration: c.SetListener(ListenerType.PostUpdate, new FullTextIndexEventListener()); c.SetListener(ListenerType.PostInsert, new FullTextIndexEventListener()); c.SetListener(ListenerType.PostDelete, new FullTextIndexEventListener()); And here is the only place I query the index: var fullTextSession = NHibernate.Search.Search.CreateFullTextSession(this.unitOfWork.Session); var fullTextQuery = fullTextSession.CreateFullTextQuery(query, typeof (Person)); fullTextQuery.SetMaxResults(100); return fullTextQuery.List<Person>(); What's going on? What am I doing wrong? Thanks

    Read the article

  • lucene.net get start and end index of a highlighted fragment in a searched field

    - by user339995
    "My search returns a highlighted fragment from a field. I want to know that in that field of particular searched document, where does that fragment starts and ends ?" for instance. consider i am searching "highlighted fragment" in above lines (consider the above para as single document). I am setting my fragmenter as : SimpleFragmenter fragmenter = new SimpleFragmenter(30); now the output of GetBestFragment is somewhat like : "returns a highlighted fragment from" Is it possible to get the starting and ending index of this fragment in the text above (say starting is 10 and ending is 45)

    Read the article

  • Lucene.NET faceted search.

    - by Paul Knopf
    I found a great tutorial on performing a faceted search: http://www.devatwork.nl/articles/lucenenet/faceted-search-and-drill-down-lucenenet/ This article does not explain how to retrieve the narrowed set of available attributes to filter on (for further drill-down). Let's say I am looking for planners that are red. When I perform the faceted search, I want to return all available attributes to filter on that are red. Then when I add a "weekly format" filter, I want the attribute list to get even smaller, containing only filters available for the segmented group.
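
    The usual trick for the narrowing part, as a rough sketch (field and value names are made up, and this is not code from the linked article): once the user's current drill-down query is known, count hits for each candidate attribute value against that query and offer only the values that still have matches.

        // Current drill-down: red planners
        var baseQuery = new TermQuery(new Term("color", "red"));

        var availableFormats = new List<string>();
        foreach (var format in new[] { "weekly", "daily", "monthly" })
        {
            var facetFilter = new QueryWrapperFilter(new TermQuery(new Term("format", format)));
            var top = searcher.Search(baseQuery, facetFilter, 1);
            if (top.TotalHits > 0)   // member casing varies between Lucene.Net versions
                availableFormats.Add(format);
        }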

    Read the article

  • Lucene.Net memory consumption and slow search when too many clauses used

    - by Umer
    I have a DB holding text file attributes and text file primary key IDs, and I have indexed around 1 million text files along with their IDs (primary keys in the DB). Now, I am searching at two levels. First is a straightforward DB search, where I get primary keys as the result (roughly 2 or 3 million IDs). Then I make a Boolean query, for instance as follows: +Text:"test*" +(pkID:1 pkID:4 pkID:100 pkID:115 pkID:1041 .... ) and search it in my index file. The problem is that such a query (having 2 million clauses) takes far too much time to give a result and consumes really too much memory... Is there any optimization solution for this problem?
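
    One way to avoid building a BooleanQuery with millions of pkID clauses at all, as a sketch of an assumed alternative (not from the post): run only the text query against the index and check the primary keys against an in-memory set while walking the hits. Whether this beats the huge query depends on how many documents the text query itself matches.

        var wanted = new HashSet<string>(idsFromDatabase);    // the 2-3 million pkIDs

        var top = searcher.Search(new PrefixQuery(new Term("Text", "test")), 10000);
        var matching = new List<Document>();
        foreach (var scoreDoc in top.ScoreDocs)               // member casing varies by Lucene.Net version
        {
            var doc = searcher.Doc(scoreDoc.Doc);
            if (wanted.Contains(doc.Get("pkID")))
                matching.Add(doc);
        }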

    Read the article

  • Lucene neo4j sort with boolean fields

    - by Daniele
    I have indexed some documents (nodes of neo4j) with a boolean property which is not always present. E.g. Node1: label: "label A" Node2: label: "label A" (note, same label as Node1), special: true The goal is to get Node2 ranked higher than Node1 for the query "label A". Here is the code: Index<Node> fulltextLucene = graphDb.index().forNodes( "my-index" ); Sort sort = new Sort(new SortField[] {SortField.FIELD_SCORE, new SortField("special", SortField.????, true) }); IndexHits<Node> results = fulltextLucene.query( "label", new QueryContext( "label A").sort(sort)); How can I accomplish that? Thanks
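
    One possible approach, sketched below (an assumption, not a verified neo4j recipe): index the flag on every node as a string, e.g. special = "true" or "false", so the sort field is always present, then sort it as a reversed string so "true" sorts ahead of "false" after the relevance score.

        // Sort by relevance first, then put special = "true" nodes ahead of "false"
        Sort sort = new Sort(new SortField[] {
                SortField.FIELD_SCORE,
                new SortField("special", SortField.STRING, true)   // reverse: "true" before "false"
        });

        IndexHits<Node> results = fulltextLucene.query("label",
                new QueryContext("label A").sort(sort));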

    Read the article

  • Solr/Lucene Scorer

    - by TFor
    We are currently working on a proof-of-concept for a client using Solr and have been able to configure all the features they want except the scoring. The problem is that they want scores that make results fall into buckets: Bucket 1: exact match on category (score = 4) Bucket 2: exact match on name (score = 3) Bucket 3: partial match on category (score = 2) Bucket 4: partial match on name (score = 1) The first thing we did was develop a custom similarity class that would return the correct score depending on the field and an exact or partial match. The only problem now is that when a document matches on both the category and the name, the scores are added together. Example: searching for "restaurant" returns documents in the category restaurant that also have the word restaurant in their name and thus get a score of 5 (4+1), but they should only get 4. I assume for this to work we would need to develop a custom Scorer class, but we have no clue on how to incorporate this in Solr. Another option is to create a custom SortField implementation similar to the RandomSortField already present in Solr. Maybe there is even a simpler solution that we don't know about. All suggestions welcome!
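
    If the requirement really is "take the best bucket, not the sum", DisjunctionMaxQuery may be worth trying before a custom Scorer: it scores a document by the maximum of its sub-query scores plus an optional tie-breaker. A rough sketch with made-up field names, assuming the custom similarity already assigns the bucket scores; in Solr this corresponds to the dismax query parser with tie=0.0.

        // The document's score is the highest-scoring clause, not the sum of clauses
        DisjunctionMaxQuery buckets = new DisjunctionMaxQuery(0.0f);
        buckets.add(new TermQuery(new Term("category_exact", "restaurant")));  // bucket 1
        buckets.add(new TermQuery(new Term("name_exact", "restaurant")));      // bucket 2
        buckets.add(new TermQuery(new Term("category", "restaurant")));        // bucket 3
        buckets.add(new TermQuery(new Term("name", "restaurant")));            // bucket 4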

    Read the article

  • Which can handle a huge surge of queries: SQL Server 2008 Fulltext or Lucene

    - by Luke101
    I am creating a widget that will be installed on several websites and blogs. The widget will analyse the remote webpage's title and content, then return relevant articles/links on my website. The amount of traffic we expect will be huge, roughly 500K queries a day and up from there. I need the queries to be returned very quickly, so the candidate needs to be high-performance, similar to Google AdSense. The remote title can be from 5 to 50 words, and for the description we will use no more than 3000 words. Which of these two do you think can handle the load?

    Read the article

  • Lucene DuplicateFilter question

    - by chardex
    Hi, why doesn't DuplicateFilter work together with other filters? For example, after a little remake of the DuplicateFilterTest test, I get the impression that the filter is not combined with the other filters but trims the results first: public void testKeepsLastFilter() throws Throwable { DuplicateFilter df = new DuplicateFilter(KEY_FIELD); df.setKeepMode(DuplicateFilter.KM_USE_LAST_OCCURRENCE); Query q = new ConstantScoreQuery(new ChainedFilter(new Filter[]{ new QueryWrapperFilter(tq), // new QueryWrapperFilter(new TermQuery(new Term("text", "out"))), // works right, it is the last document. new QueryWrapperFilter(new TermQuery(new Term("text", "now"))) // why it doesn't work? It is the third document. }, ChainedFilter.AND)); ScoreDoc[] hits = searcher.search(q, df, 1000).scoreDocs; assertTrue("Filtered searching should have found some matches", hits.length > 0); for (int i = 0; i < hits.length; i++) { Document d = searcher.doc(hits[i].doc); String url = d.get(KEY_FIELD); TermDocs td = reader.termDocs(new Term(KEY_FIELD, url)); int lastDoc = 0; while (td.next()) { lastDoc = td.doc(); } assertEquals("Duplicate urls should return last doc", lastDoc, hits[i].doc); } }

    Read the article

  • Best cross-language analyzer to use with lucene index

    - by Halirob
    Hello, I'm looking for feedback on which analyzer to use with an index that has documents in multiple languages. Currently I am using the SimpleAnalyzer, as it seems to handle the broadest range of languages. Most of the documents to be indexed will be English, but the occasional double-byte language will be indexed as well. Are there any other suggestions, or should I just stick with the SimpleAnalyzer? Thanks

    Read the article
