Search Results

Search found 631 results on 26 pages for 'couchdb lucene'.

Page 11/26 | < Previous Page | 7 8 9 10 11 12 13 14 15 16 17 18 | Next Page >

Best cross-language analyzer to use with lucene index

- by Halirob

Hello, I'm looking for feedback on which analyzer to use with an index that has documents from multiple languages. Currently I am using the simpleanalyzer, as it seems to handle the broadest amount of languages. Most of the documents to be indexed will be english, but there will be the occasional double-byte language indexed as well. Are there any other suggestions or should I just stick with the simpleanalyzer. Thanks

Read the article
Lucene search on specific field name?

- by Rachel

I have been playing around with an installation of SOLR that indexes some data from my database. I am able to index data and query it back but I was wondering about how field name queries work. For certain fields I am able to specify their name and the search text to have the results return as expected and for other fields, when I specify their name and search text, no results are returned. q=type:book //(this will work) q=type:book AND title:"The Title" //(no results returned) In this example, type is a required field and title is not. For the example where I search by title, I can see the document in the results of the first query having the given title so I know that a document exists that matches this search. Is making a field 'required' the only way to be able to search by field name? [edit] I'm using the default installation and the 'example' folder inside of solr, editing the xml files and using the interface available through start.jar to be able to run, index and query.

Read the article
Reverse search in Hibernate Search

- by Javi

Hello, I'm using Hibernate Search (which uses Lucene) for searching some Data I have indexed in a directory. It works fine but I need to do a reverse search. By reverse search I mean that I have a list of queries stored in my database I need to check which one of these queries match with a Data object each time Data Object is created. I need it to alert the user when a Data Object matches with a Query he has created. So I need to index this single Data Object which has just been created and see which queries of my list has this object as a result. I've seen Lucene MemoryIndex Class to create an index in memory so I can do something like this example for every query in a list (though iterating in a Java list of queries would not be very efficient): //Iterating over my list<Query> MemoryIndex index = new MemoryIndex(); //Add all fields index.addField("myField", "myFieldData", analyzer); ... QueryParser parser = new QueryParser("myField", analyzer); float score = index.search(query); if (score > 0.0f) { System.out.println("it's a match"); } else { System.out.println("no match found"); } The problem here is that this Data Class has several Hibernate Search Annotations @Field,@IndexedEmbedded,... which indicated how fields should be indexed, so when I invoke index() method on the FullTextEntityManager instance it uses this information to index the object in the directory. Is there a similar way to index it in memory using this information? Is there a more efficient way of doing this reverse search? Thanks

Read the article
FastVectorHighlighter.Net returning null on GetBestFragment

- by Midhat

Hi I have a large index, on which Highlighter.Net works fine, but FastVectorHighlighter returns null as a Best Fragment on Some documents. the searcher works fine. It is just the highlighter. The field has been indexed in the same manner for all documents, so I fail to understand Why it highlights some documents but not all. Using Lucene.Net 2.9.2, built from trunk rev942061

Read the article
How to get a Token from a Lucene TokenStream?

- by FarmBoy

I'm trying to use Apache Lucene for tokenizing, and I am baffled at the process to obtain Tokens from a TokenStream. The worst part is that I'm looking at the comments in the JavaDocs that address my question. http://lucene.apache.org/java/3_0_1/api/core/org/apache/lucene/analysis/TokenStream.html#incrementToken%28%29 Somehow, an AttributeSource is supposed to be used, rather than Tokens. I'm totally at a loss. Can anyone explain how to get token-like information from a TokenStream?

Read the article
How to setup Lucene search for a B2B web app?

- by Bill Paetzke

Given: 5000 databases (spread out over 5 servers) 1 database per client (so you can infer there are 1000 clients) 2 to 2000 users per client (let's say avg is 100 users per client) Clients (databases) come and go every day (let's assume most remain for at least one year) Let's stay agnostic of language or sql brand, since Lucene (and Solr) have a breadth of support The Question: How would you setup Lucene search so that each client can only search within its database? How would you setup the index(es)? Would you need to add a filter to all search queries? If a client cancelled, how would you delete their (part of the) index? (this may be trivial--not sure yet) Possible Solutions: Make an index for each client (database) Pro: Search is faster (than one-index-for-all method). Indices are relative to the size of the client's data. Con: I'm not sure what this entails, nor do I know if this is beyond Lucene's scope. Have a single, gigantic index with a database_name field. Always include database_name as a filter. Pro: Not sure. Maybe good for tech support or billing dept to search all databases for info. Con: Search is slower (than index-per-client method). Flawed security if query filter removed. For Example: Joel Spolsky said in Podcast #11 that his hosted web app product, FogBugz On-Demand, uses Lucene. He has thousands of on-demand clients. And each client gets their own database. His situation is quite similar to mine. Although, he didn't elaborate on the setup (particularly indices); hence, the need for this question. One last thing: I would also accept an answer that uses Solr (the extension of Lucene). Perhaps it's better suited for this problem. Not sure.

Read the article
Writing Web "server less" applications

- by crodjer

TL;DR What are the prospects of write applications which are completely based on a REST database server (CouchDB) and web applications which directly access the DB instead of having a web server in between? I recently started looking up some NoSQL databases. MongoDB seems to be a popular choices. I also liked the project. But I personally liked the REST interface of CouchDB. So what I wanted to know is if there was the possibility of applications (maybe cached apps in web browser, a chrome extension etc.) which could just just query the database directly with no requirement of a webserver in between. All the computational logic would reside in the client application and the database will do what it does, CRUD. Since mostly (I don't know which doesn't) client frameworks support REST quaries, it could be a good way writing applications well optimized for respective framework. These applications though won't be doing complicated computation, but still provide enough functionality which could replace lots of conventional applications. Are existing resources and projects which would help me move towards writing such applications and also the scope and moving towards developing in this way? Are their any technical/security issues with this? This post will help me decide to look into project like CouchDB (and maybe Dive into Erlang later) or stay with the conventional frameworks (like django) and SQL databases. Update A specific point of such apps I had in mind is creation of offline applications just by replicating couchdb data on client.

Read the article
What is the advantage of Lucene searching and indexing ?

- by Mehdi Amrollahi

I want to know , What is the advantage of Lucene searching and indexing ? Is searching with Lucene as fast as other searching algorithm like Quick Search? What about indexing ? I want to know more about advantage of Lucene rather that others . thanks .

Read the article
How to setup Lucene/Solr for a B2B web app?

- by Bill Paetzke

Given: 1 database per client (business customer) 5000 clients Clients have between 2 to 2000 users (avg is ~100 users/client) 100k to 10 million records per database Users need to search those records often (it's the best way to navigate their data) Possibly relevant info: Several new clients each week (any time during business hours) Multiple web servers and database servers (users can login via any web server) Let's stay agnostic of language or sql brand, since Lucene (and Solr) have a breadth of support For Example: Joel Spolsky said in Podcast #11 that his hosted web app product, FogBugz On-Demand, uses Lucene. He has thousands of on-demand clients. And each client gets their own database. They use an index per client and store it in the client's database. I'm not sure on the details. And I'm not sure if this is a serious mod to Lucene. The Question: How would you setup Lucene search so that each client can only search within its database? How would you setup the index(es)? Where do you store the index(es)? Would you need to add a filter to all search queries? If a client cancelled, how would you delete their (part of the) index? (this may be trivial--not sure yet) Possible Solutions: Make an index for each client (database) Pro: Search is faster (than one-index-for-all method). Indices are relative to the size of the client's data. Con: I'm not sure what this entails, nor do I know if this is beyond Lucene's scope. Have a single, gigantic index with a database_name field. Always include database_name as a filter. Pro: Not sure. Maybe good for tech support or billing dept to search all databases for info. Con: Search is slower (than index-per-client method). Flawed security if query filter removed. One last thing: I would also accept an answer that uses Solr (the extension of Lucene). Perhaps it's better suited for this problem. Not sure.

Read the article
Extending / changing how Zend_Search_Lucene searches

- by Grant Collins

Hi, I am currently using Zend_Search_Lucene to index and search a number of documents currently at around a 1000 or so. What I would like to do is change how the engine scores hits on a document, from the current default. Zend_Search_Lucene scores on the frequency of number of hits within a document, so a document that has 10 matches of the word PHP will score higher than a document with only 3 matches of PHP. What I am trying to do is pass a number of key words and score depending on the hits of those keywords. e.g. I pass 5 key words say,PHP, MySQL, Javascript, HTML and CSS that I search against the index. One document has 3 matches to those key words and one document has all 4 matches, the 4 matches scores the highest. The number of instances of those words in the document do not concern me. Now I've had a quick look at Zend_Search_Lucene_Search_Similarity however I have to confess that I am not sure (or that bright) to know how to use this to achieve what I am after. Is what I want to do possible using Lucene or is there a better solution out there?

Read the article
Lucene best practice

- by Dragos

I am trying to understand how Lucene should be used. From what I have read, creating an IndexReader is costly, so using a Search Manager shoulg be the right choice. However, a SearchManager should be produced by a NRTManager(which, by the way, should replace the IndexWriter for every add or delete operation performed). But in order to have a NRTManager, I should first have an IndexWriter, and here comes my problem. The documentation says: an IndexWriter is thread-safe the constructor of this class takes a Directory object, so it seems creating an instace should be costly(as in the case of an IndexReader) all changes are buffered and flushed periodically(so they seem to encourage using a single instance) but: the changes, although flushed will only be visible after commit or close after finished making updates(add/delete), the instance should be closed I also found this: http://stackoverflow.com/questions/5374419/forgot-to-close-the-lucene-indexwriter-after-adding-documents-to-the-index where it is said that not closing a writer might ruin everything So what am I really supposed to do? Is having a single IndexWriter instance a good idea(make only commit and never close it)? EDIT: What is more, if I use NRTManager, how can I make acommit`? Is it even possible?

Read the article
What blogs/websites are the best central source of information on CouchDB?

- by Krystof

I'm trying to find information about CouchDB and it seems quite sparse or not centralized at least. What blogs/websites are the best central source of information on CouchDB? Are there any books on the subject?

Read the article
How does CouchDB perform for a regularly updated dataset?

- by Ritesh M Nayak

I am planning on using CouchDB on a project. But as the querying mechanism involves writing views (which are a lot like indexes on regular RDMBMS's) I was wondering, if the document database keeps getting updated a lot ( a write heavy database) would CouchDB perform well compared to a regular RDBMS? Or do we have to compact/re-index the system occasionally to make it perform faster?

Read the article
When is ( and isn' t ) recommended a Documente Oriented DB ( like mongodb or couchdb ) ?

- by xRobot

I have always used postgresql or mysql but I have heard a lot about document oriented db like mongodb and couchdb... So my question is... When is ( and isn' t ) recommended a Document Oriented DB ( like mongodb or couchdb ) ?

Read the article
Pylucene in Python 2.6 + MacOs Snow Leopard

- by jbastos

Greetings, I'm trying to install Pylucene on my 32-bit python running on Snow Leopard. I compiled JCC with success. But I get warnings while making pylucene: ld: warning: in build/temp.macosx-10.6-i386-2.6/build/_lucene/__init__.o, file is not of required architecture ld: warning: in build/temp.macosx-10.6-i386-2.6/build/_lucene/__wrap01__.o, file is not of required architecture ld: warning: in build/temp.macosx-10.6-i386-2.6/build/_lucene/__wrap02__.o, file is not of required architecture ld: warning: in build/temp.macosx-10.6-i386-2.6/build/_lucene/__wrap03__.o, file is not of required architecture ld: warning: in build/temp.macosx-10.6-i386-2.6/build/_lucene/functions.o, file is not of required architecture ld: warning: in build/temp.macosx-10.6-i386-2.6/build/_lucene/JArray.o, file is not of required architecture ld: warning: in build/temp.macosx-10.6-i386-2.6/build/_lucene/JObject.o, file is not of required architecture ld: warning: in build/temp.macosx-10.6-i386-2.6/build/_lucene/lucene.o, file is not of required architecture ld: warning: in build/temp.macosx-10.6-i386-2.6/build/_lucene/types.o, file is not of required architecture ld: warning: in /Developer/SDKs/MacOSX10.4u.sdk/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/JCC-2.3-py2.6-macosx-10.3-fat.egg/libjcc.dylib, file is not of required architecture ld: warning: in /Developer/SDKs/MacOSX10.4u.sdk/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/JCC-2.3-py2.6-macosx-10.3-fat.egg/libjcc.dylib, file is not of required architecture build of complete Then I try to import lucene: MacBookPro:~/tmp/trunk python Python 2.6.3 (r263:75184, Oct 2 2009, 07:56:03) [GCC 4.0.1 (Apple Inc. build 5493)] on darwin Type "help", "copyright", "credits" or "license" for more information. >>> import pylucene Traceback (most recent call last): File "<stdin>", line 1, in <module> ImportError: No module named pylucene >>> import lucene Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/lucene-2.9.0-py2.6-macosx-10.6-i386.egg/lucene/__init__.py", line 7, in <module> import _lucene ImportError: dlopen(/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/lucene-2.9.0-py2.6-macosx-10.6-i386.egg/lucene/_lucene.so, 2): Symbol not found: __Z8getVMEnvP7_object Referenced from: /Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/lucene-2.9.0-py2.6-macosx-10.6-i386.egg/lucene/_lucene.so Expected in: flat namespace in /Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/lucene-2.9.0-py2.6-macosx-10.6-i386.egg/lucene/_lucene.so >>> Any hints?

Read the article
NoSQL CouchDB Getting Stable with New Release

<b>Database Journal:</b> "The open source Apache CouchDB database project hit a major milestone this week with the release of version 0.11. The release is an important one for the NoSQL database variant as it matures toward its 1.0 release."

Read the article
Open Source CouchDB Heads to the Cloud

<b>Developer.com:</b> "SQL-based relational database management systems (RDBMS) are beginning to be challenged by a new movement of NoSQL databases. Among those NoSQL databases is the open source CouchDB..."

Read the article
CouchDB Moves to the Cloud With Couchio

<b>Database Journal:</b> "According to its motto, the underlying premise behind the open source CouchDB NoSQL database is about helping developers "relax" -- chiefly by providing them with a simple, powerful database alternative."

Read the article
How Should I Generate Trade Statistics For CouchDB/Rails3 Application?

- by James

My Problem: I am trying to developing a web application for currency traders. The application allows traders to enter or upload information about their trades and I want to calculate a wide variety of statistics based on what the user entered. Now, normally I would use a relational database for this, but I have two requirements that don't fit well with a relational database so I am attempting to use couchdb. Those two problems are: 1) Primarily, I have a companion desktop application that users will be able to work with and replicate to the site using couchdb's awesome replication feature and 2) I would like to allow users to be able to define their own custom things to track about trades and generate results based off of what they enter. The schema less nature of couch seems perfect here, but it may end up being harder than it sounds. (I already know couch requires you to define views in advance and such so I was just planning on sticking all the custom attributes in an array and then emitting the array in the view and further processing from there.) What I Am Doing: Right now I am just emitting each trade in couch keyed by each user's system and querying with the key of the system to get an array of trades per system. Simple. I am not using a reduce function currently to calculate any stats because I couldn't figure out how to get everything I need without getting a reduce overflow error. Here is an example of rows that are getting emitted from couch: {"total_rows":134,"offset":0,"rows":[ {"id":"5b1dcd47221e160d8721feee4ccc64be", "key":["80e40ba2fa43589d57ec3f1d19db41e6","2010/05/14 04:32:37 +0000"], null, "doc":{ "_id":"5b1dcd47221e160d8721feee4ccc64be", "_rev":"1-bc9fe763e2637694df47d6f5efb58e5b", "couchrest-type":"Trade", "system":"80e40ba2fa43589d57ec3f1d19db41e6", "pair":"EUR/USD", "direction":"Buy", "entry":12600, "exit":12700, "stop_loss":12500, "profit_target":12700, "status":"Closed", "slug":"101332132375", "custom_tracking": [{"name":"signal", "value":"Pin Bar"}] "updated_at":"2010/05/14 04:32:37 +0000", "created_at":"2010/05/14 04:32:37 +0000", "result":100}} ]} In my rails 3 controller I am basically just populating an array of trades such as the one above and then extracting out the relevant data into smaller arrays that I can compute my statistics on. Here is my show action for the page that I want to display the stats and all the trades: def show @trades = Trade.by_system(:startkey => [@system.id], :endkey => [@system.id, Time.now ]) @trades.each do |trade| if trade.result > 0 @winning_trades << trade.result elsif trade.result < 0 @losing_trades << trade.result else @breakeven_trades << trade.result end if trade.direction == "Buy" @long_trades << trade.result else @short_trades << trade.result end if trade["custom_tracking"] != nil @custom_tracking << {"result" => trade.result, "variables" => trade["custom_tracking"]} end end end I am omitting some other stuff that is going on, but that is the gist of what I am doing. Then I am calculating stuff in the view layer to produce some results: <% winning_long_trades = @long_trades.reject {|trade| trade <= 0 } %> <% winning_short_trades = @short_trades.reject {|trade| trade <= 0 } %> <ul> <li>Total Trades: <%= @trades.count %></li> <li>Winners: <%= @winning_trades.size %></li> <li>Biggest Winner (Pips): <%= @winning_trades.max %></li> <li>Average Win(Pips): <%= @winning_trades.sum/@winning_trades.size %></li> <li>Losers: <%= @losing_trades.size %></li> <li>Biggest Loser (Pips): <%= @losing_trades.min %></li> <li>Average Loss(Pips): <%= @losing_trades.sum/@losing_trades.size %></li> <li>Breakeven Trades: <%= @breakeven_trades.size %></li> <li>Long Trades: <%= @long_trades.size %></li> <li>Winning Long Trades: <%= winning_long_trades.size %></li> <li>Short Trades: <%= @short_trades.size %></li> <li>Winning Short Trades: <%= winning_short_trades.size %></li> <li>Total Pips: <%= @winning_trades.sum + @losing_trades.sum %></li> <li>Win Rate (%): <%= @winning_trades.size/@trades.count.to_f * 100 %></li> </ul> This produces the following results, which aside from a few things is exactly what I want: Total Trades: 134 Winners: 70 Biggest Winner (Pips): 1488 Average Win(Pips): 440 Losers: 58 Biggest Loser (Pips): -516 Average Loss(Pips): -225 Breakeven Trades: 6 Long Trades: 125 Winning Long Trades: 67 Short Trades: 9 Winning Short Trades: 3 Total Pips: 17819 Win Rate (%): 52.23880597014925 What I Am Wondering- Finally The Actual Questions: I am starting to get really skeptical of how well this method will work when a user has 5,000 trades instead of just 134 like in this example. I anticipate most users will only have somewhere under 200 per year, but some users may have a couple thousand trades per year. Probably no more than 5,000 per year. It seems to work ok now, but the page load times are already getting a tad high for my tastes. (About 800ms to generate the page according to rails logs with about a 250ms of that spent in the view layer.) I will end up caching this page I am sure, but I still need the regenerate the page each time a trade is updated and I can't afford to have this be too slow. Sooo..... Is doing something similar here possible with a straight couchdb reduce function? I am assuming handing this off to couch would possibly help with larger data sets. I couldn't figure out how, but I suppose that doesn't mean it isn't possible. If possible, any hints will be helpful. Could I use a list function if a reduce was not available due to reduce constraints? Are couchdb list functions suitable for this type of calculations? Anyone have any idea of whether or not list functions perform well? Any hints what one would look like for the type of calculations I am trying to achieve? I thought about other options such as running the calculations at the time each trade was saved or nightly if I had to and saving the results to a statistics doc that I could then query so that all the processing was done ahead of time. I would like this to be the last resort because then I can't really filter out trades by time periods dynamically like I would really like to. (I want to have a slider that a user can slide to only show trades from that time period using the startkey and endkey in couchdb if I can.) If I should continue running the calculations inside the rails app at the time of the page view, what can I do to improve my current implementation. I am new to rails, couch and programming in general. I am sure that I could be doing something better here. Do I need to create an array for each stat or is there a better way to do that. I guess I just would really like some advice on how to tackle this problem. I want to keep the page generation time minimal since I anticipate these being some of the highest trafficked pages. My gut is that I will need to offload the statistics calculation to either couch or run the stats in advance of when they are called, but I am not sure. Lastly: Like I mentioned above, one of the primary reasons for using couch is to allow users to define their own things to track per trade. Getting the data into couch is no problem, but how would I be able to take the custom_tracking array and find how many winning trades for each named tracking attribute. If anyone can give me any hints to the possibility of doing this that would be great. Thanks a bunch. Would really appreciate any help. Willing to fork out some $$$ if someone wants to take on the problem for me. (Don't know if that is allowed on stack overflow or not.)

Read the article
Zend_Search_Lucene and range search

- by ranza

I have a bunch of int key fields in my index and trying to do a simple range search like this: `gender:1 AND height:[120 TO 180]` This should give me male in the height range 120 to 180. But for some reason i get this exception: `At least one range query boundary term must be non-empty term` How would i debug this? Is it just Zend_Search_Lucene being buggy?

Read the article
Setting wildcard queries as default for QueryParser

- by user46703

When my users enter a term like "word" I would like it be treated as a wildcard query "word*" so all terms beginning "word" are found. Is there a way to tell the QueryParser to automatically create wildcard queries or do I have to parse the query myself? This shouldn't be a problem for simple queries but it may become tricky for more complex queries.

Read the article
How to use NGramTokenizerFactory or NGramFilterFactory?

- by user572485

Hi, Recently, I am studying how to store and index using Solr. I want to do facet.prefix search. With whitespace tokenizer, "Where are you" will be splited into three words and indexed. If I search facet.prefix="where are", no result will be returned. I google and found NGramFilterFactory can help me. But when I apply this filter factory, I found the result is "w, h, e, ..., wh, ..", which split the sentence by character, not by token word. I use the parameters maxGramSize and minGramSize, set to 1 and 3. Does the NGramFilterFactory work right? Should I add some other parameters? Is there some other filter factories which can help me? Thanks!

Read the article
Refining Solr searches, getting exact matches?

- by thebluefox

Afternoon chaps, Right, I'm constructing a fairly complex (to me anyway) search system for a website using Solr, although this question is quite simple I think... I have two search criteria, location and type. I want to return results that are exact matches to type (letter to letter, no exceptions), and like location. My current search query is as follows ../select/?q=location:N1 type:blue&rows=100&fl=*,score&debugQuery=true This firstly returns all the type blue's that match N1, but then returns any type that matches N1, which is opposite to what I'm after. Both fields are set as textgen in the Solr schema. Any pointers? Cheers gang

Read the article
How to optimize indexing of large number of DB records using Zend_Lucene and Zend_Paginator

- by jdichev

So I have this cron script that is deployed and ran using Cron on a host and indexes all the records in a database table - the index is later used both for the front end of the site and the backed operations as well. After the operation, the index is about 3-4 MB. The problem is it takes a lot of resources (CPU: 30+ and a good chunk of memory) and slows the machine down. My question is about how to optimize the operation described below: First there is a select query built using the Zend Framework API, this query is then passed to a Paginator factory that returns a paginator which I am using to balance the current number of items being indexed and not iterate over too much items. The script is iterating over the current items in the paginator object using a foreach loop until reaching the end and then it starts from the beginning after getting items for the next page. I am suspecting this overhead is caused by the Zend_Lucene but no idea how this could be improved.

Read the article
Is it possible for lucene to store the index only in one file and no other assisting files shall be

- by Akhil

When we create a lucene index, various files are created. If we donot optimize Index writer three files are created, one named _0.cfs which contains all of the index data and two other files conataining meta data. Is it possible to force lucene to create only one file instead of three.

Read the article

< Previous Page | 7 8 9 10 11 12 13 14 15 16 17 18 | Next Page >