Search Results

Search found 631 results on 26 pages for 'couchdb lucene'.

Page 2/26

  • Lucene (.NET) Document structure and performance suggestions

    - by Josh Handel
    Hello, I am indexing about 100M documents that consist of a few string identifiers and a hundred or so numeric terms. I won't be doing range queries, so I haven't dug too deep into NumericField, but I don't think it's the right choice here. My problem is that query performance degrades quickly when I start adding OR criteria to my query. All my queries are on specific numeric terms, so a document looks like StringField:[someString] and N DataField:[someNumber]. I then query it with something like DataField:((+1 +(2 3)) (+75 +(3 5 52)) (+99 +88 +(102 155 199))). Currently these queries take about 7 to 16 seconds to run on my laptop. I would like to make sure that's really the best they can do. I am open to suggestions on field structure and query structure :-). Thanks, Josh. PS: I have already read over all the other Lucene performance discussions on here, on the Lucene wiki, and at Lucid Imagination... I'm a bit further down the rabbit hole than that...
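
    One direction worth sketching (the field name DataField comes from the question; the clause values are illustrative): since the query is purely boolean over exact terms, the nested groups can be built directly from TermQuery objects and the whole thing wrapped in a constant-score query, which skips both QueryParser overhead and per-hit scoring. A minimal, hedged Lucene.Net sketch:

        using Lucene.Net.Index;
        using Lucene.Net.Search;

        // Build the group (+75 +(3 5 52)): one required term plus a required
        // sub-group of optional terms, all exact TermQuery clauses on "DataField".
        var inner = new BooleanQuery();
        inner.Add(new TermQuery(new Term("DataField", "3")), BooleanClause.Occur.SHOULD);
        inner.Add(new TermQuery(new Term("DataField", "5")), BooleanClause.Occur.SHOULD);
        inner.Add(new TermQuery(new Term("DataField", "52")), BooleanClause.Occur.SHOULD);

        var group = new BooleanQuery();
        group.Add(new TermQuery(new Term("DataField", "75")), BooleanClause.Occur.MUST);
        group.Add(inner, BooleanClause.Occur.MUST);

        // Scoring is irrelevant for a pure match/no-match query, so skip it entirely.
        var query = new ConstantScoreQuery(new QueryWrapperFilter(group));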

    Read the article

  • Lucene search and underscores

    - by Matt
    When I use Luke to search my Lucene index using a standard analyzer, I can see the field I am searching for contains values of the form MY_VALUE. When I search for field:"MY_VALUE", however, the query is parsed as field:"my value". Is there a simple way to escape the underscore (_) character so that it will search for it?
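
    One hedged option (the field name is taken from the question): StandardAnalyzer splits MY_VALUE into the tokens my and value, so escaping the underscore will not help; analyzing that one field with WhitespaceAnalyzer (at both index and query time), or querying it with a TermQuery that bypasses analysis altogether, keeps the value as a single token. A minimal sketch:

        using Lucene.Net.Analysis;
        using Lucene.Net.Analysis.Standard;
        using Lucene.Net.Index;
        using Lucene.Net.Search;

        // StandardAnalyzer everywhere except the underscore-bearing field.
        var analyzer = new PerFieldAnalyzerWrapper(new StandardAnalyzer());
        analyzer.AddAnalyzer("field", new WhitespaceAnalyzer());   // keeps MY_VALUE as one token

        // Or skip analysis entirely for an exact lookup against the indexed term:
        var exact = new TermQuery(new Term("field", "MY_VALUE"));

    Note that the field would need to be re-indexed with the same per-field analyzer for either approach to match.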

    Read the article

  • Does Lucene.Net support phrases? What is the best approach to tokenize comma-delimited data (atomically)?

    - by Pete Alvin
    I have a database with a column I wish to index that has comma-delimited names, e.g., User.FullNameList = "Helen Ready, Phil Collins, Brad Paisley". I prefer to tokenize each name atomically (the name as a whole searchable entity). What is the best approach for this? Did I miss a simple option to set the tokenizer delimiter? Do I have to subclass or write my own tokenizer class? Something else? ;) Or does Lucene.Net not support phrases? Or is it smart enough to handle this use case automatically? I'm sure I'm not the first person to have to do this. Googling produced no noticeable solutions.
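
    A minimal sketch of one approach (the field name FullName is an assumption): split the column on commas in application code and add each name as its own untokenized field value, so every whole name becomes a single searchable term:

        using Lucene.Net.Documents;
        using Lucene.Net.Index;
        using Lucene.Net.Search;

        var doc = new Document();
        string fullNameList = "Helen Ready, Phil Collins, Brad Paisley";

        // One untokenized field instance per name; each whole name is one term.
        // (Older Lucene.Net versions call NOT_ANALYZED "UN_TOKENIZED".)
        foreach (string name in fullNameList.Split(','))
        {
            doc.Add(new Field("FullName", name.Trim(),
                              Field.Store.YES, Field.Index.NOT_ANALYZED));
        }

        // Exact, atomic lookup at query time:
        var query = new TermQuery(new Term("FullName", "Phil Collins"));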

    Read the article

  • Reading from a compressed Lucene index

    - by Akhil
    I created a Lucene index and compressed the index directory with bz2 or zip. I do not want to uncompress it. Is there any API call that can read the index from this zipped directory and still allow searching and other functionality? That is, can Lucene's IndexReader read the index from a compressed file? I saw that Lucene's IndexReader does not support a "Reader" to open the index, otherwise I would have created a Reader class that uncompresses the file and streams the uncompressed version. Any alternatives to this are welcome. Thanks, Akhil
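
    As far as I know there is no IndexReader API that reads a zipped or bz2 directory in place. A hedged workaround (assuming .NET 4.5's ZipArchive and a hypothetical archive path) is to stream each compressed entry into a RAMDirectory and open the reader over that in-memory copy, so nothing is ever uncompressed to disk:

        using System.IO;
        using System.IO.Compression;          // ZipFile/ZipArchive, .NET 4.5+ (assumption)
        using Lucene.Net.Index;
        using Lucene.Net.Store;

        var ramDir = new RAMDirectory();
        using (var archive = ZipFile.OpenRead(@"C:\indexes\myindex.zip"))   // hypothetical path
        {
            foreach (var entry in archive.Entries)
            {
                using (var input = entry.Open())
                using (var buffer = new MemoryStream())
                {
                    input.CopyTo(buffer);
                    byte[] bytes = buffer.ToArray();
                    var output = ramDir.CreateOutput(entry.Name);   // copy into the in-memory directory
                    output.WriteBytes(bytes, bytes.Length);
                    output.Close();
                }
            }
        }
        var reader = IndexReader.Open(ramDir, true);   // read-only reader over the in-memory index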

    Read the article

  • Hyphens in Lucene

    - by user72185
    Hi, I'm playing around with Lucene and noticed that the use of a hyphen (e.g. "semi-final") will result in two words ("semi" and "final") in the index. How is this supposed to match if the user searches for "semifinal", in one word? Edit: I'm just playing around with the StandardTokenizer class actually, maybe that is why? Am I missing a filter? Thanks! (Edit) My code looks like this:

        StandardAnalyzer sa = new StandardAnalyzer();
        TokenStream ts = sa.TokenStream("field", new StringReader("semi-final"));
        // The term text comes from a TermAttribute; ts.ToString() is not the token itself.
        TermAttribute termAttr = (TermAttribute)ts.GetAttribute(typeof(TermAttribute));
        while (ts.IncrementToken())
        {
            Console.WriteLine("Token: " + termAttr.Term());
        }
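
    StandardTokenizer does split on the hyphen, so "semi-final" and "semifinal" end up as different terms. One hedged, low-tech workaround sketch: strip hyphens from this field's text before analysis, at both index time and query time, so "semi-final" is indexed as the single token "semifinal" and a search for "semifinal" matches:

        // Applied to the raw text before it reaches the analyzer (hypothetical helper).
        static string FoldHyphens(string text)
        {
            return text.Replace("-", string.Empty);
        }

        // Index time:  doc.Add(new Field("field", FoldHyphens(rawText), ...));
        // Query time:  parser.Parse(FoldHyphens(userInput));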

    Read the article

  • Disabling scoring in Lucene(.NET)

    - by user72185
    Hi, When searching, is there a way to disable scoring for any query? The scenario is that the user refines his query by trying different combinations of words, phrases etc., and needs real-time (well, reasonably fast at least) responses on the number of hits. Search time slows down a lot when there are millions of hits due to scoring, but the user really doesn't care about all these documents. As soon as he sees there are 1M+ hits he will start adding more words to the query. A "sort by relevance" option would let him do this quickly, turning scoring back on only when the number of hits is reasonable. Is this possible? I'm using Lucene.NET 2.9.2, but AFAIK it is identical to the Java version.
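
    A hedged sketch of one way to get just the hit count without scoring, assuming Lucene.Net 2.9's Collector API (in later versions AcceptsDocsOutOfOrder becomes a property): a collector that only counts matching documents never asks the Scorer for a score, so the per-document scoring work is skipped:

        using Lucene.Net.Index;
        using Lucene.Net.Search;

        // Counts hits without ever touching document scores.
        public class CountingCollector : Collector
        {
            public int Count { get; private set; }

            public override void SetScorer(Scorer scorer) { /* score never requested */ }
            public override void Collect(int doc) { Count++; }
            public override void SetNextReader(IndexReader reader, int docBase) { }
            public override bool AcceptsDocsOutOfOrder() { return true; }
        }

        // Usage:
        //   var collector = new CountingCollector();
        //   searcher.Search(query, collector);
        //   int totalHits = collector.Count;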

    Read the article

  • Indexing large DBs with Lucene/PHP

    - by thebluefox
    Afternoon chaps, I'm trying to index a 1.7-million-row table with the Zend port of Lucene. On small tests of a few thousand rows it worked perfectly, but as soon as I try to push it up to a few tens of thousands of rows, it times out. Obviously I could increase the time PHP allows the script to run, but seeing as 360 seconds gets me ~10,000 rows, I'd hate to think how many seconds it'd take to do 1.7 million. I've also tried making the script run a few thousand, refresh, and then run the next few thousand, but doing this clears the index each time. Any ideas, guys? Thanks :)

    Read the article

  • Using Lucene to Query File properties in Windows

    - by sneha
    Hi all, I am planning to use Apache Lucene in one of my projects. I want to index files based on their file properties (I won't be indexing the contents), and I want Lucene to query the index so that I can quickly find a list of files based on those properties. E.g.: give me all the files with an access time greater than 10/10/2005 and less than 10/04/2010, created by james. Can I use Lucene for this kind of project? Or am I better off using Windows Search? Its footprint is quite heavy (almost 5 MB :( ), and having to bundle it as part of my application seems too tough. Can you please suggest whether there are any better alternatives?
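
    Lucene handles this kind of metadata-only indexing well. A minimal, hedged sketch (field names and the date resolution are assumptions; the "createdBy" value is a placeholder since the file owner is not read here): store each property as an untokenized field, encode dates with DateTools so they compare lexicographically, and combine a TermRangeQuery with a TermQuery:

        using System;
        using System.IO;
        using Lucene.Net.Documents;
        using Lucene.Net.Index;
        using Lucene.Net.Search;

        // Index time: one document per file, properties as untokenized fields.
        var fi = new FileInfo(@"C:\some\file.txt");                     // hypothetical path
        var doc = new Document();
        doc.Add(new Field("path", fi.FullName, Field.Store.YES, Field.Index.NOT_ANALYZED));
        doc.Add(new Field("accessed",
                DateTools.DateToString(fi.LastAccessTimeUtc, DateTools.Resolution.MINUTE),
                Field.Store.YES, Field.Index.NOT_ANALYZED));
        doc.Add(new Field("createdBy", "james", Field.Store.YES, Field.Index.NOT_ANALYZED));

        // Query time: accessed within the date range AND created by james.
        var range = new TermRangeQuery("accessed",
                DateTools.DateToString(new DateTime(2005, 10, 10), DateTools.Resolution.MINUTE),
                DateTools.DateToString(new DateTime(2010, 4, 10), DateTools.Resolution.MINUTE),
                true, true);
        var query = new BooleanQuery();
        query.Add(range, BooleanClause.Occur.MUST);
        query.Add(new TermQuery(new Term("createdBy", "james")), BooleanClause.Occur.MUST);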

    Read the article

  • My Lucene queries only ever find one hit

    - by Bob
    I'm getting started with Lucene.Net (stuck on version 2.3.1). I add sample documents with this:

        Dim indexWriter = New IndexWriter(indexDir, New Standard.StandardAnalyzer(), True)
        Dim doc = New Document()
        doc.Add(New Field("Title", "foo", Field.Store.YES, Field.Index.TOKENIZED, Field.TermVector.NO))
        doc.Add(New Field("Date", DateTime.UtcNow.ToString, Field.Store.YES, Field.Index.TOKENIZED, Field.TermVector.NO))
        indexWriter.AddDocument(doc)
        indexWriter.Close()

    I search for documents matching "foo" with this:

        Dim searcher = New IndexSearcher(indexDir)
        Dim parser = New QueryParser("Title", New StandardAnalyzer())
        Dim query = parser.Parse("foo")
        Dim hits = searcher.Search(query)
        Console.WriteLine("Number of hits = " + hits.Length.ToString)

    No matter how many times I run this, I only ever get one result. Any ideas?
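
    One hedged observation (sketch in C#, using the same Lucene.Net calls): passing True as the third IndexWriter argument recreates the index on every run, so only the single document added by the latest run survives. Creating the index only when none exists yet keeps earlier documents around:

        using Lucene.Net.Analysis.Standard;
        using Lucene.Net.Index;

        // Create the index only if it does not exist yet; otherwise append to it.
        bool create = !IndexReader.IndexExists(indexDir);
        var writer = new IndexWriter(indexDir, new StandardAnalyzer(), create);
        // ... AddDocument calls ...
        writer.Close();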

    Read the article

  • Lucene .NET IndexWriter lock

    - by Pini Salim
    My question relates to the following code snippet:

        static void Main(string[] args)
        {
            Lucene.Net.Store.Directory d = FSDirectory.Open(new DirectoryInfo(/* my index path */));
            IndexWriter writer = new IndexWriter(d, new WhitespaceAnalyzer());
            // Exiting without closing the index writer...
        }

    In this test, I opened an IndexWriter without closing it, so even after the test exits, the write.lock file still exists in the index directory. I therefore expected that the next time I open an IndexWriter on that index, a LockObtainFailedException would be thrown. Can someone please explain to me why I am wrong? I mean, is the write.lock file only meant to protect against creating two IndexWriters in the same process? That doesn't seem like the right answer to me...
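
    A hedged note with a small sketch: whether a leftover write.lock blocks the next writer depends on the directory's lock factory. With native (OS-level) file locking the lock is released by the operating system when the process dies, even though the write.lock file is left on disk, so no LockObtainFailedException is thrown; with the simple file-based lock factory a stale file really would block the next writer. The lock state can be checked and cleared through the directory rather than the file:

        using Lucene.Net.Index;
        using Lucene.Net.Store;

        // IsLocked asks the directory's lock factory, not just whether the file exists.
        if (IndexWriter.IsLocked(d))
        {
            IndexWriter.Unlock(d);   // only safe if no other process genuinely holds the lock
        }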

    Read the article

  • Lucene .Net Searching with TermVector

    - by Ashish
    In Lucene.Net, I am creating a document for searching a word, and I want to display the 10 words before and the 10 words after it. I have used a TermVector:

        Lucene.Net.Documents.Field fldContent = new Lucene.Net.Documents.Field(
            "content", content,
            Lucene.Net.Documents.Field.Store.YES,
            Lucene.Net.Documents.Field.Index.TOKENIZED,
            Lucene.Net.Documents.Field.TermVector.WITH_POSITIONS_OFFSETS);

    Can anyone help me find the keyword position and extract the nearest 15 words? Please send some code. Thanks, Ashish
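
    A hedged sketch (the field name comes from the question; the keyword, window size, and the 2.9-style getter names are assumptions): because the field is indexed WITH_POSITIONS_OFFSETS, the term vector can be cast to a TermPositionVector, whose character offsets let you cut a context window out of the stored text:

        using Lucene.Net.Index;

        // docId: the hit's document number; reader: an open IndexReader.
        var tpv = (TermPositionVector)reader.GetTermFreqVector(docId, "content");
        int termIndex = tpv.IndexOf("keyword");              // the searched word, as the analyzer indexed it
        if (termIndex >= 0)
        {
            TermVectorOffsetInfo[] offsets = tpv.GetOffsets(termIndex);
            int start = offsets[0].GetStartOffset();          // first occurrence only
            int end = offsets[0].GetEndOffset();

            // Crude character window around the match; trim to word boundaries as needed.
            string stored = reader.Document(docId).Get("content");
            int from = System.Math.Max(0, start - 100);
            int to = System.Math.Min(stored.Length, end + 100);
            string snippet = stored.Substring(from, to - from);
        }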

    Read the article

  • How do I filter my Lucene search results?

    - by Andrew Bullock
    Say my requirement is "search for all users by name, who are over 18". If I were using SQL, I might write something like:

        SELECT * FROM [Users]
        WHERE ([firstname] LIKE '%' + @searchTerm + '%'
            OR [lastname] LIKE '%' + @searchTerm + '%')
          AND [age] >= 18

    However, I'm having difficulty translating this into Lucene.Net. This is what I have so far:

        var parser = new MultiFieldQueryParser(new[] { "firstname", "lastname" }, new StandardAnalyzer());
        var luceneQuery = parser.Parse(searchTerm);
        var query = FullTextSession.CreateFullTextQuery(luceneQuery, typeof(User));
        var results = query.List<User>();

    How do I add in the "where age >= 18" bit? I've heard about .SetFilter(), but this only accepts Lucene queries, and not IQueries. If SetFilter is the right thing to use, how do I make the appropriate filter? If not, what do I use and how do I do it? Thanks! P.S. This is a vastly simplified version of what I'm trying to do, for clarity; my WHERE clause is actually a lot more complicated than shown here. In reality I need to check whether ids exist in subqueries and check a number of unindexed properties. Any solutions given need to support this. Thanks
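
    A hedged sketch at the Lucene level (it assumes age is indexed as a zero-padded untokenized field such as "018"; unindexed properties and subquery checks would still need a post-filter in application code): a range query on the padded age can be ANDed with the parsed name query, or wrapped in a QueryWrapperFilter where an API wants a Filter:

        using Lucene.Net.Index;
        using Lucene.Net.Search;

        // Age stored as zero-padded text ("018") so lexicographic range == numeric range.
        var ageAtLeast18 = new TermRangeQuery("age", "018", "999", true, true);

        var combined = new BooleanQuery();
        combined.Add(luceneQuery, BooleanClause.Occur.MUST);   // the parsed firstname/lastname query
        combined.Add(ageAtLeast18, BooleanClause.Occur.MUST);

        // If a Filter is required instead of a Query:
        var ageFilter = new QueryWrapperFilter(ageAtLeast18);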

    Read the article

  • "no inclosing instance error " while getting top term frequencies for document from Lucene index

    - by Julia
    Hello! I am trying to get the most frequently occurring terms for each particular document in a Lucene index. I am trying to set a threshold for the number of top terms that I care about, maybe 20. However, I am getting "no enclosing instance of type DisplayTermVectors is accessible" when calling the Comparator... To this function I pass the term vector of each document and the maximum number of top terms I would like to get:

        protected static Collection getTopTerms(TermFreqVector tfv, int maxTerms) {
            String[] terms = tfv.getTerms();
            int[] tFreqs = tfv.getTermFrequencies();
            List result = new ArrayList(terms.length);
            for (int i = 0; i < tFreqs.length; i++) {
                TermFrq tf = new TermFrq(terms[i], tFreqs[i]);
                result.add(tf);
            }
            Collections.sort(result, new FreqComparator());
            if (maxTerms < result.size()) {
                result = result.subList(0, maxTerms);
            }
            return result;
        }

        /* Class for objects to hold the term/freq pairs */
        static class TermFrq {
            private String term;
            private int freq;

            public TermFrq(String term, int freq) {
                this.term = term;
                this.freq = freq;
            }

            public String getTerm() { return this.term; }
            public int getFreq() { return this.freq; }
        }

        /* Comparator to compare the objects by frequency */
        class FreqComparator implements Comparator {
            public int compare(Object pair1, Object pair2) {
                int f1 = ((TermFrq) pair1).getFreq();
                int f2 = ((TermFrq) pair2).getFreq();
                if (f1 > f2) return 1;
                else if (f1 < f2) return -1;
                else return 0;
            }
        }

    Explanations and corrections I will very much appreciate, and if someone else has done term frequency extraction a better way, I am open to all suggestions! Please help! Thanks!

    Read the article

  • Lucene Search Returning Extra, Undesired Records

    - by Brandon
    I have a Lucene index that contains a field called 'Name'. I escape all special characters before inserting a value into my index using QueryParser.Escape(value). In my example I have 2 documents with the following names respectively: Test and Test (Test). They get inserted into my index as the terms (I can confirm this using Luke): [test] for the first, and [test] [\(test\)] for the second. I insert these values as TOKENIZED and using the StandardAnalyzer. When I perform a search, I use QueryParser.Escape(searchString) to escape special characters in the search input, and then use the QueryParser with my 'Name' field and the StandardAnalyzer to perform the search. When I perform a search for 'Test', I get back both documents in my index (as expected). However, when I perform a search for 'Test (Test)', I am still getting back both documents. I realize that in both examples it matches on the 'test' term in the index, but I am confused why in the second example it would not just pull back the document with the value 'Test (Test)', because my search should create two terms: [test] and [\(test\)]. I would imagine it would apply some sort of boolean operator where BOTH terms must match, so I would get back just one record. Is there something I am missing, or a trick to make the search behave as desired?
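
    One hedged possibility: QueryParser's default operator is OR, so 'Test (Test)' parses into optional terms and any document containing either one matches. Requiring every parsed term is a one-line change (a sketch; whether the parenthesised term survives parsing at all still depends on how the values were escaped and analyzed at index time):

        using Lucene.Net.Analysis.Standard;
        using Lucene.Net.QueryParsers;

        var parser = new QueryParser("Name", new StandardAnalyzer());
        parser.SetDefaultOperator(QueryParser.AND_OPERATOR);   // every parsed term must match
        var query = parser.Parse(QueryParser.Escape("Test (Test)"));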

    Read the article

  • Android CouchDB - libcouch and IPC AIDL services

    - by dirtySanchez
    I am working on a native CouchDB app for Android. Just this week CouchOne released libcouch, described as "Library files needed to interact with CouchDB on Android": couchone_libcouch@Github. It is a basic app that installs CouchDB if the CouchDB service (which comes with CouchDB if it was installed previously) can't be bound to. To be more precise, as I understand it: libcouch determines CouchDB's presence on the device by trying to bind to an IPC service from CouchDB, and through that service it wants to communicate with CouchDB. Please see the method attemptLaunch() in CouchAppLauncher.class:

        public void attemptLaunch() {
            Log.i(TAG, "1.) called attemptLaunch");
            Intent intent = new Intent(ICouchService.class.getName());
            Log.i(TAG, "1.a) setup Intent");
            Boolean canStart = bindService(intent, couchServiceConn, Context.BIND_AUTO_CREATE);
            Log.i(TAG, "1.b bound service. canStart: " + Boolean.toString(canStart));
            if (!canStart) {
                setContentView(R.layout.install_couchdb);
                TextView label = (TextView) findViewById(R.id.install_couchdb_text);
                Button btn = (Button) this.findViewById(R.id.install_couchdb_btn);
                String text = getString(R.string.app_name) + " requires Apache CouchDB to be installed.";
                label.setText(text);
                // Launching the market will fail on emulators
                btn.setOnClickListener(new View.OnClickListener() {
                    public void onClick(View v) {
                        launchMarket();
                        finish();
                    }
                });
            }
        }

    The questions I have about this are: libcouch is never able to "find" a previously installed CouchDB; it always attempts to install CouchDB from the market. This is because it never actually manages to bind to the CouchDBService. As I understand the purpose of AIDL-generated service interfaces, the actual service that intends to offer its IPC to other applications should make use of AIDL. In this case the AIDL has been moved to the application that is trying to bind to the remote service, which is libcouch here. Reviewing the commits, the AIDL files have just been moved out of that repository into libcouch. For complete linkage, here is the link to the Android CouchDB sources: github.com/couchone/libcouch-android. Now, I could be completely wrong in my findings; it could also be libcouch's manifest that is missing something, but I am really looking forward to getting some answers!

    Read the article

  • How do I get Lucene (.NET) to highlight correctly with wildcards?

    - by Scott Stafford
    I am using the Lucene.NET API directly in my ASP.NET/C# web application. When I search using a wildcard, like "fuc*", the highlighter doesn't highlight anything, but when I search for the whole word, like "fuchsia", it highlights fine. Does Lucene have the ability to highlight using the same logic it used to match with? Various maybe-relevant code snippets below:

        var formatter = new Lucene.Net.Highlight.SimpleHTMLFormatter(
            "<span class='srhilite'>", "</span>");
        var fragmenter = new Lucene.Net.Highlight.SimpleFragmenter(100);
        var scorer = new Lucene.Net.Highlight.QueryScorer(query);
        var highlighter = new Lucene.Net.Highlight.Highlighter(formatter, scorer);
        highlighter.SetTextFragmenter(fragmenter);

    and then on each hit...

        string description = Server.HtmlEncode(doc.Get("Description"));
        var stream = analyzer.TokenStream("Description",
            new System.IO.StringReader(description));
        string highlighted_text = highlighter.GetBestFragments(
            stream, description, 1, "...");

    And I'm using the QueryParser and the StandardAnalyzer.
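
    A hedged sketch of one common fix: a wildcard is a multi-term query, and the QueryScorer only sees the unexpanded "fuc*" pattern, which never matches the tokens in the stream. Rewriting the query against the index expands the wildcard into the concrete terms it matched, which the highlighter can then pick up (if the parser's multi-term rewrite produces a constant-score query instead, the expansion may need a scoring BooleanQuery rewrite; treat that as an assumption to verify against your version):

        // indexReader: the IndexReader the search ran against.
        var rewritten = query.Rewrite(indexReader);            // expands wildcard/prefix terms
        var scorer = new Lucene.Net.Highlight.QueryScorer(rewritten);
        var highlighter = new Lucene.Net.Highlight.Highlighter(formatter, scorer);
        highlighter.SetTextFragmenter(fragmenter);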

    Read the article

  • How do I detect whether a similar document is already stored in a Lucene index?

    - by Jenea
    Hi. I need to exclude duplicates in my database. The problem is that duplicates are not considered an exact match but rather similar documents. For this purpose I decided to use FuzzyQuery as follows:

        var fuzzyQuery = new global::Lucene.Net.Search.FuzzyQuery(
            new Term("text", queryText), 0.8f, 0);
        hits = _searcher.Search(query);

    The idea was to set the minimum similarity to 0.8 (which I think is high enough) so only similar documents will be found, excluding those that are not sufficiently similar. To test this code I decided to see if it finds an already existing document. The variable queryText was assigned a value that is stored in the index. The code above found nothing; in other words, it doesn't detect even an exact match. The index was built by this code:

        doc.Add(new global::Lucene.Net.Documents.Field(
            "text", text,
            global::Lucene.Net.Documents.Field.Store.YES,
            global::Lucene.Net.Documents.Field.Index.TOKENIZED,
            global::Lucene.Net.Documents.Field.TermVector.WITH_POSITIONS_OFFSETS));

    I followed the recommendations from below and the results are: TermQuery doesn't return any results. A query constructed with

        var _analyzer = new RussianAnalyzer();
        var parser = new global::Lucene.Net.QueryParsers.QueryParser("text", _analyzer);
        var query = parser.Parse(queryText);
        var _searcher = new IndexSearcher(Settings.General.Default.LuceneIndexDirectoryPath);
        var hits = _searcher.Search(query);

    returns several results, with the maximum score on the document that has the exact match, plus several other documents with similar content.

    Read the article

  • Lucene: Question about score calculation with PrefixQuery

    - by Keven
    Hi, I've run into a problem with score calculation with a PrefixQuery. To change the score of each document, I used setBoost on the document when adding it to the index. Then I create a PrefixQuery to search, but the results are not ordered according to the boost. It seems setBoost totally doesn't work for a PrefixQuery. Please check my code below:

        @Test
        public void testNormsDocBoost() throws Exception {
            Directory dir = new RAMDirectory();
            IndexWriter writer = new IndexWriter(dir,
                new StandardAnalyzer(Version.LUCENE_CURRENT), true,
                IndexWriter.MaxFieldLength.LIMITED);

            Document doc1 = new Document();
            Field f1 = new Field("contents", "common1", Field.Store.YES, Field.Index.ANALYZED);
            doc1.add(f1);
            doc1.setBoost(100);
            writer.addDocument(doc1);

            Document doc2 = new Document();
            Field f2 = new Field("contents", "common2", Field.Store.YES, Field.Index.ANALYZED);
            doc2.add(f2);
            doc2.setBoost(200);
            writer.addDocument(doc2);

            Document doc3 = new Document();
            Field f3 = new Field("contents", "common3", Field.Store.YES, Field.Index.ANALYZED);
            doc3.add(f3);
            doc3.setBoost(300);
            writer.addDocument(doc3);
            writer.close();

            IndexReader reader = IndexReader.open(dir);
            IndexSearcher searcher = new IndexSearcher(reader);
            TopDocs docs = searcher.search(new PrefixQuery(new Term("contents", "common")), 10);
            for (ScoreDoc doc : docs.scoreDocs) {
                System.out.println("docid : " + doc.doc + " score : " + doc.score + " "
                    + searcher.doc(doc.doc).get("contents"));
            }
        }

    The output is:

        docid : 0 score : 1.0 common1
        docid : 1 score : 1.0 common2
        docid : 2 score : 1.0 common3
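
    A hedged explanation with a small sketch (shown with the equivalent Lucene.Net API; verify the constant name against your version): by default a PrefixQuery rewrites to a constant-score form, so every hit gets score 1.0 and the index-time document boosts, which are folded into the norms, never enter the score. Switching the rewrite method to the scoring BooleanQuery rewrite makes the boosts visible again:

        using Lucene.Net.Index;
        using Lucene.Net.Search;

        var prefix = new PrefixQuery(new Term("contents", "common"));
        // Expand the prefix into scored TermQuery clauses instead of a constant-score form.
        prefix.SetRewriteMethod(MultiTermQuery.SCORING_BOOLEAN_QUERY_REWRITE);
        TopDocs docs = searcher.Search(prefix, 10);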

    Read the article

  • Multiple or single index in Lucene?

    - by Bruno Reis
    I have to index different kinds of data (text documents, forum messages, user profile data, etc.) that should be searched together (i.e., a single search would return results across the different kinds of data). What are the advantages and disadvantages of having multiple indexes, one for each type of data? And the advantages and disadvantages of having a single index for all kinds of data? Thank you.
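
    For what it's worth, keeping separate indexes does not rule out a combined search. A hedged Lucene.Net sketch (directory paths and the query are placeholders) of querying several indexes as one:

        using Lucene.Net.Search;
        using Lucene.Net.Store;

        // One searcher per index, combined into a single logical searcher.
        var docsSearcher = new IndexSearcher(FSDirectory.Open(new System.IO.DirectoryInfo("idx/documents")), true);
        var forumSearcher = new IndexSearcher(FSDirectory.Open(new System.IO.DirectoryInfo("idx/forum")), true);
        var profileSearcher = new IndexSearcher(FSDirectory.Open(new System.IO.DirectoryInfo("idx/profiles")), true);

        var combined = new MultiSearcher(new Searchable[] { docsSearcher, forumSearcher, profileSearcher });
        TopDocs hits = combined.Search(someQuery, 10);   // someQuery: any query over field names shared across the indexes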

    Read the article

  • How to call a Zend Lucene search function?

    - by stef
    I inherited a Zend project devoid of comments and I didn't get to talk to the previous developer. Since I have no Zend experience I'm having some issues :) I'd like to print out some variables inside a function that indexes items from the site using Zend_Search_Lucene, because I think something is going wrong there. From what I've read, ::create creates a new index and ::open updates it. So it's in this ::open function that I'd like to print out some variables. The name and parameters of the function are below. Does anyone have any idea how this function can be called so I can run some tests?

        private function search($category, $string, $page = 1, $itemsByPage = 5)

    EDIT: OR, is there a way I can nuke the existing index and force it to be rebuilt completely, for example by deleting the index files on the FS and then performing some searches?

    Read the article

  • How can I proxy CouchDB as a sub-directory with lighttpd?

    - by indieinvader
    I have a sub-directory on my web server (lighttpd) that I want to point at a CouchDB instance running on the same machine. I tried using mod_proxy, but it sends along the whole request path, like a proxy should, I know! So:

        // What happens:
        Lighttpd: http://localhost/couchdb/some_request
            |
            V
        CouchDB:  http://localhost:5984/couchdb/some_request

        // What I want to happen:
        Lighttpd: http://localhost/couchdb/some_request
            |
            V
        CouchDB:  http://localhost:5984/some_request

    Is there any way to make this setup work?

    Read the article

  • How do I keep my DB and Lucene in sync?

    - by acidzombie24
    So I can have a transaction in SQL, but I am sure it's not a good idea to wait in the middle of a transaction for Lucene to finish; also I am unsure whether Lucene's changes are permanently saved until I do something there. What's the best way to keep my DB and Lucene in sync? I am thinking of adding a lucene_queue table to my SQL DB: every time I make a change I add it to the queue (removing any older queue entry for the same item) and delete it once it is done. Is this the best way? Also I am unsure how to make Lucene permanently keep the changes I made and how frequently I can/should do it.
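
    A hedged sketch of the Lucene side of the queue idea (the table shape, field names, and the QueuedChange type are hypothetical): drain the pending rows, apply each change with UpdateDocument keyed on the row's id so reprocessing is idempotent, call Commit, and only then mark the queue rows as processed in SQL. Commit is also what makes the index changes durable on disk:

        using System.Collections.Generic;
        using Lucene.Net.Documents;
        using Lucene.Net.Index;

        // Hypothetical shape of a queued change read from the lucene_queue table.
        public class QueuedChange
        {
            public string Id;       // primary key of the changed row
            public string Title;    // whatever fields need indexing
            public bool Deleted;
        }

        public static void ApplyQueue(IndexWriter writer, IList<QueuedChange> pending)
        {
            foreach (var change in pending)
            {
                var idTerm = new Term("id", change.Id);
                if (change.Deleted)
                {
                    writer.DeleteDocuments(idTerm);
                }
                else
                {
                    var doc = new Document();
                    doc.Add(new Field("id", change.Id, Field.Store.YES, Field.Index.NOT_ANALYZED));
                    doc.Add(new Field("title", change.Title ?? "", Field.Store.YES, Field.Index.ANALYZED));
                    writer.UpdateDocument(idTerm, doc);   // delete-then-add keyed on id
                }
            }
            writer.Commit();   // flush to disk; only now mark the queue rows as processed in SQL
        }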

    Read the article

  • Using CouchDB to serve HTML.

    - by alxross
    I'm trying to use CouchDB with an HTML/standalone REST architecture. That is, no app server other than CouchDB, and ajax-style JavaScript calling CouchDB directly. It looks like cross-domain scripting (the same-origin policy) is a problem. I was using CloudKit/Tokyo Cabinet before, and it seems like the needed callback function was screwing it up in the URL. Now I'm trying CouchDB and getting the same problem. Here are my questions: 1) Are these problems because the REST/JSON store, like CouchDB or CloudKit, is running on a different port from my web page? They're both run locally and called from "localhost". 2) Should I let CouchDB host my page and serve the HTML? 3) How do I do this? The documentation didn't seem so clear... Thanks, Alex

    Read the article
