lucene index - Page 7 - Developer IT

Resources for getting started with Lucene.Net?

- by Matt Dotson

I'm building a simple site that allows users to post text content and I want to add it to a search index as it gets posted, so my site search is up to date. From what I can tell Lucene.NET is a good full text search framework. I've found very few examples of how to use it though. Can anyone post some good references for learning about Lucene?

Read the article

NHibernate.Search with Lucene.NET without using DB?

- by Samnang

Could I use NHibernate.Search only with lucene’s index without database? Because I would like to store all data only in my lucene’s index, but I really like features in NHibernate.Search.

Read the article

Where can I find open source applications that use lucene.net

- by Luke101

I am looking for any open source application that uses lucene.net. I am working on a complicated web application and would like to see how others have implemented lucene.net.

Read the article

Apache Lucene or another Search in iPhone app

- by lostInTransit

Hi I would like to implement a search functionality within my iPhone app which can search for terms within all the documents in the application. I believe I cannot use Apache Lucene directly since it is in Java. Can I use Lucy which is a C port of Lucene (not sure if Perl and Ruby would work on it)? Or is there any other open-source search engine which I can use in my iPhone app for search within the app? Thanks

Read the article

What is the VInt in Lucene ?

- by Mehdi Amrollahi

I want to know what is the VInt in Lucene ? I read this article , but i don't understand what is it and where does Lucene use it ? Thanks .

Read the article

What is the segment in Lucene ?

- by Mehdi Amrollahi

I don't understand , what are the segments in Lucene ? What are the benefits of Lucene's segments? Thanks .

Read the article

Lucene: How to have more than 100 results?

- by zshis

Hi all, my question is simple but i cant fin de answer. Is there a way to set in Lucene to retrieve an amount of results higher than 100 in a query? Im using lucene 2.4.0 now. Thanks all.

Read the article

Solr/Lucene user click based ranking

- by Danim

I am facing the problem of sort Lucene results based on user click log. I would like that more accessed results comes first. Does anyone knows how to configure or implement such property in Lucene or Solr? Thank you very much.

Read the article

Indexing different type of Entities/Objects with Solr Lucene

- by Yos

Let's say I want to index my shop using Solr Lucene. I have many types of entities : Products, Product Reviews, Articles How do I get my Lucene to index those types, but each type with different Schema ?

Read the article

How get the offset of term in Lucene ?

- by Mehdi Amrollahi

I want to get the offset of one term in the Lucene . How can i get it ? I vectored my content as Field.TermVector.WITH_POSITIONS_OFFSETS Is there any method in Lucene that give me offset of the term in one Document ?

Read the article

Lucene setboost doesn't work

- by Keven

Hi all, OUr team just upgrade lucene from 2.3 to 3.0 and we are confused about the setboost and getboost of document. What we want is just set a boost for each document when add them into index, then when search it the documents in the response should have different order according to the boost I set. But it seems the order is not changed at all, even the boost of each document in the search response is still 1.0. Could some one give me some hit? Following is our code: String[] a = new String[] { "schindler", "spielberg", "shawshank", "solace", "sorcerer", "stone", "soap", "salesman", "save" }; List strings = Arrays.asList(a); AutoCompleteIndex index = new Index(); IndexWriter writer = new IndexWriter(index.getDirectory(), AnalyzerFactory.createAnalyzer("en_US"), true, MaxFieldLength.LIMITED); float i = 1f; for (String string : strings) { Document doc = new Document(); Field f = new Field(AutoCompleteIndexFactory.QUERYTEXTFIELD, string, Field.Store.YES, Field.Index.NOT_ANALYZED); doc.setBoost(i); doc.add(f); writer.addDocument(doc); i += 2f; } writer.close(); IndexReader reader2 = IndexReader.open(index.getDirectory()); for (int j = 0; j < reader2.maxDoc(); j++) { if (reader2.isDeleted(j)) { continue; } Document doc = reader2.document(j); Field f = doc.getField(AutoCompleteIndexFactory.QUERYTEXTFIELD); System.out.println(f.stringValue() + ":" + f.getBoost() + ", docBoost:" + doc.getBoost()); doc.setBoost(j); }

Read the article

How to structure an index for type ahead for extremely large dataset using Lucene or similar?

- by Pete

I have a dataset of 200million+ records and am looking to build a dedicated backend to power a type ahead solution. Lucene is of interest given its popularity and license type, but I'm open to other open source suggestions as well. I am looking for advice, tales from the trenches, or even better direct instruction on what I will need as far as amount of hardware and structure of software. Requirements: Must have: The ability to do starts with substring matching (I type in 'st' and it should match 'Stephen') The ability to return results very quickly, I'd say 500ms is an upper bound. Nice to have: The ability to feed relevance information into the indexing process, so that, for example, more popular terms would be returned ahead of others and not just alphabetical, aka Google style. In-word substring matching, so for example ('st' would match 'bestseller') Note: This index will purely be used for type ahead, and does not need to serve standard search queries. I am not worried about getting advice on how to set up the front end or AJAX, as long as the index can be queried as a service or directly via Java code. Up votes for any useful information that allows me to get closer to an enterprise level type ahead solution

Read the article

How do i implement tag searching with lucene?

- by acidzombie24

I havent used lucene. Last time i ask (many months ago, maybe a year) people suggested lucene. As am example say there are 3 items tag like this apples carrots apples carrots apple banana if a user search apples i dont care if there is any preference from 1,2 and 4. However i seen many forums do this which i hated is when a user search apple carrots 2 and 3 are get high results while 1 is hard to find even though it matches my search more closely. I HATED this in forums. Also i would like the ability to do search carrots -apples which will only get me 3. I am not sure what should happen if i search carrots banana but anyways as long as more 2 and 3 results are lower priority then 1 when i search apples carrots i'll be happy. Can lucene do this? and where do i start? i see a lot of classes and many of them talk about docs. What should i use for tagging?

Read the article

Read huge free text docs in one file for lucene indexing

- by Jun

I have heaps of free text news docs in one big file. The structure of each news doc is like: (Header line) Category, Doc1, Date (day, month, year) (body text) ... ... ... (Header line) Category, Doc2, Date (day, month, year) (body text) ... ... ... If I extract each doc from the big file, it costs too much time and not efficient. Therefore, I decide to read the file line by line and feed information to lucene the same time. I write c# code to index each doc to lucene like: Streamreader sr = new Streamreader(file); string line = ""; while((line = sr.ReadLine()) != null) { How can I tell this line is a doc header line from text line and get the metadata and all the text lines of a doc for lucene to index. Also, the text is read by OCR which can not give correct line-separating. Captions are mixed with content text iterate the process till the end of the file } with thanks

Read the article

Luke like tool in C#

- by Kapil

Is there are Luke like tool for viewing lucene indexes in C# using the Lucene.NET api?

Read the article

Building a case for solr

- by Midhat

Our product consists of multiple applications, All using Lucene. 2 of the applications I am involved with have Lucene indexes of about 3 GB and 12GB. Another team is building an application, for which they estimate the LUCENE INDEX size to be close to 1 Terabyte. New documents are added to the indexes every 15 days approx. We do not have any apparent performance issues with the current applications. So my question is SHould we be using Solr now? When should one stop using Lucene and graduate to Solr? Any disadvantages/problems for using Solr? The client applications are made in ASP.Net, but I assume they will be able to use a solr server using solrnet

Read the article

Custom SNMP Cacti Data Source fails to update

- by Andrew Wilkinson

I'm trying to create a custom SNMP datasource for Cacti but despite everything I can check being correct, it is not creating the rrd file, or updating it even when I create it. Other, standard SNMP sources are working correctly so it's not SNMP or permissions that are the problem. I've created a new Data Query, which when I click on "Verbose Query" on the device screen returns the following: + Running data query [10]. + Found type = '3' [SNMP Query]. + Found data query XML file at '/volume1/web/cacti/resource/snmp_queries/syno_volume_stats.xml' + XML file parsed ok. + missing in XML file, 'Index Count Changed' emulated by counting oid_index entries + Executing SNMP walk for list of indexes @ '.1.3.6.1.2.1.25.2.3.1.3' Index Count: 8 + Index found at OID: '.1.3.6.1.2.1.25.2.3.1.3.1' value: 'Physical memory' + Index found at OID: '.1.3.6.1.2.1.25.2.3.1.3.3' value: 'Virtual memory' + Index found at OID: '.1.3.6.1.2.1.25.2.3.1.3.6' value: 'Memory buffers' + Index found at OID: '.1.3.6.1.2.1.25.2.3.1.3.7' value: 'Cached memory' + Index found at OID: '.1.3.6.1.2.1.25.2.3.1.3.10' value: 'Swap space' + Index found at OID: '.1.3.6.1.2.1.25.2.3.1.3.31' value: '/' + Index found at OID: '.1.3.6.1.2.1.25.2.3.1.3.32' value: '/volume1' + Index found at OID: '.1.3.6.1.2.1.25.2.3.1.3.33' value: '/opt' + index_parse at OID: '.1.3.6.1.2.1.25.2.3.1.3.1' results: '1' + index_parse at OID: '.1.3.6.1.2.1.25.2.3.1.3.3' results: '3' + index_parse at OID: '.1.3.6.1.2.1.25.2.3.1.3.6' results: '6' + index_parse at OID: '.1.3.6.1.2.1.25.2.3.1.3.7' results: '7' + index_parse at OID: '.1.3.6.1.2.1.25.2.3.1.3.10' results: '10' + index_parse at OID: '.1.3.6.1.2.1.25.2.3.1.3.31' results: '31' + index_parse at OID: '.1.3.6.1.2.1.25.2.3.1.3.32' results: '32' + index_parse at OID: '.1.3.6.1.2.1.25.2.3.1.3.33' results: '33' + Located input field 'index' [walk] + Executing SNMP walk for data @ '.1.3.6.1.2.1.25.2.3.1.3' + Found item [index='Physical memory'] index: 1 [from value] + Found item [index='Virtual memory'] index: 3 [from value] + Found item [index='Memory buffers'] index: 6 [from value] + Found item [index='Cached memory'] index: 7 [from value] + Found item [index='Swap space'] index: 10 [from value] + Found item [index='/'] index: 31 [from value] + Found item [index='/volume1'] index: 32 [from value] + Found item [index='/opt'] index: 33 [from value] + Located input field 'volsizeunit' [walk] + Executing SNMP walk for data @ '.1.3.6.1.2.1.25.2.3.1.4' + Found item [volsizeunit='1024 Bytes'] index: 1 [from value] + Found item [volsizeunit='1024 Bytes'] index: 3 [from value] + Found item [volsizeunit='1024 Bytes'] index: 6 [from value] + Found item [volsizeunit='1024 Bytes'] index: 7 [from value] + Found item [volsizeunit='1024 Bytes'] index: 10 [from value] + Found item [volsizeunit='4096 Bytes'] index: 31 [from value] + Found item [volsizeunit='4096 Bytes'] index: 32 [from value] + Found item [volsizeunit='4096 Bytes'] index: 33 [from value] + Located input field 'volsize' [walk] + Executing SNMP walk for data @ '.1.3.6.1.2.1.25.2.3.1.5' + Found item [volsize='1034712'] index: 1 [from value] + Found item [volsize='3131792'] index: 3 [from value] + Found item [volsize='1034712'] index: 6 [from value] + Found item [volsize='775904'] index: 7 [from value] + Found item [volsize='2097080'] index: 10 [from value] + Found item [volsize='612766'] index: 31 [from value] + Found item [volsize='1439812394'] index: 32 [from value] + Found item [volsize='1439812394'] index: 33 [from value] + Located input field 'volused' [walk] + Executing SNMP walk for data @ '.1.3.6.1.2.1.25.2.3.1.6' + Found item [volused='1022520'] index: 1 [from value] + Found item [volused='1024096'] index: 3 [from value] + Found item [volused='32408'] index: 6 [from value] + Found item [volused='775904'] index: 7 [from value] + Found item [volused='1576'] index: 10 [from value] + Found item [volused='148070'] index: 31 [from value] + Found item [volused='682377865'] index: 32 [from value] + Found item [volused='682377865'] index: 33 [from value] AS you can see it appears to be returning the correct data. I've also set up data templates and graph templates to display the data. The create graphs for a device screen shows the correct data, and when selecting one row can clicking create a new data source and graph are created. Unfortunately the data source is never updated. Increasing the poller log level shows that it appears to not even be querying the data source, despite it being used? What should my next steps to debug this issue be?

Read the article

Lucene multiple indexes : Normalize document scores??

- by Roey

Hi All. Suppose I've got multiple lucene indexes (not replicas) on several PC's. I query each index and then merge the results. Is there any way to normalize the document scores so that I could sort by score (relevance)? I mean, the scores for document A from index A would not be comparable with document B from index B, unless I do some sort of normalization.... not so? Thanks Roey

Read the article

Lucene Error While Reading binary block : java.io.EOFException

- by tushar Khairnar

Hi, I am getting java.io.EOFException while reading a binary block from lucene index. I am storing java object as byte-array in lucene index field and reading it when hit occurs. Here is stack trace : Caused by: java.io.EOFException at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2281) at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2750) at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:780) at java.io.ObjectInputStream.(ObjectInputStream.java:280) at org.terracotta.modules.searchable.util.SerializationUtil$OIS.(SerializationUtil.java:20) I have some background threads which write into index. But i buffer them and then write them at once like 1000. Occasionally I also issue optimize() on index. When I write, I am re-opening IndexReader. Does this is happening because of IndexReader re-opening call? Thanks. Regards Tushar

Read the article

Lucene.NET search index approach

- by Tim Peel

Hi, I am trying to put together a test case for using Lucene.NET on one of our websites. I'd like to do the following: Index in a single unique id. Index across a comma delimitered string of terms or tags. For example. Item 1: Id = 1 Tags = Something,Separated-Term I will then be structuring the search so I can look for documents against tag i.e. tags:something OR tags:separate-term I need to maintain the exact term value in order to search against it. I have something running, and the search query is being parsed as expected, but I am not seeing any results. Here's some code. My parser (_luceneAnalyzer is passed into my indexing service): var parser = new QueryParser(Lucene.Net.Util.Version.LUCENE_CURRENT, "Tags", _luceneAnalyzer); parser.SetDefaultOperator(QueryParser.Operator.AND); return parser; My Lucene.NET document creation: var doc = new Document(); var id = new Field( "Id", NumericUtils.IntToPrefixCoded(indexObject.id), Field.Store.YES, Field.Index.NOT_ANALYZED, Field.TermVector.NO); var tags = new Field( "Tags", string.Join(",", indexObject.Tags.ToArray()), Field.Store.NO, Field.Index.ANALYZED, Field.TermVector.YES); doc.Add(id); doc.Add(tags); return doc; My search: var parser = BuildQueryParser(); var query = parser.Parse(searchQuery); var searcher = Searcher; TopDocs hits = searcher.Search(query, null, max); IList<SearchResult> result = new List<SearchResult>(); float scoreNorm = 1.0f / hits.GetMaxScore(); for (int i = 0; i < hits.scoreDocs.Length; i++) { float score = hits.scoreDocs[i].score * scoreNorm; result.Add(CreateSearchResult(searcher.Doc(hits.scoreDocs[i].doc), score)); } return result; I have two documents in my index, one with the tag "Something" and one with the tags "Something" and "Separated-Term". It's important for the - to remain in the terms as I want an exact match on the full value. When I search with "tags:Something" I do not get any results. Question What Analyzer should I be using to achieve the search index I am after? Are there any pointers for putting together a search such as this? Why is my current search not returning any results? Many thanks

Read the article

Unable to create index because of duplicate that doesn't exist?

- by Alex Angas

I'm getting an error running the following Transact-SQL command: CREATE UNIQUE NONCLUSTERED INDEX IX_TopicShortName ON DimMeasureTopic(TopicShortName) The error is: Msg 1505, Level 16, State 1, Line 1 The CREATE UNIQUE INDEX statement terminated because a duplicate key was found for the object name 'dbo.DimMeasureTopic' and the index name 'IX_TopicShortName'. The duplicate key value is (). When I run SELECT * FROM sys.indexes WHERE name = 'IX_TopicShortName' or SELECT * FROM sys.indexes WHERE object_id = OBJECT_ID(N'[dbo].[DimMeasureTopic]') the IX_TopicShortName index does not display. So there doesn't appear to be a duplicate. I have the same schema in another database and can create the index without issues there. Any ideas why it won't create here?

Read the article

NGINX Remove index.php /index.php/something/more/ to /something/more

- by Gaston

I'm trying to clean urls in NGINX using framework DooPHP. This = - http://example.com/index.php/something/more/ To This = - http://example.com/something/more/ I want to remove (clean url) the "index.php" from the url if someone try to enter in the first form. Like a permanent redirect. How to do this config on NGINX? Thanks. [Update: Actual nginx config] server { listen 80; server_name vip.example.com; rewrite ^/(.*) https://vip.example.com/$1 permanent; } server { listen 443; server_name vip.example.com; error_page 404 /vip.example.com/404.html; error_page 403 /vip.example.com/403.html; error_page 401 /vip.example.com/401.html; location /vip.example.com { root /sites/errors; } ssl on; ssl_certificate /etc/nginx/config/server.csr; ssl_certificate_key /etc/nginx/config/server.sky; if (!-e $request_filename){ rewrite /.* /index.php; } location / { auth_basic "example Team Access"; auth_basic_user_file config/htpasswd; root /sites/vip.example.com; index index.php; } location ~ \.php$ { fastcgi_pass 127.0.0.1:9000; fastcgi_index index.php; fastcgi_param SCRIPT_FILENAME /sites/vip.example.com$fastcgi_script_name; include fastcgi_params; fastcgi_param PATH_INFO $fastcgi_script_name; } }

Read the article

Reading 'Index Status' graph in Google Webmaster tools

- by sam

I recently found a bunch of old files that had been ftp'ed to a live production server by mistake on a static (html / css / js) site. I manually deleted these files, but today when checking in Google Webmaster tools i found this graph below. The 'update' marker is from 3/9/14, what i can work out is what Google is trying to tell me, are they saying that : There was a ranking update like Penguin or Panda and they penalized my site and un-indexed a load of pages which they thought were junk.. OR Is this showing that I updated the site by deleting the files on the server on 3/9/14 OR Is this something else ?

Read the article

Tokenizing Twitter Posts in Lucene

- by Amaç Herdagdelen

Hello, My question in a nutshell: Does anyone know of a TwitterAnalyzer or TwitterTokenizer for Lucene? More detailed version: I want to index a number of tweets in Lucene and keep the terms like @user or #hashtag intact. StandardTokenizer does not work because it discards the punctuation (but it does other useful stuff like keeping domain names, email addresses or recognizing acronyms). How can I have an analyzer which does everything StandardTokenizer does but does not touch terms like @user and #hashtag? My current solution is to preprocess the tweet text before feeding it into the analyzer and replace the characters by other alphanumeric strings. For example, String newText = newText.replaceAll("#", "hashtag"); newText = newText.replaceAll("@", "addresstag"); Unfortunately this method breaks legitimate email addresses but I can live with that. Does that approach make sense? Thanks in advance! Amaç

Read the article

Lucene numDocs and doqFreq on custom similarity class

- by David A

Hi All, im doing an aplication with Lucene (im a noob with it) and im facing some problems. My aplication uses the Lucene 2.4.0 library with a custom similaraty implementation (the jar is imported) In my app im calculating doqFreq and numDocs manually (im adding the values of all indexes and then i calculate a global value in order to use it on every query) and i want to use that values on a custom similarity implementation in order to calculate a new IDF. The problem is that I dont know how to use (or send) the new doqFreq and numDocs values from my app on that new similarty implementation as I dont want to change lucene´s code apart from this extra class. Any suggestions or examples? I read the docs but i dont now how to aproach this :s Thanks

Search Results

Search found 12803 results on 513 pages for 'lucene index'.

Page 7/513 | < Previous Page | 3 4 5 6 7 8 9 10 11 12 13 14 | Next Page >

- by Matt Dotson

- by Samnang

- by Luke101

- by lostInTransit

- by Mehdi Amrollahi

- by Mehdi Amrollahi

- by zshis

- by Danim

- by Yos

- by Mehdi Amrollahi

- by Keven

- by Pete

- by acidzombie24

- by Jun

- by Kapil

- by Midhat

- by Andrew Wilkinson

- by Roey

- by tushar Khairnar

- by Tim Peel

- by Alex Angas

- by Gaston

- by sam

- by Amaç Herdagdelen

- by David A

< Previous Page | 3 4 5 6 7 8 9 10 11 12 13 14 | Next Page >