Search Results

Search found 12803 results on 513 pages for 'lucene index'.

Page 8/513 | < Previous Page | 4 5 6 7 8 9 10 11 12 13 14 15  | Next Page >

  • Lucene Search for japanese characters

    - by Pranali Desai
    Hi All, I have implemented lucene for my application and it works very well unless you have introduced something like japanese characters. The problem is that if I have japanese string ?????????????? and I search with ? that is the first character than it works well whereas if I use more than one japanese character(????)in search token search fails and there is no document found. Are japanese characters supported in lucene? what are the settings to be done to get it working?

    Read the article

  • Exception when indexing text documents with Lucene, using SnowballAnalyzer for cleaning up

    - by Julia
    Hello!!! I am indexing the documents with Lucene and am trying to apply the SnowballAnalyzer for punctuation and stopword removal from text .. I keep getting the following error :( IllegalAccessError: tried to access method org.apache.lucene.analysis.Tokenizer.(Ljava/io/Reader;)V from class org.apache.lucene.analysis.snowball.SnowballAnalyzer Here is the code, I would very much appreciate help!!!! I am new with this.. public class Indexer { private Indexer(){}; private String[] stopWords = {....}; private String indexName; private IndexWriter iWriter; private static String FILES_TO_INDEX = "/Users/ssi/forindexing"; public static void main(String[] args) throws Exception { Indexer m = new Indexer(); m.index("./newindex"); } public void index(String indexName) throws Exception { this.indexName = indexName; final File docDir = new File(FILES_TO_INDEX); if(!docDir.exists() || !docDir.canRead()){ System.err.println("Something wrong... " + docDir.getPath()); System.exit(1); } Date start = new Date(); PerFieldAnalyzerWrapper analyzers = new PerFieldAnalyzerWrapper(new SimpleAnalyzer()); analyzers.addAnalyzer("text", new SnowballAnalyzer("English", stopWords)); Directory directory = FSDirectory.open(new File(this.indexName)); IndexWriter.MaxFieldLength maxLength = IndexWriter.MaxFieldLength.UNLIMITED; iWriter = new IndexWriter(directory, analyzers, true, maxLength); System.out.println("Indexing to dir..........." + indexName); if(docDir.isDirectory()){ File[] files = docDir.listFiles(); if(files != null){ for (int i = 0; i < files.length; i++) { try { indexDocument(files[i]); }catch (FileNotFoundException fnfe){ fnfe.printStackTrace(); } } } } System.out.println("Optimizing...... "); iWriter.optimize(); iWriter.close(); Date end = new Date(); System.out.println("Time to index was" + (end.getTime()-start.getTime()) + "miliseconds"); } private void indexDocument(File someDoc) throws IOException { Document doc = new Document(); Field name = new Field("name", someDoc.getName(), Field.Store.YES, Field.Index.ANALYZED); Field text = new Field("text", new FileReader(someDoc), Field.TermVector.WITH_POSITIONS_OFFSETS); doc.add(name); doc.add(text); iWriter.addDocument(doc); } }

    Read the article

  • Search Lucene with precise edit distances

    - by askullhead
    I would like to search a Lucene index with edit distances. For example, say, there is a document with a field FIRST_NAME; I want all documents with first names that are 1 edit distance away from, say, 'john'. I know that Lucene supports fuzzy searches (FIRST_NAME:john~) and takes a number between 0 and 1 to control the fuzziness. The problem (for me) is this number does not directly translate to an edit distance. And when the values in the documents are short strings (less than 3 characters) the fuzzy search has difficulty finding them. For example if there is a document with FIRST_NAME 'J' and I search for FIRST_NAME:I~0.0 I don't get anything back.

    Read the article

  • lucene get matched terms in query

    - by iamrohitbanga
    what is the best way to find out which terms in a query matched against a given document returned as a hit in lucene? I have tried a weird method involving hit highlighting package in lucene contrib and also a method that searches for every word in the query against the top most document ("docId: xy AND description: each_word_in_query"). Do not get satisfactory results? hit highlighting does not report some of the words that matched for a document other than the first one. i am not sure if the second approach is the best alternative.

    Read the article

  • Can I store and join based on external attributes in Lucene/Solr

    - by Kibbee
    Is there a way to store information about documents that are stored in Lucene such that I don't have to update the entire document to update certain attributes about the documents? For instance, let's say I had a bunch of documents, and that I wanted to update a permissions list of who was allowed to see the documents on a daily, or more frequent, basis. Would it be possible to update all the permissions each day, without updating all the documents. I could do it by keeping a exactly which permissions were added and removed, but I would rather just be able to take the end list of permissions, and use that, rather than have to keep track of all the permission changes and post those entire documents to Lucene.

    Read the article

  • Using FieldSelector when searching with Lucene

    - by Christian
    I'm searching articles in PubMed via Lucene. Each of the 20,000,000 articles has an abstract with ~250 words and an ID. At the moment I store my searches, with each take multiple seconds, in a TopDocs object. Searchs can find thousands of articles. I'm just interested in the ID of the article. Does Lucene load the abstracts internally into the TopDocs? If so can I prevent that behavior through FieldSelectors or do FieldSelectors only work with IndexReader and don't work with IndexSearcher?

    Read the article

  • Lucene boost: I need to make it work better

    - by zvikico
    I'm using Lucene to index components with names and types. Some components are more important, thus, get a bigger boost. However, I cannot get my boost to work properly. I sill get some components appear later (get worse score), even though they have a higher boost. Note that the indexing is done on one field only and I've set the boost to that field alone. I'm using Lucene in Java. I don't think it has anything to do with the field length. I've seen components with the same name (but different type) get the wrong score.

    Read the article

  • Reverse search in Hibernate Search

    - by Javi
    Hello, I'm using Hibernate Search (which uses Lucene) for searching some Data I have indexed in a directory. It works fine but I need to do a reverse search. By reverse search I mean that I have a list of queries stored in my database I need to check which one of these queries match with a Data object each time Data Object is created. I need it to alert the user when a Data Object matches with a Query he has created. So I need to index this single Data Object which has just been created and see which queries of my list has this object as a result. I've seen Lucene MemoryIndex Class to create an index in memory so I can do something like this example for every query in a list (though iterating in a Java list of queries would not be very efficient): //Iterating over my list<Query> MemoryIndex index = new MemoryIndex(); //Add all fields index.addField("myField", "myFieldData", analyzer); ... QueryParser parser = new QueryParser("myField", analyzer); float score = index.search(query); if (score > 0.0f) { System.out.println("it's a match"); } else { System.out.println("no match found"); } The problem here is that this Data Class has several Hibernate Search Annotations @Field,@IndexedEmbedded,... which indicated how fields should be indexed, so when I invoke index() method on the FullTextEntityManager instance it uses this information to index the object in the directory. Is there a similar way to index it in memory using this information? Is there a more efficient way of doing this reverse search? Thanks

    Read the article

  • Why is my Lucene index getting locked?

    - by Andrew Bullock
    I had an issue with my search not return the results I expect. I tried to run Luke on my index, but it said it was locked and I needed to Force Unlock it (I'm not a Jedi/Sith though) I tried to delete the index folder and run my recreate-indicies application but the folder was locked. Using unlocker I've found that there are about 100 entries of w3wp.exe (same PID, different Handle) with a lock on the index. Whats going on? I'm doing this in my NHibernate configuration: c.SetListener(ListenerType.PostUpdate, new FullTextIndexEventListener()); c.SetListener(ListenerType.PostInsert, new FullTextIndexEventListener()); c.SetListener(ListenerType.PostDelete, new FullTextIndexEventListener()); And here is the only place i query the index: var fullTextSession = NHibernate.Search.Search.CreateFullTextSession(this.unitOfWork.Session); var fullTextQuery = fullTextSession.CreateFullTextQuery(query, typeof (Person)); fullTextQuery.SetMaxResults(100); return fullTextQuery.List<Person>(); Whats going on? What am i doing wrong? Thanks

    Read the article

  • Lucene document Boosting

    - by athreyar
    Hello, I am having problem with lucene boosting, Iam trying to boost a particular document which matches with the (firstname)field specified I have posted the part of the codeenter code hereprivate static Document createDoc(String lucDescription,String primaryk,String specialString){ Document doc = new Document(); doc.add(new Field("lucDescription",lucDescription, Field.Store.NO, Field.Index.TOKENIZED)); doc.add(new Field("primarykey",primaryk,Field.Store.YES,Field.Index.NO)); doc.add(new Field("specialDescription",specialString, Field.Store.NO, Field.Index.UN_TOKENIZED)); doc.setBoost ((float)(0.00001)); if (specialString.equals("chris")) doc.setBoost ((float)(100000.1)); return doc; } why is this not working?enter code herepublic static String dbSearch(String searchString){ List pkList = new ArrayList(); String conCat="("; try{ String querystr = searchString; Query query = new QueryParser("lucDescription", new StandardAnalyzer()).parse(querystr); IndexSearcher searchIndex = new IndexSearcher("/home/athreya/docsIndexFile"); // Index of the User table-- /home/araghu/aditya/indexFile. Hits hits = searchIndex.search(query); System.out.println("Found " + hits.length() + " hits."); for(int iterator=0;iterator Thank you in advance Athreya

    Read the article

  • Inspecting Lucene.NET index with Luke want to replicate NHibernate.Search view

    - by Tim Peel
    Hi, I am trying to put together an index using terms, which I specify as a comma separated list. I want to replicate the display in Luke as seen here: http://ayende.com/Blog/archive/2009/05/03/nhibernate-search-again.aspx But my index value just shows as a single field with the comma separate list value. For example: Tags term,anotherterm When I search my index, it will return results if I search with "term" but will not return anything if I search with "anotherterm" I thought the indexing process would break the comma separate list apart into separate values but this does not seem to be the case. Anyone got any ideas? Thanks

    Read the article

  • Lucene neo4j sort with boolean fields

    - by Daniele
    I have indexed some documents (nodes of neo4j) with a boolean property which not always is present. Eg. Node1 label : "label A" Node2: label : "label A" (note, same label of node1) special : true The goal is to get Node2 higher than node 1 for query "label A". Here the code: Index<Node> fulltextLucene = graphDb.index().forNodes( "my-index" ); Sort sort = new Sort(new SortField[] {SortField.FIELD_SCORE, new SortField("special", SortField.????, true) }); IndexHits<Node> results = fulltextLucene.query( "label", new QueryContext( "label A").sort(sort)); How can I accomplish that? Thanks

    Read the article

  • Running Long Process: Indexing 5GB docs with Lucene

    - by Robert Dondo
    Situation:I have an ASP .NET application that will search through docs using Lucene. I want to run the initial indexing (the index will be incremental after the initial run so there wont be need to index the whole directory again in future). Currently, I have about 5GB of docs (45000files). Problem: My application times out before completing the process. I have altered the TimeOut like this: HttpContext.Current.Server.ScriptTimeout = 200000; but it still does not complete the process. How can I run the index?

    Read the article

  • Apache not finding index.php by default, set rule for routing through index.php

    - by eoinoc
    Apache on the server is set to find index.php by default, and that works for a normal folder. However, I have a .htaccess rule to route all requests through my routing script: RewriteEngine On RewriteCond %{REQUEST_FILENAME} !-f RewriteRule ^(.*)$ index.php [QSA,L] With these .htaccess contents, the server returns a 404 error. Only by specifying /index.php does the routing script get called. Any tips on what I am doing wrong?

    Read the article

  • Lucene complex structure search

    - by archer
    Basically I do have pretty simple database that I'd like to index with Lucene. Domains are: // Person domain class Person { Set<Pair> keys; } // Pair domain class Pair { KeyItem keyItem; String value; } // KeyItem domain, name is unique field within the DB (!!) class KeyItem{ String name; } I've tens of millions of profiles and hundreds of millions of Pairs, however, since most of KeyItem's "name" fields duplicates, there are only few dozens KeyItem instances. Came up to that structure to save on KeyItem instances. Basically any Profile with any fields could be saved into that structure. Lets say we've profile with properties - name: Andrew Morton - eduction: University of New South Wales, - country: Australia, - occupation: Linux programmer. To store it, we'll have single Profile instance, 4 KeyItem instances: name, education,country and occupation, and 4 Pair instances with values: "Andrew Morton", "University of New South Wales", "Australia" and "Linux Programmer". All other profile will reference (all or some) same instances of KeyItem: name, education, country and occupation. My question is, how to index all of that so I can search for Profile for some particular values of KeyItem::name and Pair::value. Ideally I'd like that kind of query to work: name:Andrew* AND occupation:Linux* Should I create custom Indexer and Searcher? Or I could use standard ones and just map KeyItem and Pair as Lucene components somehow?

    Read the article

  • Indexing and Searching Over Word Level Annotation Layers in Lucene

    - by dmcer
    I have a data set with multiple layers of annotation over the underlying text, such as part-of-tags, chunks from a shallow parser, name entities, and others from various natural language processing (NLP) tools. For a sentence like The man went to the store, the annotations might look like: Word POS Chunk NER ==== === ===== ======== The DT NP Person man NN NP Person went VBD VP - to TO PP - the DT NP Location store NN NP Location I'd like to index a bunch of documents with annotations like these using Lucene and then perform searches across the different layers. An example of a simple query would be to retrieve all documents where Washington is tagged as a person. While I'm not absolutely committed to the notation, syntactically end-users might enter the query as follows: Query: Word=Washington,NER=Person I'd also like to do more complex queries involving the sequential order of annotations across different layers, e.g. find all the documents where there's a word tagged person followed by the words arrived at followed by a word tagged location. Such a query might look like: Query: "NER=Person Word=arrived Word=at NER=Location" What's a good way to go about approaching this with Lucene? Is there anyway to index and search over document fields that contain structured tokens?

    Read the article

  • Lucene raise document score if sibling entity matches query

    - by Pitagoras
    I have the following design situation. I use hibernate search (lucene in the back). Tha application manages ITEMs which have title, description and tags. These are full text indexed. On the other hand, we have COLLECTION of ITEMs. The user can create a COLLECTION and add as many ITEMs as she wants. ITEMs can also belong to many COLLECTIONs. I have a boosted query so that search terms that appear in the tags are more important than in the title, and lastly in the description. But I need an additional matching criteria: for a given ITEM, it whould rank better if other documents in some COLLECTION where the ITEM belongs, also match the query. This is like to say: the title/tags/description of "fellow" items (i.e. items in some shared collection) make the item rank better. I was thinking that adding an ITEM to a COLLECTION would add something like "extra tags" to every other ITEM in the collection, being these extra tags the elements to match in the added ITEM. I feel a more clever solution lucene-wise should exists. Any ideas/pointers are welcome. Thanks.

    Read the article

  • Distributed Lucene.NET

    - by user72185
    Hi, I have a Terabyte of data, maybe more, which I'd like to index and search with Lucene. I'd like to be able to split the index out to different machines, similar to what Solr does (if I understand Solr correctly). Are there any existing tools to do this on the Windows platform? Thanks!

    Read the article

  • Multiple word Auttosuggest using Lucene.Net

    - by eric
    I am currently working on an search application which uses Lucene.Net to index the data from the database to Index file. I have a product catalog which has Name, short and long description, sku and other fields. The data is stored in Index using StandardAnalyzer. I am trying to add auto suggestion for a text field and using TermEnum to get all the keyword terms and its score from the Index. But the terms returned are of single term. For example, if I type for co, the suggestion returned are costume, count, collection, cowboy, combination etc. But I want the suggestion to return phrases. For exmaple, if I search for co, the suggestions should be cowboy costume, costume for adults, combination locks etc. The following is the code used to get the suggestions: public string[] GetKeywords(string strSearchExp) { IndexReader rd = IndexReader.Open(mIndexLoc); TermEnum tenum = rd.Terms(new Term("Name", strSearchExp)); string[] strResult = new string[10]; int i = 0; Dictionary<string, double> KeywordList = new Dictionary<string, double>(); do { //terms = tenum.Term(); if (tenum.Term() != null) { //strResult[i] = terms.text.ToString(); KeywordList.Add(tenum.Term().text.ToString(), tenum.DocFreq()); } } while (tenum.Next() && tenum.Term().text.StartsWith(strSearchExp) && tenum.Term().text.Length > 1); var sortedDict = (from entry in KeywordList orderby entry.Value descending select entry); foreach (KeyValuePair<string, double> data in sortedDict) { if (data.Key.Length > 1) { strResult[i] = data.Key; i++; } if (i >= 10) //Exit the for Loop if the count exceeds 10 break; } tenum.Close(); rd.Close(); return strResult; } Can anyone please give me directions to achive this? Thanks for looking into this.

    Read the article

  • 403 error on index file

    - by John L.
    When I try to access index.py in my server root through http://domain/, I get a 403 Forbidden error, but when I can access it through http://domain/index.py. In my server logs it says "Options ExecCGI is off in this directory: /var/www/index.py". However, my httpd.conf entry for that directory is the same as the ones for other directories, and getting to index.py works fine. My permissions are set to 755 for index.py. I also tried making a php file and naming it index.php, and it works from both domain/ and domain/index.php. Here is my httpd.conf entry: <Directory /var/www> Options Indexes Includes FollowSymLinks MultiViews AllowOverride All Order allow,deny Allow from all AddHandler cgi-script .cgi AddHandler cgi-script .pl AddHandler cgi-script .py Options +ExecCGI DirectoryIndex index.html index.php index.py </Directory> Thanks

    Read the article

< Previous Page | 4 5 6 7 8 9 10 11 12 13 14 15  | Next Page >