couchdb lucene - Page 16

Which NoSQL db to use with C?

- by systemsfault

Hello all, I'm working on an application that I'm going to write with C and i am considering to use a nosql db for storing timeseries data with at most 8 or 9 fields. But in every 5 minutes there will huge write operations such as 2-10 million rows and then there will be reads(but performance is not as crucial in read as in the write operation). I'm considering to use a NoSQL db here in order to store the data but couldn't decide on which one to use. Couchdb seems to have a stable driver called pillowtalk for C; but Mongo's driver doesn't look as promising as pillowtalk. I'm also open to other suggestions. What is your recommendation?

Read the article

Is having a single `IndexWriter` instance in Lucene a good idea?

- by Dragos

I am trying to understand how Lucene should be used. From what I have read, creating an IndexReader is costly, so using a Search Manager shoulg be the right choice. However, a SearchManager should be produced by a NRTManager(which, by the way, should replace the IndexWriter for every add or delete operation performed). But in order to have a NRTManager, I should first have an IndexWriter, and here comes my problem. The documentation says: an IndexWriter is thread-safe the constructor of this class takes a Directory object, so it seems creating an instace should be costly(as in the case of an IndexReader) all changes are buffered and flushed periodically(so they seem to encourage using a single instance) but: the changes, although flushed will only be visible after commit or close after finished making updates(add/delete), the instance should be closed I also found this: http://stackoverflow.com/questions/5374419/forgot-to-close-the-lucene-indexwriter-after-adding-documents-to-the-index where it is said that not closing a writer might ruin everything So what am I really supposed to do? Is having a single IndexWriter instance a good idea (make only commit and never close it)? EDIT: What is more, if I use NRTManager, how can I make acommit`? Is it even possible?

Read the article

Large scale storage for incrementally-appended documents?

- by Ben Dilts

I need to store hundreds of thousands (right now, potentially many millions) of documents that start out empty and are appended to frequently, but never updated otherwise or deleted. These documents are not interrelated in any way, and just need to be accessed by some unique ID. Read accesses are some subset of the document, which almost always starts midway through at some indexed location (e.g. "document #4324319, save #53 to the end"). These documents start very small, at several KB. They typically reach a final size around 500KB, but many reach 10MB or more. I'm currently using MySQL (InnoDB) to store these documents. Each of the incremental saves is just dumped into one big table with the document ID it belongs to, so reading part of a document looks like "select * from saves where document_id=14 and save_id 53 order by save_id", then manually concatenating it all together in code. Ideally, I'd like the storage solution to be easily horizontally scalable, with redundancy across servers (e.g. each document stored on at least 3 nodes) with easy recovery of crashed servers. I've looked at CouchDB and MongoDB as possible replacements for MySQL, but I'm not sure that either of them make a whole lot of sense for this particular application, though I'm open to being convinced. Any input on a good storage solution?

Read the article

What scalability problems have you solved using a NoSQL data store?

- by knorv

NoSQL refers to non-relational data stores that break with the history of relational databases and ACID guarantees. Popular open source NoSQL data stores include: Cassandra (tabular, written in Java, used by Facebook, Twitter, Digg, Rackspace, Mahalo and Reddit) CouchDB (document, written in Erlang, used by Engine Yard and BBC) Dynomite (key-value, written in C++, used by Powerset) HBase (key-value, written in Java, used by Bing) Hypertable (tabular, written in C++, used by Baidu) Kai (key-value, written in Erlang) MemcacheDB (key-value, written in C, used by Reddit) MongoDB (document, written in C++, used by Sourceforge, Github, Electronic Arts and NY Times) Neo4j (graph, written in Java, used by Swedish Universities) Project Voldemort (key-value, written in Java, used by LinkedIn) Redis (key-value, written in C, used by Engine Yard, Github and Craigslist) Riak (key-value, written in Erlang, used by Comcast and Mochi Media) Ringo (key-value, written in Erlang, used by Nokia) Scalaris (key-value, written in Erlang, used by OnScale) ThruDB (document, written in C++, used by JunkDepot.com) Tokyo Cabinet/Tokyo Tyrant (key-value, written in C, used by Mixi.jp (Japanese social networking site)) I'd like to know about specific problems you - the SO reader - have solved using data stores and what NoSQL data store you used. Questions: What scalability problems have you used NoSQL data stores to solve? What NoSQL data store did you use? What database did you use before switching to a NoSQL data store? I'm looking for first-hand experiences, so please do not answer unless you have that.

Read the article

Non-Relational Database Design

- by Ian Varley

I'm interested in hearing about design strategies you have used with non-relational "nosql" databases - that is, the (mostly new) class of data stores that don't use traditional relational design or SQL (such as Hypertable, CouchDB, SimpleDB, Google App Engine datastore, Voldemort, Cassandra, SQL Data Services, etc.). They're also often referred to as "key/value stores", and at base they act like giant distributed persistent hash tables. Specifically, I want to learn about the differences in conceptual data design with these new databases. What's easier, what's harder, what can't be done at all? Have you come up with alternate designs that work much better in the non-relational world? Have you hit your head against anything that seems impossible? Have you bridged the gap with any design patterns, e.g. to translate from one to the other? Do you even do explicit data models at all now (e.g. in UML) or have you chucked them entirely in favor of semi-structured / document-oriented data blobs? Do you miss any of the major extra services that RDBMSes provide, like relational integrity, arbitrarily complex transaction support, triggers, etc? I come from a SQL relational DB background, so normalization is in my blood. That said, I get the advantages of non-relational databases for simplicity and scaling, and my gut tells me that there has to be a richer overlap of design capabilities. What have you done? FYI, there have been StackOverflow discussions on similar topics here: the next generation of databases changing schemas to work with Google App Engine choosing a document-oriented database

Read the article

Multiple inequality conditions (range queries) in NoSQL

- by pableu

Hi, I have an application where I'd like to use a NoSQL database, but I still want to do range queries over two different properties, for example select all entries between times T1 and T2 where the noiselevel is smaller than X. On the other hand, I would like to use a NoSQL/Key-Value store because my data is very sparse and diverse, and I do not want to create new tables for every new datatype that I might come across. I know that you cannot use multiple inequality filters for the Google Datastore (source). I also know that this feature is coming (according to this). I know that this is also not possible in CouchDB (source). I think I also more or less understand why this is the case. Now, this makes me wonder.. Is that the case with all NoSQL databases? Can other NoSQL systems make range queries over two different properties? How about, for example, Mongo DB? I've looked in the Documentation, but the only thing I've found was the following snippet in their docu: Note that any of the operators on this page can be combined in the same query document. For example, to find all document where j is not equal to 3 and k is greater than 10, you'd query like so: db.things.find({j: {$ne: 3}, k: {$gt: 10} }); So they use greater-than and not-equal on two different properties. They don't say anything about two inequalities ;-) Any input and enlightenment is welcome :-)

Read the article

What database systems should an startup company consider?

- by Am

Right now I'm developing the prototype of a web application that aggregates large number of text entries from a large number of users. This data must be frequently displayed back and often updated. At the moment I store the content inside a MySQL database and use NHibernate ORM layer to interact with the DB. I've got a table defined for users, roles, submissions, tags, notifications and etc. I like this solution because it works well and my code looks nice and sane, but I'm also worried about how MySQL will perform once the size of our database reaches a significant number. I feel that it may struggle performing join operations fast enough. This has made me think about non-relational database system such as MongoDB, CouchDB, Cassandra or Hadoop. Unfortunately I have no experience with either. I've read some good reviews on MongoDB and it looks interesting. I'm happy to spend the time and learn if one turns out to be the way to go. I'd much appreciate any one offering points or issues to consider when going with none relational dbms?

Read the article

Is it better to use a relational database or document-based database for an app like Wufoo?

- by mboyle

I'm working on an application that's similar to Wufoo in that it allows our users to create their own databases and collect/present records with auto generated forms and views. Since every user is creating a different schema (one user might have a database of their baseball card collection, another might have a database of their recipes) our current approach is using MySQL to create separate databases for every user with its own tables. So in other words, the databases our MySQL server contains look like: main-web-app-db (our web app containing tables for users account info, billing, etc) user_1_db (baseball_cards_table) user_2_db (recipes_table) .... And so on. If a user wants to set up a new database to keep track of their DVD collection, we'd do a "create database ..." with "create table ...". If they enter some data in and then decide they want to change a column we'd do an "alter table ....". Now, the further along I get with building this out the more it seems like MySQL is poorly suited to handling this. 1) My first concern is that switching databases every request, first to our main app's database for authentication etc, and then to the user's personal database, is going to be inefficient. 2) The second concern I have is that there's going to be a limit to the number of databases a single MySQL server can host. Pretending for a moment this application had 500,000 user databases, is MySQL designed to operate this way? What if it were a million, or more? 3) Lastly, is this method going to be a nightmare to support and scale? I've never heard of MySQL being used in this way so I do worry about how this affects things like replication and other methods of scaling. To me, it seems like MySQL wasn't built to be used in this way but what do I know. I've been looking at document-based databases like MongoDB, CouchDB, and Redis as alternatives because it seems like a schema-less approach to this particular problem makes a lot of sense. Can anyone offer some advice on this?

Read the article

Hosted full text search solutions?

- by James Cooper

Does anyone know of companies offering SaaS full text search? I'm looking for something that uses Lucene, solr, or sphinx on the backend, and provides a REST API for submitting documents to index, and running searches. I could build my own EC2 AMI, but I'd have to configure EBS and other stuff, monitor it, etc. Curious if someone has already done all this and would charge per MB/GB indexed. thank you. -- James

Read the article

Asp.net library to extract plain text from docx, pptx, xlsx (for search index)

- by Myster

Is there a pre-existing library to extract plain text form docx, pptx, and xlsx files? I require this to populate a lucene.net index. I've found this example which extracts text from docx and it seems to work ok. But before building my own solution based on this I was wondering if there's something already available for the other file formats?

Read the article

Advice on reading indexes

- by London

Hello, I'm trying to figure out the right way to read lucene index only once whilst running the application multiple times, how can I do that in java? Because indexed data will not change so reading them each time would not be necessary. Can someone explain me the logic of it reading them only once? thank you

Read the article

NoSQL DB for .Net document-based database (ECM)

- by Dane

I'm halfway through coding a basic multi-tenant SaaS ECM solution. Each client has it's own instance of the database / datastore, but the .Net app is single instance. The documents are pretty much read only (i.e. an image archive of tiffs or PDFs) I've used MSSQL so far, but then started thinking this might be viable in a NoSQL DB (e.g. MongoDB, CouchDB). The basic premise is that it stores documents, each with their own particular indexes. Each tenant can have multiple document types. e.g. One tenant might have an invoice type, which has Customer ID, Invoice Number and Invoice Date. Another tenant might have an application form, which has Member Number, Application Number, Member Name, and Application Date. So far I've used the old method which Sharepoint (used?) to use, and created a document table which has int_field_1, int_field_2, date_field_1, date_field_2, etc. Then, I've got a "mapping" table which stores the customer specific index name, and the database field that will map to. I've avoided the key-value pair model in the DB due to volume of documents. This way, we can support multiple document types in the one table, and get reasonably high performance out of it, and allow for custom document type searches (i.e. user selects a document type, then they're presented with a list of search fields). However, a NoSQL DB might make this a lot simpler, as I don't need to worry about denormalizing the document. However, I've just got concerns about the rest of the data around a document. We store an "action history" against the document. This tracks views, whether someone emails the document from within the system, and other "future" functionality (e.g. faxing). We have control over the document load process, so we can manipulate the data however it needs to be to get it in the document store (e.g. assign unique IDs). Users will not be adding in their own documents, so we shouldn't need to worry about ACID compliance, as the documents are relatively static. So, my questions I guess : Is a NoSQL DB a good fit Is MongoDB the best for Asp.Net (I saw Raven and Velocity, but they're still kinda beta) Can I store a key for each document, and then store the action history in a MSSQL DB with this key? I don't need to do joins, it would be if a person clicks "View History" against a document. How would performance compare between the two (NoSQL DB vs denormalized "document" table) Volumes would be up to 200,000 new documents per month for a single tenant. My current scaling plan with the SQL DB involves moving the SQL DB into a cluster when certain thresholds are reached, and then reviewing partitioning and indexing structures.

Read the article

I'm searching for a messaging platform (like XMPP) that allows tight integration with a web applicat

- by loxs

Hi, At the company I work for, we are building a cluster of web applications for collaboration. Things like accounting, billing, CRM etc. We are using a RESTfull technique: For database we use CouchDB Different applications communicate with one another and with the database via http. Besides, we have a single sign on solution, so that when you login in one application, you are automatically logged to the other. For all apps we use Python (Pylons). Now we need to add instant messaging to the stack. We need to support both web and desktop clients. But just being able to chat is not enough. We need to be able to achieve all of the following (and more similar things). When somebody gets assigned to a task, they must receive a message. I guess this is possible with some system daemon. There must be an option to automatically group people in groups by lots of different properties. For example, there must be groups divided both by geographical location, by company division, by job type (all the programers from different cities and different company divisions must form a group), so that one can send mass messages to a group of choice. Rooms should be automatically created and destroyed. For example when several people visit the same invoice, a room for them must be automatically created (and they must auto-join). And when all leave the invoice, the room must be destroyed. Authentication and authorization from our applications. I can implement this using custom solutions like hookbox http://hookbox.org/docs/intro.html but then I'll have lots of problems in supporting desktop clients. I have no former experience with instant messaging. I've been reading about this lately. I've been looking mostly at things like ejabberd. But it has been a hard time and I can't find whether what I want is possible at all. So I'd be happy if people with experience in this field could help me with some advice, articles, tales of what is possible etc.

Read the article

KeepAliveException when using HttpWebRequest.GetResponse

- by Lucas

I am trying to POST an attachment to CouchDB using the HttpWebRequest. However, when I attempt "response = (HttpWebResponse)httpWebRequest.GetResponse();" I receive a WebException with the message "The underlying connection was closed: A connection that was expected to be kept alive was closed by the server." I have found some articles stating that setting the keepalive to false and httpversion to 1.0 resolves the situation. I am finding that it does not yeilding the exact same error, plus I do not want to take that approach as I do not want to use the 1.0 version due to how it handles the connection. Any suggestions or ideas are welcome. I'll try them all until one works! public ServerResponse PostAttachment(Server server, Database db, Attachment attachment) { Stream dataStream; HttpWebResponse response = null; StreamReader sr = null; byte[] buffer; string json; string boundary = "----------------------------" + DateTime.Now.Ticks.ToString("x"); string headerTemplate = "Content-Disposition: form-data; name=\"_attachments\"; filename=\"" + attachment.Filename + "\"\r\n Content-Type: application/octet-stream\r\n\r\n"; byte[] headerbytes = System.Text.Encoding.UTF8.GetBytes(headerTemplate); byte[] boundarybytes = System.Text.Encoding.ASCII.GetBytes("\r\n--" + boundary + "\r\n"); HttpWebRequest httpWebRequest = (HttpWebRequest)WebRequest.Create("http://" + server.Host + ":" + server.Port.ToString() + "/" + db.Name + "/" + attachment.Document.Id); httpWebRequest.ContentType = "multipart/form-data; boundary=" + boundary; httpWebRequest.Method = "POST"; httpWebRequest.KeepAlive = true; httpWebRequest.ContentLength = attachment.Stream.Length + headerbytes.Length + boundarybytes.Length; if (!string.IsNullOrEmpty(server.EncodedCredentials)) httpWebRequest.Headers.Add("Authorization", server.EncodedCredentials); if (!attachment.Stream.CanRead) throw new System.NotSupportedException("The stream cannot be read."); // Get the request stream try { dataStream = httpWebRequest.GetRequestStream(); } catch (Exception e) { throw new WebException("Failed to get the request stream.", e); } buffer = new byte[server.BufferSize]; int bytesRead; dataStream.Write(headerbytes,0,headerbytes.Length); attachment.Stream.Position = 0; while ((bytesRead = attachment.Stream.Read(buffer, 0, buffer.Length)) > 0) { dataStream.Write(buffer, 0, bytesRead); } dataStream.Write(boundarybytes, 0, boundarybytes.Length); dataStream.Close(); // send the request and get the response try { response = (HttpWebResponse)httpWebRequest.GetResponse(); } catch (Exception e) { throw new WebException("Invalid response received from server.", e); } // get the server's response json try { dataStream = response.GetResponseStream(); sr = new StreamReader(dataStream); json = sr.ReadToEnd(); } catch (Exception e) { throw new WebException("Failed to access the response stream.", e); } // close up all our streams and response sr.Close(); dataStream.Close(); response.Close(); // Deserialize the server response return ConvertTo.JsonToServerResponse(json); }

Read the article

How can I search in transcluded categories?

- by Wikis

I want to add functionality to a MediaWiki wiki to search in specific categories: Platform 1 Platform 2 etc. So I created a template which, based on a certain field, assigns pages to those categories. The template was already included on most of these pages. So now most pages are in either: Category:Platform 1 or Category:Platform 2 Then I thought I just need to add incategory to the search and I'm done, as described on the Wikipedia page. But then I reread it and to my horror discovered: incategory: – using the incategory: parameter returns pages in a given category (as long as the pages are directly categorized, and not transcluded through templates). Eeeek! Is there any other way to search even in transcluded templates? Or any other way of resolving this?

Read the article

Best way to retrieve certain field of all documents returned by a lucen search

- by Philipp

Hi, I was wondering what the best way is to retrieve a certain field of all documents returned by a Searcher of Lucene. Background: each document has a date field (written on) and I would like to show a timeline of all found documents, so I need to extract the date (day) field of all the documents I find with the search. I currently retrieve every document using Searcher.doc(int, FieldSelector) having the selector only retrieve the certain field. I have indexed 250k documents, the search itself takes no time and returns about 10k document ids. Retrieving those however, takes 20+ seconds. What can I do to speed things up, but still get all the values I need. Thx in advance Philipp

Read the article

MSSQL Search Proper Names Full Text Index vs LIKE + SOUNDEX

- by Matthew Talbert

I have a database of names of people that has (currently) 35 million rows. I need to know what is the best method for quickly searching these names. The current system (not designed by me), simply has the first and last name columns indexed and uses "LIKE" queries with the additional option of using SOUNDEX (though I'm not sure this is actually used much). Performance has always been a problem with this system, and so currently the searches are limited to 200 results (which still takes too long to run). So, I have a few questions: Does full text index work well for proper names? If so, what is the best way to query proper names? (CONTAINS, FREETEXT, etc) Is there some other system (like Lucene.net) that would be better? Just for reference, I'm using Fluent NHibernate for data access, so methods that work will with that will be preferred. I'm using MS SQL 2008 currently.

Read the article

Grails searchable plugin

- by Don

Hi, In my Grails app, I'm using the Searchable plugin for searching/indexing. I want to write a Compass/Lucene query that involves multiple domain classes. Within that query when I want to refer to the id of a class, I can't simply use 'id' because all classes have an 'id' property. Currently, I work around this problem by adding the following property to a class Foo public Long getFooId() { return id } static transients = ['fooId'] Then when I want to refer to the id of Foo within a query I use 'fooId'. Is there a way I can provide an alias for a property in the searchable mapping rather than adding a property to the class?

Read the article

Need help in filtering records based on radius value in solr

- by kshama

Hi, I am using solr with Lucene spatial 2.9.1 as per http://www.ibm.com/developerworks/java/library/j-spatial/ I want to write a query, that will retrieve records within a given radius using hsin function, and using cartesian tiers as filters. So i wrote query like this http://localhost:8983/solr/select/?q=body:engineering colleges^0 AND _val_:"recip(hsin(0.227486,1.354193 , lat_rad, lng_rad, 4), 1, 1, 0)"^100 &&fq={!tier x=13.033993 y=77.589569 radians=false dist=4 prefix=tier_ unit=m} My records include many US records and few Indian records. For US records filtering based on radius is working fine. But for Indian records its not varying even if i change the radius . So can any one tell me if anything is wrong with the query or is there any configuration issues related to solr in order to make this work, or since record density is very less for Indian records filtering is not happening properly.Am not able to figure it out. Thanks in advance.

Read the article

JMS message received at only one server

- by BJH

I'm having a problem with a JEE6 application running in a clustered environment using WebSphere ApplicationServer 8. A search index is used for quick search in the UI (using Lucene), which must be re-indexed after new data arrived in the corresponding DB layer. To achieve this we're sending a JMS message to the application, then the search index will be refreshed. The problem is, that the messages only arrives at one of the cluster members. So only there the search index is up to date. At the other servers it remains outdated. How can I achieve that the search index gets updated at all cluster members? Can I receive the message somehow on all servers? Or is there a better way to do this?

Read the article

Nested BooleanQuery?

- by KailZhang

I'm using a BooleanQuery to combine several queries. I find that if I add a BooleanQuery to the BooleanQuery, then no result is returned. The added BooleanQuery is a MUST_NOT one, like -city_id:100. But as lucene's spec says, BooleanQuery could be nested, which I think means it's okay to add such BooleanQuery. Now I have to get all clauses from the BooleanQuery to be added, and then add them to the container BooleanQuery one by one. I'm a bit confused. Anybody could help? Thank you very much!

Read the article

Boost Solr results based on the field that contained the hit

- by TomFor

Hi, I was browsing the web looking for a indexing and search framework and stumbled upon Solr. A functionality that we abolutely need is to boost results based on what field contained the hit. A small example: Consider a record like this: <movie> <title>The Dark Knight</title> <alternative_title>Batman Begins 2</alternative_title> <year>2008</year> <director>Christopher Nolan</director> <plot>Batman, Gordon and Harvey Dent are forced to deal with the chaos unleashed by an anarchist mastermind known only as the Joker, as it drives each of them to their limits.</plot> </movie> I want to combine for example the title, alternative_title and plot fields into one search field, which isn't too difficult after looking at the Solr/Lucene documentation and tutorials. However I also want that movies that have a hit in title have a higher score than hits on alternative_title and those in their turn should score higher than hits in the plot field. Is there any way to indicate this kond of scoring in the xml or do we need to develop some custom scoring algorythm? Please also note that the example I've givnen is fictional end the real data will probably contain 100+ fields. Thanks in advance, Tom

Read the article

SQL Server Search Proper Names Full Text Index vs LIKE + SOUNDEX

- by Matthew Talbert

I have a database of names of people that has (currently) 35 million rows. I need to know what is the best method for quickly searching these names. The current system (not designed by me), simply has the first and last name columns indexed and uses "LIKE" queries with the additional option of using SOUNDEX (though I'm not sure this is actually used much). Performance has always been a problem with this system, and so currently the searches are limited to 200 results (which still takes too long to run). So, I have a few questions: Does full text index work well for proper names? If so, what is the best way to query proper names? (CONTAINS, FREETEXT, etc) Is there some other system (like Lucene.net) that would be better? Just for reference, I'm using Fluent NHibernate for data access, so methods that work will with that will be preferred. I'm using SQL Server 2008 currently. EDIT I want to add that I'm very interested in solutions that will deal with things like commonly misspelled names, eg 'smythe', 'smith', as well as first names, eg 'tomas', 'thomas'. Query Plan |--Parallelism(Gather Streams) |--Nested Loops(Inner Join, OUTER REFERENCES:([testdb].[dbo].[Test].[Id], [Expr1004]) OPTIMIZED WITH UNORDERED PREFETCH) |--Hash Match(Inner Join, HASH:([testdb].[dbo].[Test].[Id])=([testdb].[dbo].[Test].[Id])) | |--Bitmap(HASH:([testdb].[dbo].[Test].[Id]), DEFINE:([Bitmap1003])) | | |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([testdb].[dbo].[Test].[Id])) | | |--Index Seek(OBJECT:([testdb].[dbo].[Test].[IX_Test_LastName]), SEEK:([testdb].[dbo].[Test].[LastName] >= 'WHITDþ' AND [testdb].[dbo].[Test].[LastName] < 'WHITF'), WHERE:([testdb].[dbo].[Test].[LastName] like 'WHITE%') ORDERED FORWARD) | |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([testdb].[dbo].[Test].[Id])) | |--Index Seek(OBJECT:([testdb].[dbo].[Test].[IX_Test_FirstName]), SEEK:([testdb].[dbo].[Test].[FirstName] >= 'THOMARþ' AND [testdb].[dbo].[Test].[FirstName] < 'THOMAT'), WHERE:([testdb].[dbo].[Test].[FirstName] like 'THOMAS%' AND PROBE([Bitmap1003],[testdb].[dbo].[Test].[Id],N'[IN ROW]')) ORDERED FORWARD) |--Clustered Index Seek(OBJECT:([testdb].[dbo].[Test].[PK__TEST__3214EC073B95D2F1]), SEEK:([testdb].[dbo].[Test].[Id]=[testdb].[dbo].[Test].[Id]) LOOKUP ORDERED FORWARD) SQL for above: SELECT * FROM testdb.dbo.Test WHERE LastName LIKE 'WHITE%' AND FirstName LIKE 'THOMAS%' Based on advice from Mitch, I created an index like this: CREATE INDEX IX_Test_Name_DOB ON Test (LastName ASC, FirstName ASC, BirthDate ASC) INCLUDE (and here I list the other columns) My searches are now incredibly fast for my typical search (last, first, and birth date).

Read the article

How to search a PDF in Acrobat Reader AND jump to a certain page via parameter?

- by agez

Hi, we are using lucene within a web application to search in a great number of PDF documents. The workflow is like this: A user enters a search term A list of search results is presented to the user. Each search result represents one PDF document and shows the user on which page the search term was found. Each of these pages is represented as a hyperlink. If the user now clicks on such a hyperlink, he directly jumps to that page. But now the user has the problem that the search term isn't highlighted on the page. Therefore the user has to look on his own to find the search term on the page. What we wanted is a way to highlight the search term on the specific page in the PDF. The open parameters for Acrobat Reader allow for either searching a PDF document (with hit highlighting) OR jumping to a specific page. But the combination of both parameters - which we would need - doesn't work. Does anyone have an idea how jumping to a page and highlighting a search term in a pdf document could work? I had a look at the Acrobat SDK but don't see how we can use it (it's terribly documented). Cheers, Helmut

Read the article

Full Text Search like Google

- by Eduardo

I would like to implement full-text-search in my off-line (android) application to search the user generated list of notes. I would like it to behave just like Google (since most people are already used to querying to Google) My initial requirements are: Fast: like Google or as fast as possible, having 100000 documents with 200 hundred words each. Searching for two words should only return documents that contain both words (not just one word) (unless the OR operator is used) Case insensitive (aka: normalization): If I have the word 'Hello' and I search for 'hello' it should match. Diacritical mark insensitive: If I have the word 'así' a search for 'asi' should match. In Spanish, many people, incorrectly, either do not put diacritical marks or fail in correctly putting them. Stop word elimination: To not have a huge index meaningless words like 'and', 'the' or 'for' should not be indexed at all. Dictionary substitution (aka: stem words): Similar words should be indexed as one. For example, instances of 'hungrily' and 'hungry' should be replaced with 'hunger'. Phrase search: If I have the text 'Hello world!' a search of '"world hello"' should not match it but a search of '"hello world"' should match. Search all fields (in multifield documents) if no field specified (not just a default field) Auto-completion in search results while typing to give popular searches. (just like Google Suggest) How may I configure a full-text-search engine to behave as much as possible as Google? (I am mostly interested in Open Source, Java and in particular Lucene)

Search Results

Search found 631 results on 26 pages for 'couchdb lucene'.

Page 16/26 | < Previous Page | 12 13 14 15 16 17 18 19 20 21 22 23 | Next Page >

- by systemsfault

- by Dragos

- by Ben Dilts

- by knorv

- by Ian Varley

- by pableu

- by Am

- by mboyle

- by James Cooper

- by Myster

- by London

- by Dane

- by loxs

- by Lucas

- by Wikis

- by Philipp

- by Matthew Talbert

- by Don

- by kshama

- by BJH

- by KailZhang

- by TomFor

- by Matthew Talbert

- by agez

- by Eduardo

< Previous Page | 12 13 14 15 16 17 18 19 20 21 22 23 | Next Page >