Search Results

Search found 190 results on 8 pages for 'sphinx'.

Page 5/8 | < Previous Page | 1 2 3 4 5 6 7 8  | Next Page >

  • Mysql Database Question about Large Columns

    - by murat
    Hi, I have a table that has 100.000 rows, and soon it will be doubled. The size of the database is currently 5 gb and most of them goes to one particular column, which is a text column for PDF files. We expect to have 20-30 GB or maybe 50 gb database after couple of month and this system will be used frequently. I have couple of questions regarding with this setup 1-) We are using innodb on every table, including users table etc. Is it better to use myisam on this table, where we store text version of the PDF files? (from memory usage /performance perspective) 2-) We use Sphinx for searching, however the data must be retrieved for highlighting. Highlighting is done via sphinx API but still we need to retrieve 10 rows in order to send it to Sphinx again. This 10 rows may allocate 50 mb memory, which is quite large. So I am planning to split these PDF files into chunks of 5 pages in the database, so these 100.000 rows will be around 3-4 million rows and couple of month later, instead of having 300.000-350.000 rows, we'll have 10 million rows to store text version of these PDF files. However, we will retrieve less pages, so again instead of retrieving 400 pages to send Sphinx for highlighting, we can retrieve 5 pages and it will have a big impact on the performance. Currently, when we search a term and retrieve PDF files that have more than 100 pages, the execution time is 0.3-0.35 seconds, however if we retrieve PDF files that have less than 5 pages, the execution time reduces to 0.06 seconds, and it also uses less memory. Do you think, this is a good trade-off? We will have million of rows instead of having 100k-200k rows but it will save memory and improve the performance. Is it a good approach to solve this problem and do you have any ideas how to overcome this problem? The text version of the data is used only for indexing and highlighting. So, we are very flexible. Thanks,

    Read the article

  • Architecture for analysing search result impressions/clicks to improve future searches

    - by Hais
    We have a large database of items (10m+) stored in MySQL and intend to implement search on metadata on these items, taking advantage of something like Sphinx. The dataset will be changing slightly on a daily basis so Sphinx will be re-indexing daily. However we want the algorithm to self-learn and improve search results by analysing impression and click data so that we provide better results for our customers on that search term, and possibly other similar search terms too. I've been reading up on Hadoop and it seems like it has the potential to crunch all this data, although I'm still unsure how to approach it. Amazon has tutorials for compiling impression vs click data using MapReduce but I can't see how to get this data in a useable format. My idea is that when a search term comes in I query Sphinx to get all the matching items from the dataset, then query the analytics (compiled on an hourly basis or similar) so that we know the most popular items for that search term, then cache the final results using something like Memcached, Membase or similar. Am I along the right lines here?

    Read the article

  • Y my interface is not showing when i run the project

    - by Nubkadiya
    i have configured the Sphinx and i have used Main thread to do the recognition part. so that i can avoid the buttons. so currently my design is when the application runs it will check any voice recognition and prompt in the labels. but when i run the project it dont display the interface of the application. only the frame shows. here is the code. if you guys can provide me with any solution for this. it will be great. /* * To change this template, choose Tools | Templates * and open the template in the editor. */ /* * FinalMainWindow.java * * Created on May 17, 2010, 11:22:29 AM */ package FYP; import edu.cmu.sphinx.frontend.util.Microphone; import edu.cmu.sphinx.recognizer.Recognizer; import edu.cmu.sphinx.result.Result; import edu.cmu.sphinx.util.props.ConfigurationManager; //import javax.swing.; //import java.io.; public class FinalMainWindow extends javax.swing.JFrame{ Recognizer recognizer; private void allocateRecognizer() { ConfigurationManager cm; cm = new ConfigurationManager("helloworld.config.xml"); this.recognizer = (Recognizer) cm.lookup("recognizer"); this.recognizer.allocate(); Microphone microphone = (Microphone) cm.lookup("microphone");// TODO add // your if (!microphone.startRecording()) { // System.out.println("Cannot start microphone."); //this.jlblDest.setText("Cannot Start Microphone"); // this.jprogress.setText("Cannot Start Microphone"); System.out.println("Cannot Start Microphone"); this.recognizer.deallocate(); System.exit(1); } } boolean allocated; // property file eka....... //code.google.com private void voiceMajorInput() { if (!allocated) { this.allocateRecognizer(); allocated = true; } Result result = recognizer.recognize(); if (result != null) { String resultText = result.getBestFinalResultNoFiller(); System.out.println("Recognized Result is " +resultText); this.jhidden.setText(resultText); } } /** Creates new form FinalMainWindow */ public FinalMainWindow() { initComponents(); } /** This method is called from within the constructor to * initialize the form. * WARNING: Do NOT modify this code. The content of this method is * always regenerated by the Form Editor. */ @SuppressWarnings("unchecked") // <editor-fold defaultstate="collapsed" desc="Generated Code"> private void initComponents() { jhidden = new javax.swing.JLabel(); setDefaultCloseOperation(javax.swing.WindowConstants.EXIT_ON_CLOSE); jhidden.setText("jLabel1"); javax.swing.GroupLayout layout = new javax.swing.GroupLayout(getContentPane()); getContentPane().setLayout(layout); layout.setHorizontalGroup( layout.createParallelGroup(javax.swing.GroupLayout.Alignment.LEADING) .addGroup(layout.createSequentialGroup() .addGap(51, 51, 51) .addComponent(jhidden) .addContainerGap(397, Short.MAX_VALUE)) ); layout.setVerticalGroup( layout.createParallelGroup(javax.swing.GroupLayout.Alignment.LEADING) .addGroup(layout.createSequentialGroup() .addGap(45, 45, 45) .addComponent(jhidden) .addContainerGap(293, Short.MAX_VALUE)) ); pack(); }// </editor-fold> /** * @param args the command line arguments */ public static void main(String args[]) { java.awt.EventQueue.invokeLater(new Runnable() { public void run() { // new FinalMainWindow().setVisible(true); FinalMainWindow mw = new FinalMainWindow(); mw.setVisible(true); mw.voiceMajorInput(); new FinalMainWindow().setVisible(true); } }); } // Variables declaration - do not modify private javax.swing.JLabel jhidden; // End of variables declaration }

    Read the article

  • Why is my interface is not showing when i run the project?

    - by Nubkadiya
    i have configured the Sphinx and i have used Main thread to do the recognition part. so that i can avoid the buttons. so currently my design is when the application runs it will check any voice recognition and prompt in the labels. but when i run the project it dont display the interface of the application. only the frame shows. here is the code. if you guys can provide me with any solution for this. it will be great. /* * To change this template, choose Tools | Templates * and open the template in the editor. */ /* * FinalMainWindow.java * * Created on May 17, 2010, 11:22:29 AM */ package FYP; import edu.cmu.sphinx.frontend.util.Microphone; import edu.cmu.sphinx.recognizer.Recognizer; import edu.cmu.sphinx.result.Result; import edu.cmu.sphinx.util.props.ConfigurationManager; //import javax.swing.; //import java.io.; public class FinalMainWindow extends javax.swing.JFrame{ Recognizer recognizer; private void allocateRecognizer() { ConfigurationManager cm; cm = new ConfigurationManager("helloworld.config.xml"); this.recognizer = (Recognizer) cm.lookup("recognizer"); this.recognizer.allocate(); Microphone microphone = (Microphone) cm.lookup("microphone");// TODO add // your if (!microphone.startRecording()) { // System.out.println("Cannot start microphone."); //this.jlblDest.setText("Cannot Start Microphone"); // this.jprogress.setText("Cannot Start Microphone"); System.out.println("Cannot Start Microphone"); this.recognizer.deallocate(); System.exit(1); } } boolean allocated; // property file eka....... //code.google.com private void voiceMajorInput() { if (!allocated) { this.allocateRecognizer(); allocated = true; } Result result = recognizer.recognize(); if (result != null) { String resultText = result.getBestFinalResultNoFiller(); System.out.println("Recognized Result is " +resultText); this.jhidden.setText(resultText); } } /** Creates new form FinalMainWindow */ public FinalMainWindow() { initComponents(); } /** This method is called from within the constructor to * initialize the form. * WARNING: Do NOT modify this code. The content of this method is * always regenerated by the Form Editor. */ @SuppressWarnings("unchecked") // <editor-fold defaultstate="collapsed" desc="Generated Code"> private void initComponents() { jhidden = new javax.swing.JLabel(); setDefaultCloseOperation(javax.swing.WindowConstants.EXIT_ON_CLOSE); jhidden.setText("jLabel1"); javax.swing.GroupLayout layout = new javax.swing.GroupLayout(getContentPane()); getContentPane().setLayout(layout); layout.setHorizontalGroup( layout.createParallelGroup(javax.swing.GroupLayout.Alignment.LEADING) .addGroup(layout.createSequentialGroup() .addGap(51, 51, 51) .addComponent(jhidden) .addContainerGap(397, Short.MAX_VALUE)) ); layout.setVerticalGroup( layout.createParallelGroup(javax.swing.GroupLayout.Alignment.LEADING) .addGroup(layout.createSequentialGroup() .addGap(45, 45, 45) .addComponent(jhidden) .addContainerGap(293, Short.MAX_VALUE)) ); pack(); }// </editor-fold> /** * @param args the command line arguments */ public static void main(String args[]) { java.awt.EventQueue.invokeLater(new Runnable() { public void run() { // new FinalMainWindow().setVisible(true); FinalMainWindow mw = new FinalMainWindow(); mw.setVisible(true); mw.voiceMajorInput(); new FinalMainWindow().setVisible(true); } }); } // Variables declaration - do not modify private javax.swing.JLabel jhidden; // End of variables declaration }

    Read the article

  • Always-indexed MySQL indexing/searching replacements for InnoDB?

    - by Chad Johnson
    I am using InnoDB for a MySQL table, and obviously queries using LIKE and RLIKE/REGEXP can take a lot of time. I've tried Spinx, and it works great, except I have to re-index context at intervals. I can re-index every minute, but I am wondering if there is either 1) a setting in Sphinx to keep records always indexed or 2) other software besides Sphinx that will keep records always indexed. I want it where that immediately upon inserting or updating a record, the index is updated.

    Read the article

  • MySQL efficiency as it relates to the database/table size

    - by mlissner
    I'm building a system using django, Sphinx and MySQL that's very quickly becoming quite large. The database currently has about 2000 rows, and I've written a program that's going to populate it with another 40,000 rows in a couple days. Since the database is live right now, and since I've never had a database with this much information in it, I'm worried about some things: Is adding all these rows going to seriously degrade the efficiency of my django app? Will I need to go back through it and optimize all my database calls so they're doing things more cleverly? Or will this make the database slow all around to the extent that I can't do anything about it at all? If you scoff at my 40k rows, then, my next question is, at what point SHOULD I be concerned? I will likely be adding another couple hundred thousand soon, so I worry, and I fret. How is sphinx going to feel about all this? Is it going to freak out when it realizes it has to index all this data? Or will it be fine? Is this normal for it? If it is, at what point should I be concerned that it's too much data for Sphinx? Thanks for any thoughts.

    Read the article

  • Article search engine in php

    - by Jason
    Hello, I am using sphinx as a search engine on my website its working perfect and I have no complain with it. The only thing it lacks is, it does not allow me to search articles whose query length is more than 15 words. I know in reality people don't use more than 3-4 words i want to use it for finding duplicate contents. I was wondering if there is any alternative solution to sphinx. I want to cope with duplicate contents. My main articles table is in innodb but I am also caching articles into MyISAM table as well for full text searching but when I search an article it takes ages to perform one search. Its not the query problem, i think mysql lacks the fulltext searching facility. Thanks Jason

    Read the article

  • How to get field name......

    - by user229538
    Recently i have implemented django-sphinx search on my website. It is working fine of each separate model. But now my client requirement has changed. To implement that functionality i need field name to whom search is made. suppose my query is: "select id, name,description from table1" and search keyword is matched with value in field "name". So i need to return that field also. Is it possible to get field name or any method provided by django-sphinx which return field name. Please help me...

    Read the article

  • how to setup the sphinx with netbeans

    - by Pradeep
    i have successfully configured sphinx4 with eclipse. for that these steps i have used. copy my java and config files to SRC folder all the necessary jar files (in the lib). the lib folder added to the root of the project build those jar files (jsapi files too) change the configuration file and give the proper path test the java file but in Netbeans i really dont understand how to do the proper steps. can someone help me. the jar files should be added to "Libraries" rite. then after adding them how to build them. in the netbeans it dont show a SRC folder. so all the java files and configuration files should go to Source Packages folder rite. can someone help me with this. please

    Read the article

  • search engine (solr/sphinx) question

    - by noname
    i want to make my threads content searchable with full text search engines like solr. but i wonder one thing. should i index just the thread.title, thread.body and post.body or should i index username, created date, nr of posts, views, country, region and city too that belongs to thread? i mean when an user search for a thread he will get hits returned containing thread title, 2 lines of body, which user has posted it, creation date, tags, and so on. should i index all this information too? but then it would be pretty much the whole database. or should i just index the 3 first columns i mentioned for full text search. and another question. when an user post a new thread, then i have to immidiately tell solr to add that row? if im not, how would it be searchable?

    Read the article

  • MySQL Full-Text Search Across Multiple Tables - Quick/Long Solution?

    - by Kerry
    Hello all, I have been doing a bit of research on full-text searches as we realized a series of LIKE statements are terrible. My first find was MySQL full-text searches. I tried to implement this and it worked on one table, failed when I was trying to join multiple tables, and so I consulted stackoverflow's articles (look at the end for a list of the ones I've been to) I didn't see anything that clearly answered my questions. I'm trying to get this done literally in an hour or two (quick solution) but I also want to do a better long term solution. Here is my query: SELECT a.`product_id`, a.`name`, a.`slug`, a.`description`, b.`list_price`, b.`price`, c.`image`, c.`swatch`, e.`name` AS industry FROM `products` AS a LEFT JOIN `website_products` AS b ON (a.`product_id` = b.`product_id`) LEFT JOIN ( SELECT `product_id`, `image`, `swatch` FROM `product_images` WHERE `sequence` = 0) AS c ON (a.`product_id` = c.`product_id`) LEFT JOIN `brands` AS d ON (a.`brand_id` = d.`brand_id`) INNER JOIN `industries` AS e ON (a.`industry_id` = e.`industry_id`) WHERE b.`website_id` = 96 AND b.`status` = 1 AND b.`active` = 1 AND MATCH( a.`name`, a.`sku`, a.`description`, d.`name` ) AGAINST ( 'ashley sofa' ) GROUP BY a.`product_id` ORDER BY b.`sequence` LIMIT 0, 9 The error I get is: Incorrect arguments to MATCH If I remove d.name from the MATCH statement it works. I have a full-text index on that column. I saw one of the articles say to use an OR MATCH for this table, but won't that lose the effectiveness of being able to rank them together or match them properly? Other places said to use UNIONs but I don't know how to do that properly. Any advice would be greatly appreciated. In the idea of a long term solution it seems that either Sphinx or Lucene is best. Now by no means and I a MySQL guru, and I heard that Lucene is a bit more complicated to setup, any recommendations or directions would be great. Articles: http://stackoverflow.com/questions/1117005/mysql-full-text-search-across-multiple-tables http://stackoverflow.com/questions/668371/mysql-fulltext-search-across-1-table http://stackoverflow.com/questions/2378366/mysql-how-to-make-multiple-table-fulltext-search http://stackoverflow.com/questions/737275/pros-cons-of-full-text-search-engine-lucene-sphinx-postgresql-full-text-searc http://stackoverflow.com/questions/1059253/searching-across-multiple-tables-best-practices

    Read the article

  • Rails paginate existing array of ActiveRecord results

    - by SaoiseK
    Hello, I generally use will_paginate for the pagination in my app, but have hit a stumbler on my search feature. I'm using Thinking Sphinx for doing my full-text search, which returns results paginated. The problem I'm having is that after I've received the results from Thinking Sphinx, I need to merge them with some other results and re-order them. Once I've finished processing them I have an Array of results that is very different from the original from TS. As there could be 1000+ results in this Array Pagination is a necessity. The problem is that I can't figure out how to get will_paginate to play with an existing array. I've done some research and it seems the only solutions to this problem are from several years ago and are based around the old built-in Paginator class. The most recent one I could find that makes use of will_paginate was from devchix from mid-2007: http://www.devchix.com/2007/07/23/will_paginate-array/comment-page-1/ - I've given this a go but it doesn't seem to do anything for me. Are there any current methods for applying pagination (preferably via will_paginate) for existing arrays of AR results?

    Read the article

  • Scalable Full Text Search With Per User Result Ordering

    - by jeremy
    What options exist for creating a scalable, full text search with results that need to be sorted on a per user basis? This is for PHP/MySQL (Symfony/Doctrine as well, if relevant). In our case, we have a database of workouts that have been performed by users. The workouts that the user has done before should appear at the top of the results. The more frequently they've done the workout, the higher it should appear in search matches. If it helps, you can assume we know the number of times a user has done a workout in advance. Possible Solutions Sphinx - Use Sphinx to implement full text search, do all the querying and sorting in MySQL. This seems promising (and there's a Symfony Plugin!) but I don't know much about it. Lucene - Use Lucene to perform full text search and put the users' completions into the query. As is suggested in this Stack Overflow thread. Alternatively, use Lucene to retrieve the results, then reorder them in PHP. However, both solutions seem clunky and potentially unscalable as a user may have completed hundreds of workouts. Mysql - No native full text support (InnoDB), so we'd have use LIKE or REGEX, which isn't scalable.

    Read the article

  • Hosted full text search solutions?

    - by James Cooper
    Does anyone know of companies offering SaaS full text search? I'm looking for something that uses Lucene, solr, or sphinx on the backend, and provides a REST API for submitting documents to index, and running searches. I could build my own EC2 AMI, but I'd have to configure EBS and other stuff, monitor it, etc. Curious if someone has already done all this and would charge per MB/GB indexed. thank you. -- James

    Read the article

  • Data denormalization and C# objects DB serialization

    - by Robert Koritnik
    I'm using a DB table with various different entities. This means that I can't have an arbitrary number of fields in it to save all kinds of different entities. I want instead save just the most important fields (dates, reference IDs - kind of foreign key to various other tables, most important text fields etc.) and an additional text field where I'd like to store more complete object data. the most obvious solution would be to use XML strings and store those. The second most obvious choice would be JSON, that usually shorter and probably also faster to serialize/deserialize... And is probably also faster. But is it really? My objects also wouldn't need to be strictly serializable, because JsonSerializer is usually able to serialize anything. Even anonymous objects, that may as well be used here. What would be the most optimal solution to solve this problem? Additional info My DB is highly normalised and I'm using Entity Framework, but for the purpose of having external super-fast fulltext search functionality I'm sacrificing a bit DB denormalisation. Just for the info I'm using SphinxSE on top of MySql. Sphinx would return row IDs that I would use to fast query my index optimised conglomerate table to get most important data from it much much faster than querying multiple tables all over my DB. My table would have columns like: RowID (auto increment) EntityID (of the actual entity - but not directly related because this would have to point to different tables) EntityType (so I would be able to get the actual entity if needed) DateAdded (record timestamp when it's been added into this table) Title Metadata (serialized data related to particular entity type) This table would be indexed with SPHINX indexer. When I would search for data using this indexer I would provide a series of EntityIDs and a limit date. Indexer would have to return a very limited paged amount of RowIDs ordered by DateAdded (descending). I would then just join these RowIDs to my table and get relevant results. So this won't actually be full text search but a filtering search. Getting RowIDs would be very fast this way and getting results back from the table would be much faster than comparing EntityIDs and DateAdded comparisons even though they would be properly indexed.

    Read the article

  • Grouping search results with thinking_sphinx plugin for rails

    - by Shagymoe
    I can use the following to group results, but it only returns one result per group. @results = Model.search params[:search_query], :group_by => 'created_at', :group_function => :day, :page => params[:page], :per_page => 50 So, if I display the results by day, I only get one result per day. <% @results.each_with_groupby do |result, group| %> <div class="group"><%= group %></div> <ul class="result"> <li><%= result.name %></li> </ul> <% end %> Do I have to parse the @results array and group them by date manually or am I missing something? Here is the line from the sphinx docs: http://sphinxsearch.com/docs/current.html#clustering "The final search result set then contains one best match per group."

    Read the article

  • Search filenames in MySQL database table restricted by filetype?

    - by ju
    Hello I have a MySQL database that I replicate from another server. The database contains a table with this columns ID, FileName and FileSize In the table there are more than 4'000'000 records. I want to make fast a search in FileName (varchar) column I found that I can use for this Sphinx search engine. The problem is that I want to restrict searches by filetype. Do I have to and how (trigers?) to extract file extensions for all rows? May be I have to create another table (because this one is replicated) and join them in 1:1 relation? Can you give me some advices please :)

    Read the article

  • Multiple Refinement/Clusters in a single search

    - by Brain Teasers
    Suppose I have options of searching are 1.occupation 2.education 3.religion 4.caste 5.country and many more. when i perform this search, i can get easily calculate refinements. problem is i need to calculate refinements in a way that for individual refinement is calculated in a manner that all search parameter is considered expect the calculated refinement check on the left side ... even i have choose the profession area, still search refinements of this is coming.... same functionality if of all refinements.Please help.I use sphinx for searching seen this type of refinements in http://ww2.shaadi.com/search

    Read the article

  • Search by ID, no keyword. Tried using :conditions but no result ouput.

    - by Victor
    Using Thinking Sphinx, Rails 2.3.8. I don't have a keyword to search, but I wanna search by shop_id which is indexed. Here's my code: @country = Country.search '', { :with => {:shop_id => params[:shop_id]}, :group_by => 'trip_id', :group_function => :attr, :page => params[:page] } The one above works. But I thought the '' is rather redundant. So I replaced it with :conditions resulting as: @country = Country.search :conditions => { :with => {:shop_id => params[:shop_id]}, :group_by => 'trip_id', :group_function => :attr, :page => params[:page] } But then it gives 0 result. What is the correct code? Thanks.

    Read the article

  • Best full text search for mysql?

    - by ConroyP
    We're currently running MySQL on a LAMP stack and have been looking at implementing a more thorough, full-text search on our site. We've looked at MySQL's own freetext search, but it doesn't seem to cope well with large databases, which makes it far too slow for our needs. Our main requirements are: speed returning results simple updating of index In addition to the above, our "nice to have"s are: ideally not something that requires adding a module to MySQL plays nicely with PHP (majority of our dev work done using PHP) There seems to be quite a few healthy open-source projects to add fast, reliable full-text search to MySQL, so I'm basically looking for recommendations/suggestions on what you've found to be the most useful product out there, easiest to set up, etc. So far, the list of ones we've been starting to play around with are: Sphinx, C++ based, used by craigslist, thepiratebay Lucene, Java-based Apache project, powers zeoh.com and zoomf.com Solr, Java-based offshoot of Lucene, used to power searches on Digg, CNet & AOL Channels Are there any better ones out there that we haven't come across yet? Can you recommend / suggest against any of the options we've gathered so far? Thanks for your help! Update @Cletus suggested Google's Custom Search Engine. We recently trialled this on a couple of projects, and it's an almost-perfect fit for our needs. The problem is that entries on our site are updated quite regularly, and unfortunately the speed at which entries go in/get updated in Google's index was just too slow and erratic for us to rely on, even with the addition of sitemaps and requested crawl rate changes.

    Read the article

  • Pecl complies .so extensions for OSX built-in PHP and not MAMP

    - by Camsoft
    I've installed the sphinx binaries and libraries and am now trying to install the PECL sphinx module. My system is running OS X 10.6 with MAMP 1.8.2 installed. I try to install sphinx using the following command: sudo pecl install sphinx The PECL command outputs the following: running: phpize Configuring for: PHP Api Version: 20090626 Zend Module Api No: 20090626 Zend Extension Api No: 220090626 The versions above don't match the versions listed when doing a phpinfo(). It seems that PECL is trying to complie against the built-in version of PHP. If I ignore the errors and continue the it will successfully compile and place the sphinx.so file in: /usr/lib/php/extensions/no-debug-non-zts-20090626/sphinx.so when in fact it should be: /Applications/MAMP/bin/php5/lib/php/extensions/no-debug-non-zts-20060613/ I've tried copying the sphinx.so file to the MAMP extensions dir but when I restart apache PHP displays the following warning: PHP Startup: Unable to load dynamic library '/Applications/MAMP/bin/php5/lib/php/extensions/no-debug-non-zts-20060613/sphinx.so I think this is because MAMP is 32bit and the built-in PHP is 64bit so PECL complies for 64bit. I might be completely wrong but I did read this when I goggled on the topic. Does anyone know how to get PECL to map to the MAMP version of PHP instead of the built-in version?

    Read the article

< Previous Page | 1 2 3 4 5 6 7 8  | Next Page >