fulltext - Page 3 - Developer IT

Where can I find a list of 'Stop' words for Oracle fulltext search?

- by Tyronomo

I've a client testing the full text (example below) search on a new Oracle UCM site. The random text string they chose to test was 'test only'. Which failed; from my testing it seems 'only' is a reserved word, as it is never returned from a full text search (it is returned from metadata searches). I've spent the morning searching oracle.com and found this which seems pretty comprehensive, yet does not have 'only'. So my question is thus, is 'only' a reserved word. Where can I find a complete list of reserved words for Oracle full text search (10g)? Full text search string example; (<ftx>test only</ftx>) Update. I have done some more testing. Seems it ignores words that indicate places or times; only, some, until, when, while, where, there, here, near, that, who, about, this, them. Can anyone confirm this? I can't find this in on Oracle anywhere. Update 2. Post Answer I should have been looking for 'stop' words not 'reserved'. Updated the question title and tags to reflect.

Read the article

Where can I find a list of reserved words for Oracle fulltext search?

- by Tyronomo

I've a client testing the full text (example below) search on a new Oracle UCM site. The random text string they chose to test was 'test only'. Which failed; from my testing it seems 'only' is a reserved word, as it is never returned from a full text search (it is returned from metadata searches). I've spent the morning searching oracle.com and found this which seems pretty comprehensive, yet does not have 'only'. So my question is thus, is 'only' a reserved word. Where can I find a complete list of reserved words for Oracle full text search (10g)? Full text search string example; (<ftx>test only</ftx>)

Read the article

PERL XPath Parser Help

- by cognvision

I want to pull in data using a XML::XPath parser from a XML DB file from the Worldbank site. The problem is that I'm not seeing any results in the output. I must be missing something in the code. Ideally, I would like to extract just the death rate statistics from each country XML DB (year and value). I'm using this as part of my input: http://data.worldbank.org/sites/default/files/countries/en/afghanistan_en.xml use strict; use LWP 5.64; use HTML::ContentExtractor; use XML::XPath; my $agent1 = LWP::UserAgent->new; my $extractor = HTML::ContentExtractor->new(); #Retrieve main Worldbank country site my $mainlink = "http://data.worldbank.org/country/"; my $page = $agent1->get("$mainlink"); my $fulltext = $page->decoded_content(); #Match to just all available countries in Worldbank my $country = ""; my @countryList; if (@countryList = $fulltext =~ m/(http:\/\/data\.worldbank\.org\/country\/.*?")/gi){ foreach $country(@countryList){ #Remove " at the end of link $country=~s/\"//gi; print "\n" . $country; #Retrieve each country profile's XML DB file my $page = $agent1->get("$country"); my $fulltext = $page->decoded_content(); my $XML_DB = ""; my @countryXMLDBList; if (@countryXMLDBList = $fulltext =~ m/(http:\/\/data\.worldbank\.org\/sites\/default\/files\/countries\/en\/.*?\.xml)/gi){ foreach $XML_DB(@countryXMLDBList){ my $page = $agent1->get("$XML_DB"); my $fulltext = $page->decoded_content(); #print $fulltext; #Use XML XPath parser to find elements related to death rate my $xp = XML::XPath->new($fulltext); #my $xp = XML::XPath->new("afghanistan_en.xml"); my $nodeSet = $xp->find("//*"); if (!$nodeSet->isa('XML::XPath::NodeSet') || $nodeSet->size() == 0) { #No match found print "\nMatch not found!"; exit; } else { foreach my $node ($nodeSet->get_nodelist){ print "\n" . $node->find('country')->string_value; print "\n" . $node->find('indicator')->string_value; print "\n" . $node->find('year')->string_value; print "\n" . $node->find('value')->string_value; exit; } } } #Build line graph based on death rate statistics and output some image file format } } } I am also looking into using the xpath expression "following-sibling", but not sure how to use it correctly. For example, I have the following set of XML data where I am only interested in pulling siblings directly after the indicator for just death rate data. <data> <country id="AFG">Afghanistan</country> <indicator id="SP.DYN.CDRT.IN">Death rate, crude (per 1,000 people)</indicator> <year>2006</year> <value>20.3410000</value> </data> - <data> <country id="AFG">Afghanistan</country> <indicator id="SP.DYN.CDRT.IN">Death rate, crude (per 1,000 people)</indicator> <year>2007</year> <value>19.9480000</value> </data> - <data> <country id="AFG">Afghanistan</country> <indicator id="SP.DYN.CDRT.IN">Death rate, crude (per 1,000 people)</indicator> <year>2008</year> <value>19.5720000</value> </data> - <data> <country id="AFG">Afghanistan</country> <indicator id="IC.EXP.DOCS">Documents to export (number)</indicator> <year>2005</year> <value>7.0000000</value> </data> - <data> <country id="AFG">Afghanistan</country> <indicator id="IC.EXP.DOCS">Documents to export (number)</indicator> <year>2006</year> <value>12.0000000</value> </data> - <data> <country id="AFG">Afghanistan</country> <indicator id="IC.EXP.DOCS">Documents to export (number)</indicator> <year>2007</year> <value>12.0000000</value> </data> Any help would be much appreciated!!!

Read the article

Why my http POST request doesn't go well?

- by 0x90

I am trying to make this POST request in ruby. but get back #<Net::HTTPUnsupportedMediaType:0x007f94d396bb98> what I tried is: require 'rubygems' require 'net/http' require 'uri' require 'json' auto_index_nodes =URI('http://localhost:7474/db/data/index/node/') request_nodes = Net::HTTP::Post.new(auto_index_nodes.request_uri) http = Net::HTTP.new(auto_index_nodes.host, auto_index_nodes.port) request_nodes.add_field("Accept", "application/json") request_nodes.set_form_data({"name"=>"node_auto_index", "config" => { "type" => "fulltext", "provider" =>"lucene"} , "Content-Type" => "application/json" }) response = http.request(request_nodes) Tried to write this part: "config" => { "type" => "fulltext", provider" =>"lucene"} , "Content-Type" => "application/json" } like that: "config" => '{ "type" => "fulltext",\ "provider" =>"lucene"},\ "Content-Type" => "application/json"\ }' this try didn't help either: request_nodes.set_form_data({"name"=>"node_auto_index", "config" => '{ \ "type" : "fulltext",\ "provider" : "lucene"}' , "Content-Type" => "application/json" })

Read the article

SQL Server Full Text Search resource consumption

- by Sam Saffron

When SQL Server builds a fulltext index computer resources are consumed (IO/Memory/CPU) Similarly when you perform full text searches, resources are consumed. How can I get a gauge over a 24 hour period of the exact amount of CPU and IO(reads/writes) that fulltext is responsible for, in relation to global SQL Server resource usage. Are there any perfmon counters, DMVs or profiler traces I can use to help answer this question?

Read the article

CDATA xml parsing extra greater than problem

- by Ruchir Shah

Hi, I am creating an xml using php and parsing that xml in iphone application code. In description field there is some html tags and text. I am using following line to convert this html tags in to xml tag using CDATA. $response .= '<desc><![CDATA['.trim($feed['fulltext']).']]></desc>'; Now, here my $feed['fulltext'] value is like this ...text... In xml I am getting following response, <desc><![CDATA[>...text...]]></desc> You can see here, I am getting an extra greater-than symbol just before the value of $feed['fulltext'] starts. (like this: ...text...) Any solution or suggestion for this? Thanks in advance. Cheers.

Read the article

May I use full-texted feed for my News Reader app?

- by Mahdi Ghiasi

I'm creating a news reader app. (Users will choose sources, but not the exact feed addresses. for example, they can choose Gizmodo but they will not see the exact url of Gizmodo's feed) Some of sources are giving partial feeds, and user must go to their website to read the complete article. My question is, May I make a fulltext version of their feed (for example, using this service) and use it instead of their feed in my News reader categories? Am I legally/ethically allowed to do this? What about using some other service like Readability? (Readability gives HTML page instead of feed as input, but anyway, output will be clean article without any advertises or such) Please note that some of these partial feeds may have some advertises inside, but the fulltext feed generators usually remove that advertises.

Read the article

define mysql indexing

- by Bharanikumar

Hi Am not sure, This is the right place to post this question , But in our stack overflow only am getting clear vision solutions , What is indexing and what is fulltext , for the above both questions i know the ans, but i cant expose that ans in the exact way to the interviewer , (indexing means somthing like index in book) (fulltext means for search string), Can please give me very simple defination for this questions , Advance thanks

Read the article

Sphinx Mysql Tutorial Help

- by Frederico

I have just implemented the Sphinx Storage Engine, and created my first Sphinx table, I'm trying to convert from using my Fulltext MyISAM table, so I figured I would just dump all the data into a sphinx table.. obviously I was wrong. Are there any great tutorials that really take you through transforming a fulltext search on MySQL to a sphinx table? Thanks in advance.

Read the article

Efficient method of finding database rows that have one or more qualities from a list of seven qualities

- by hithere

Hello! For this question, I'm looking to see if anyone has a better idea of how to implement what I'm currently planning on implementing (below): I'm keeping track of a set of images, using a database. Each image is represented by one row. I want to be able to search for images, using a number of different search parameters. One of these parameters involves a search-by-color option. (The rest of the search stuff is currently working fine.) Images in this database can contain up to seven colors: -Red -Orange -Yellow -Green -Blue -Indigo -Violet Here are some example user queries: "I want an image that contains red." "I want an image that contains red and blue." "I want an image that contains yellow and violet." "I want an image that contains red, orange, yellow, green, blue, indigo and violet." And so on. Users make this selection through the use of checkboxes in an html form. They can check zero checkboxes, all seven, and anything in between. I'm curious to hear what people think would be the most efficient way to perform this database search. I have two possible options right now, but I feel like there must be something better that I'm not thinking of. (Option 1) -For each row, simply have seven additional fields in the database, one for each color. Each field holds a 1 or 0 (true/false) value, and I SELECT based on whatever the user has checked off. (I didn't like this solution so much, because it seemed kind of wasteful to add seven additional fields...especially since most pictures in this table will only have 3-4 colors max, though some could have up to 7. So that means I'm storing a lot of zeros.) Also, if I added more searchable colors later on (which I don't think I will, but it's always possible), I'd have to add more fields. (Option 2) -For each image row, I could have a "colors" text field that stores space-separated color names (or numbers for the sake of compactness). Then I could do a fulltext match against search through the fields, selecting rows that contain "red yellow green" (or "1 3 4"). But I kind of didn't want to do fulltext searching because I already allow a keyword search, and I didn't really want to do two fulltext searches per image search. Plus, if the database gets big, fulltext stuff might slow down. Any better options that I didn't think of? Thanks! Side Note: I'm using PHP to work with a MySQL database.

Read the article

Full-text Indexing Books Online

- by Most Valuable Yak (Rob Volk)

While preparing for a recent SQL Saturday presentation, I was struck by a crazy idea (shocking, I know): Could someone import the content of SQL Server Books Online into a database and apply full-text indexing to it? The answer is yes, and it's really quite easy to do. The first step is finding the installed help files. If you have SQL Server 2012, BOL is installed under the Microsoft Help Library. You can find the install location by opening SQL Server Books Online and clicking the gear icon for the Help Library Manager. When the new window pops up click the Settings link, you'll get the following: You'll see the path under Library Location. Once you navigate to that path you'll have to drill down a little further, to C:\ProgramData\Microsoft\HelpLibrary\content\Microsoft\store. This is where the help file content is kept if you downloaded it for offline use. Depending on which products you've downloaded help for, you may see a few hundred files. Fortunately they're named well and you can easily find the "SQL_Server_Denali_Books_Online_" files. We are interested in the .MSHC files only, and can skip the Installation and Developer Reference files. Despite the .MHSC extension, these files are compressed with the standard Zip format, so your favorite archive utility (WinZip, 7Zip, WinRar, etc.) can open them. When you do, you'll see a few thousand files in the archive. We are only interested in the .htm files, but there's no harm in extracting all of them to a folder. 7zip provides a command-line utility and the following will extract to a D:\SQLHelp folder previously created: 7z e –oD:\SQLHelp "C:\ProgramData\Microsoft\HelpLibrary\content\Microsoft\store\SQL_Server_Denali_Books_Online_B780_SQL_110_en-us_1.2.mshc" *.htm Well that's great Rob, but how do I put all those files into a full-text index? I'll tell you in a second, but first we have to set up a few things on the database side. I'll be using a database named Explore (you can certainly change that) and the following setup is a fragment of the script I used in my presentation: USE Explore; GO CREATE SCHEMA help AUTHORIZATION dbo; GO -- Create default fulltext catalog for later FT indexes CREATE FULLTEXT CATALOG FTC AS DEFAULT; GO CREATE TABLE help.files(file_id int not null IDENTITY(1,1) CONSTRAINT PK_help_files PRIMARY KEY, path varchar(256) not null CONSTRAINT UNQ_help_files_path UNIQUE, doc_type varchar(6) DEFAULT('.xml'), content varbinary(max) not null); CREATE FULLTEXT INDEX ON help.files(content TYPE COLUMN doc_type LANGUAGE 1033) KEY INDEX PK_help_files; This will give you a table, default full-text catalog, and full-text index on that table for the content you're going to insert. I'll be using the command line again for this, it's the easiest method I know: for %a in (D:\SQLHelp\*.htm) do sqlcmd -S. -E -d Explore -Q"set nocount on;insert help.files(path,content) select '%a', cast(c as varbinary(max)) from openrowset(bulk '%a', SINGLE_CLOB) as c(c)" You'll need to copy and run that as one line in a command prompt. I'll explain what this does while you run it and watch several thousand files get imported: The "for" command allows you to loop over a collection of items. In this case we want all the .htm files in the D:\SQLHelp folder. For each file it finds, it will assign the full path and file name to the %a variable. In the "do" clause, we'll specify another command to be run for each iteration of the loop. I make a call to "sqlcmd" in order to run a SQL statement. I pass in the name of the server (-S.), where "." represents the local default instance. I specify -d Explore as the database, and -E for trusted connection. I then use -Q to run a query that I enclose in double quotes. The query uses OPENROWSET(BULK…SINGLE_CLOB) to open the file as a data source, and to treat it as a single character large object. In order for full-text indexing to work properly, I have to convert the text content to varbinary. I then INSERT these contents along with the full path of the file into the help.files table created earlier. This process continues for each file in the folder, creating one new row in the table. And that's it! 5 SQL Statements and 2 command line statements to unzip and import SQL Server Books Online! In case you're wondering why I didn't use FILESTREAM or FILETABLE, it's simply because I haven't learned them…yet. I may return to this blog after I figure that out and update it with the steps to do so. I believe that will make it even easier. In the spirit of exploration, I'll leave you to work on some fulltext queries of this content. I also recommend playing around with the sys.dm_fts_xxxx DMVs (I particularly like sys.dm_fts_index_keywords, it's pretty interesting). There are additional example queries in the download material for my presentation linked above. Many thanks to Kevin Boles (t) for his advice on (re)checking the content of the help files. Don't let that .htm extension fool you! The 2012 help files are actually XML, and you'd need to specify '.xml' in your document type column in order to extract the full-text keywords. (You probably noticed this in the default definition for the doc_type column.) You can query sys.fulltext_document_types to get a complete list of the types that can be full-text indexed. I also need to thank Hilary Cotter for giving me the original idea. I believe he used MSDN content in a full-text index for an article from waaaaaaaaaaay back, that I can't find now, and had forgotten about until just a few days ago. He is also co-author of Pro Full-Text Search in SQL Server 2008, which I highly recommend. He also has some FTS articles on Simple Talk: http://www.simple-talk.com/sql/learn-sql-server/sql-server-full-text-search-language-features/ http://www.simple-talk.com/sql/learn-sql-server/sql-server-full-text-search-language-features,-part-2/

Read the article

SQL SERVER – Get All the Information of Database using sys.databases

- by pinaldave

Earlier I wrote blog article SQL SERVER – Finding Last Backup Time for All Database. In the response of this article I have received very interesting script from SQL Server Expert Matteo as a comment in the blog. He has written script using sys.databases which provides plenty of the information about database. I suggest you can run this on your database and know unknown of your databases as well. SELECT database_id, CONVERT(VARCHAR(25), DB.name) AS dbName, CONVERT(VARCHAR(10), DATABASEPROPERTYEX(name, 'status')) AS [Status], state_desc, (SELECT COUNT(1) FROM sys.master_files WHERE DB_NAME(database_id) = DB.name AND type_desc = 'rows') AS DataFiles, (SELECT SUM((size*8)/1024) FROM sys.master_files WHERE DB_NAME(database_id) = DB.name AND type_desc = 'rows') AS [Data MB], (SELECT COUNT(1) FROM sys.master_files WHERE DB_NAME(database_id) = DB.name AND type_desc = 'log') AS LogFiles, (SELECT SUM((size*8)/1024) FROM sys.master_files WHERE DB_NAME(database_id) = DB.name AND type_desc = 'log') AS [Log MB], user_access_desc AS [User access], recovery_model_desc AS [Recovery model], CASE compatibility_level WHEN 60 THEN '60 (SQL Server 6.0)' WHEN 65 THEN '65 (SQL Server 6.5)' WHEN 70 THEN '70 (SQL Server 7.0)' WHEN 80 THEN '80 (SQL Server 2000)' WHEN 90 THEN '90 (SQL Server 2005)' WHEN 100 THEN '100 (SQL Server 2008)' END AS [compatibility level], CONVERT(VARCHAR(20), create_date, 103) + ' ' + CONVERT(VARCHAR(20), create_date, 108) AS [Creation date], -- last backup ISNULL((SELECT TOP 1 CASE TYPE WHEN 'D' THEN 'Full' WHEN 'I' THEN 'Differential' WHEN 'L' THEN 'Transaction log' END + ' – ' + LTRIM(ISNULL(STR(ABS(DATEDIFF(DAY, GETDATE(),Backup_finish_date))) + ' days ago', 'NEVER')) + ' – ' + CONVERT(VARCHAR(20), backup_start_date, 103) + ' ' + CONVERT(VARCHAR(20), backup_start_date, 108) + ' – ' + CONVERT(VARCHAR(20), backup_finish_date, 103) + ' ' + CONVERT(VARCHAR(20), backup_finish_date, 108) + ' (' + CAST(DATEDIFF(second, BK.backup_start_date, BK.backup_finish_date) AS VARCHAR(4)) + ' ' + 'seconds)' FROM msdb..backupset BK WHERE BK.database_name = DB.name ORDER BY backup_set_id DESC),'-') AS [Last backup], CASE WHEN is_fulltext_enabled = 1 THEN 'Fulltext enabled' ELSE '' END AS [fulltext], CASE WHEN is_auto_close_on = 1 THEN 'autoclose' ELSE '' END AS [autoclose], page_verify_option_desc AS [page verify option], CASE WHEN is_read_only = 1 THEN 'read only' ELSE '' END AS [read only], CASE WHEN is_auto_shrink_on = 1 THEN 'autoshrink' ELSE '' END AS [autoshrink], CASE WHEN is_auto_create_stats_on = 1 THEN 'auto create statistics' ELSE '' END AS [auto create statistics], CASE WHEN is_auto_update_stats_on = 1 THEN 'auto update statistics' ELSE '' END AS [auto update statistics], CASE WHEN is_in_standby = 1 THEN 'standby' ELSE '' END AS [standby], CASE WHEN is_cleanly_shutdown = 1 THEN 'cleanly shutdown' ELSE '' END AS [cleanly shutdown] FROM sys.databases DB ORDER BY dbName, [Last backup] DESC, NAME Please let me know if you find this information useful. Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: Pinal Dave, Readers Contribution, SQL, SQL Authority, SQL Query, SQL Scripts, SQL Server, SQL Tips and Tricks, SQLServer, T SQL, Technology

Read the article

Full-text indexing? You must read this

- by Kyle Hatlestad

For those of you who may have missed it, Peter Flies, Principal Technical Support Engineer for WebCenter Content, gave an excellent webcast on database searching and indexing in WebCenter Content. It's available for replay along with a download of the slidedeck. Look for the one titled 'WebCenter Content: Database Searching and Indexing'. One of the items he led with...and concluded with...was a recommendation on optimizing your search collection if you are using full-text searching with the Oracle database. This can greatly improve your search performance. And this would apply to both Oracle Text Search and DATABASE.FULLTEXT search methods. Peter describes how a collection can become fragmented over time as content is added, updated, and deleted. Just like you should defragment your hard drive from time to time to get your files placed on the disk in the most optimal way, you should do the same for the search collection. And optimizing the collection is just a simple procedure call that can be scheduled to be run automatically. beginctx_ddl.optimize_index('FT_IDCTEXT1','FULL', parallel_degree =>'1');end; When I checked my own test instance, I found my collection had a row fragmentation of about 80% After running the optimization procedure, it went down to 0% The knowledgebase article On Index Fragmentation and Optimization When Using OracleTextSearch or DATABASE.FULLTEXT [ID 1087777.1] goes into detail on how to check your current index fragmentation, how to run the procedure, and then how to schedule the procedure to run automatically. While the article mentions scheduling the job weekly, Peter says he now is recommending this be run daily, especially on more active systems. And just as a reminder, be sure to involve your DBA with your WebCenter Content implementation as you go to production and over time. We recently had a customer complain of slow performance of the application when it was discovered the database was starving for memory. So it's always helpful to keep a watchful eye on your database.

Read the article

Getting link to abstract indexed in Google Scholar

- by JordanReiter

We have a large digital library with thousands of papers indexed in Google Scholar. We allow Google Scholar to index our PDFs but they're blocked unless you have a subscription. So Google has full-text indexing/searching of our PDFs (great!) but then the links point just to those PDFs (boo!) instead of the more helpful abstract pages. Does anyone know what could cause an issue like this? I am, to the best of my knowledge, following all of the guidelines laid out in their Inclusion Guidelines. Here's some example meta data: <meta name="citation_title" content="Sample Title"/> <meta name="citation_author" content="LastName, FirstName"/> <meta name="citation_publication_date" content="2012/06/26"/> <meta name="citation_volume" content="1"/> <meta name="citation_issue" content="1"/> <meta name="citation_firstpage" content="10"/> <meta name="citation_lastpage" content="20"/> <meta name="citation_conference_title" content="Name of the Conference"/> <meta name="citation_isbn" content="1-234567-89-X"/> <meta name="citation_pdf_url" content="http://www.example.org/p/1234/proceeding_1234.pdf"/> <meta name="citation_fulltext_html_url" content="http://www.example.org/f/1234/"/> <meta name="citation_abstract_html_url" content="http://www.example.org/p/1234/"/> <link rel="canonical" href="http://www.example.org/p/1234/" /> example.org/p/1234 is the abstract page for the article; example.org/f/1234 is the fulltext link accessible to subscribers only (and to Google Scholar). example.org/p/1234/proceeding_1234.pdf is the fulltext PDF link.

Read the article

Search multiple tables

- by gilden

I have developed a web application that is used mainly for archiving all sorts of textual material (documents, references to articles, books, magazines etc.). There can be any given number of archive tables in my system, each with its own schema. The schema can be changed by a moderator through the application (imagine something similar to a really dumbed down version of phpMyAdmin). Users can search for anything from all of the tables. By using FULLTEXT indexes together with substring searching (fields which do not support FULLTEXT indexing) the script inserts the results of a search to a single table and by ordering these results by the similarity measure I can fairly easily return the paginated results. However, this approach has a few problems: substring searching can only count exact results the 50% rule applies to all tables separately and thus, mysql may not return important matches or too naively discards common words. is quite expensive in terms of query numbers and execution time (not an issue right now as there's not a lot of data yet in the tables). normalized data is not even searched for (I have different tables for categories, languages and file attatchments). My planned solution Create a single table having columns similar to id, table_id, row_id, data Every time a new row is created/modified/deleted in any of the data tables this central table also gets updated with the data column containing a concatenation of all the fields in a row. I could then create a single index for Sphinx and use it for doing searches instead. Are there any more efficient solutions or best practises how to approach this? Thanks.

Read the article

Problems when going from SQL 2005 to SQL 2008

- by Nezdet

Hi! I did go over from SQL server 2005 to 2008. Doing that gave me some problems with the fulltext search. This site is based on Fulltext search. It occurs more deadlocks, the search is slower and sometimes it return empty lists, don't know why. A lot of people has been writning about they having this problem with 2008. But I haven'tgot any solutions why 2005 worked better for my program.. PLS help me out!

Read the article

FORMSOF Thesaurus in SQL Server

- by Coolcoder

Has anyone done any performance measures with this in terms of speed where there is a high number of substitutes for any given word. For instance, I want to use this to store common misspellings; expecting to have 4-10 variations of a word. <expansion> administration administraton aministraton </expansion> When you run a fulltext search, how does performance degrade with that number of variations? for instance, I assume it has to do a separate fulltext search performing an OR? Also, having say 20/30K entries in the Thesaurus xml file - does this impact performance?

Read the article

Entity framework support for table valued functions and thus full text

- by simonsabin

One of my most popular posts with over 10, 000 hits is how to enable full text when using LINQ to SQL http://sqlblogcasts.com/blogs/simons/archive/2008/12/18/LINQ-to-SQL---Enabling-Fulltext-searching.aspx , core to this is the use of a table valued function. I’m therefore interested to see that Entity Framework will support table valued functions in the next release for more details have a read of the efdesign blog http://blogs.msdn.com/b/efdesign/archive/2011/01/21/table-valued-function-support...(read more)

Read the article

Adjusting the Score on Oracle Text search results

- by Kyle Hatlestad

When you sort the results of a search by Score using OracleTextSearch as the search engine in WebCenter Content, the results coming back are based on the relevancy on the document. In theory, the more relevant the search term is to the document, the higher ranked Score it should receive. But in practice, the relevancy score can seem somewhat of a mystery. It's not entirely clear how it ranks the importance of some documents over others based on the search term. And often times, once a word appears a certain number of times within a document, the Score simply maxes out at 100 and the top results can be difficult to discern from one another. Take for example the search for 'vacation' on this set of documents: Out of 7 results, 6 of them have a Score of '100' which means they are basically ranked the same. This doesn't make the sort by Score very meaningful. Besides sorting by relevance, you can also tell Oracle Text to sort by occurrence. In that case, it is a much more predictable result in how they would be ranked. And for many cases provide a more meaningful sorting of results then relevance. To change this takes a small component change to the SearchOperatorMap resource. By default, the query used for full-text searching looks like: <td>(ORACLETEXTSEARCH)fullText</td> <td>DEFINESCORE((%V), RELEVANCE * .1)</td> <td>text</td> Overriding this resource and changing it to: <td>(ORACLETEXTSEARCH)fullText</td> <td>DEFINESCORE((%V), OCCURRENCE * .01)</td> <td>text</td> will force it to now use occurrence (note the change in scale to .01 as well). So running the same search and sort options as the example above, the results come out quite a bit differently: In this case, there is a clear understanding of how the items rank. And generally, if the search term appears 3 times more in one document then another, it's got a better chance of being a document I'm interested in. You may or may not feel the relevance ranking is better then the search term occurrence, but this provides the opportunity to try an alternate method that might work better for your results. A pre-built component is available for download here. There is one caveat in using this method. The occurrence ranking also maxes out at 100, so if a search term is in the document more then that, the Score result will stay at 100.

Read the article

file layout and setuptools configuration for the python bit of a multi-language library

- by dan mackinlay

So we're writing a full-text search framework MongoDb. MongoDB is pretty much javascript-native, so we wrote the javascript library first, and it works. Now I'm trying to write a python framework for it, which will be partially in python, but partially use those same stored javascript functions - the javascript functions are an intrinsic part of the library. On the other hand, the javascript framework does not depend on python. since they are pretty intertwined it seems like it's worthwhile keeping them in the same repository. I'm trying to work out a way of structuring the whole project to give the javascript and python frameworks equal status (maybe a ruby driver or whatever in the future?), but still allow the python library to install nicely. Currently it looks like this: (simplified a little) javascript/jstest/test1.js javascript/mongo-fulltext/search.js javascript/mongo-fulltext/util.js python/docs/indext.rst python/tests/search_test.py python/tests/__init__.py python/mongofulltextsearch/__init__.py python/mongofulltextsearch/mongo_search.py python/mongofulltextsearch/util.py python/setup.py I've skipped out a few files for simplicity, but you get the general idea; it' a pretty much standard python project... except that it depends critcally ona whole bunch of javascript which is stored in a sibling directory tree. What's the preferred setup for dealing with this kind of thing when it comes to setuptools? I can work out how to use package_data etc to install data files that live inside my python project as per the setuptools docs. The problem is if i want to use setuptools to install stuff, including the javascript files from outside the python code tree, and then also access them in a consistent way when I'm developing the python code and when it is easy_installed to someone's site. Is that supported behaviour for setuptools? Should i be using paver or distutils2 or Distribute or something? (basic distutils is not an option; the whole reason I'm doing this is to enable requirements tracking) How should i be reading the contents of those files into python scripts?

Read the article

MySQL Full-Text Search Across Multiple Tables - Quick/Long Solution?

- by Kerry

Hello all, I have been doing a bit of research on full-text searches as we realized a series of LIKE statements are terrible. My first find was MySQL full-text searches. I tried to implement this and it worked on one table, failed when I was trying to join multiple tables, and so I consulted stackoverflow's articles (look at the end for a list of the ones I've been to) I didn't see anything that clearly answered my questions. I'm trying to get this done literally in an hour or two (quick solution) but I also want to do a better long term solution. Here is my query: SELECT a.`product_id`, a.`name`, a.`slug`, a.`description`, b.`list_price`, b.`price`, c.`image`, c.`swatch`, e.`name` AS industry FROM `products` AS a LEFT JOIN `website_products` AS b ON (a.`product_id` = b.`product_id`) LEFT JOIN ( SELECT `product_id`, `image`, `swatch` FROM `product_images` WHERE `sequence` = 0) AS c ON (a.`product_id` = c.`product_id`) LEFT JOIN `brands` AS d ON (a.`brand_id` = d.`brand_id`) INNER JOIN `industries` AS e ON (a.`industry_id` = e.`industry_id`) WHERE b.`website_id` = 96 AND b.`status` = 1 AND b.`active` = 1 AND MATCH( a.`name`, a.`sku`, a.`description`, d.`name` ) AGAINST ( 'ashley sofa' ) GROUP BY a.`product_id` ORDER BY b.`sequence` LIMIT 0, 9 The error I get is: Incorrect arguments to MATCH If I remove d.name from the MATCH statement it works. I have a full-text index on that column. I saw one of the articles say to use an OR MATCH for this table, but won't that lose the effectiveness of being able to rank them together or match them properly? Other places said to use UNIONs but I don't know how to do that properly. Any advice would be greatly appreciated. In the idea of a long term solution it seems that either Sphinx or Lucene is best. Now by no means and I a MySQL guru, and I heard that Lucene is a bit more complicated to setup, any recommendations or directions would be great. Articles: http://stackoverflow.com/questions/1117005/mysql-full-text-search-across-multiple-tables http://stackoverflow.com/questions/668371/mysql-fulltext-search-across-1-table http://stackoverflow.com/questions/2378366/mysql-how-to-make-multiple-table-fulltext-search http://stackoverflow.com/questions/737275/pros-cons-of-full-text-search-engine-lucene-sphinx-postgresql-full-text-searc http://stackoverflow.com/questions/1059253/searching-across-multiple-tables-best-practices

Read the article

PHP 'smart' search engine to search Mysql tables advice

- by Anonymous12345

I am creating a search engine for my php based website. I need to search a mysql table. Thing is, the search engine must be pretty 'smart', so that users can easily find their items (it's a classifieds website). I have currently set up a FULLTEXT search with this piece of code: MATCH (headline) AGAINST ($querystring) But this isn't enough... For instance, lets say the field headline contains something like Bmw 330ci. If I search for 330, I wont get any results. The ending ('ci') is just one of many endings in car models which must be taken into account when searching the table. Or what if the headline field is bmw330? Also no results, because it only matches full words. Or also, what if the headline is bmw 330, and I search for bmw 520, still with FULLTEXT I will get the bmw 330 as a result, even though I searched for bmw 520... Not good! How should I solve this problem?

Read the article

Lucene.Net: How can I add a date filter to my search results?

- by rockinthesixstring

I've got my searcher working really well, however it does tend to return results that are obsolete. My site is much like NerdDinner whereby events in the past become irrelevant. I'm currently indexing like this Public Function AddIndex(ByVal searchableEvent As [Event]) As Boolean Implements ILuceneService.AddIndex Dim writer As New IndexWriter(luceneDirectory, New StandardAnalyzer(), False) Dim doc As Document = New Document doc.Add(New Field("id", searchableEvent.ID, Field.Store.YES, Field.Index.UN_TOKENIZED)) doc.Add(New Field("fullText", FullTextBuilder(searchableEvent), Field.Store.YES, Field.Index.TOKENIZED)) doc.Add(New Field("user", If(searchableEvent.User.UserName = Nothing, "User" & searchableEvent.User.ID, searchableEvent.User.UserName), Field.Store.YES, Field.Index.TOKENIZED)) doc.Add(New Field("title", searchableEvent.Title, Field.Store.YES, Field.Index.TOKENIZED)) doc.Add(New Field("location", searchableEvent.Location.Name, Field.Store.YES, Field.Index.TOKENIZED)) doc.Add(New Field("date", searchableEvent.EventDate, Field.Store.YES, Field.Index.UN_TOKENIZED)) writer.AddDocument(doc) writer.Optimize() writer.Close() Return True End Function Notice how I have a "date" index that stores the event date. My search then looks like this ''# code omitted Dim reader As IndexReader = IndexReader.Open(luceneDirectory) Dim searcher As IndexSearcher = New IndexSearcher(reader) Dim parser As QueryParser = New QueryParser("fullText", New StandardAnalyzer()) Dim query As Query = parser.Parse(q.ToLower) ''# We're using 10,000 as the maximum number of results to return ''# because I have a feeling that we'll never reach that full amount ''# anyways. And if we do, who in their right mind is going to page ''# through all of the results? Dim topDocs As TopDocs = searcher.Search(query, Nothing, 10000) Dim doc As Document = Nothing ''# loop through the topDocs and grab the appropriate 10 results based ''# on the submitted page number While i <= last AndAlso i < topDocs.totalHits doc = searcher.Doc(topDocs.scoreDocs(i).doc) IDList.Add(doc.[Get]("id")) i += 1 End While ''# code omitted I did try the following, but it was to no avail (threw a NullReferenceException). While i <= last AndAlso i < topDocs.totalHits If Date.Parse(doc.[Get]("date")) >= Date.Today Then doc = searcher.Doc(topDocs.scoreDocs(i).doc) IDList.Add(doc.[Get]("id")) i += 1 End If End While I also found the following documentation, but I can't make heads or tails of it http://lucene.apache.org/java/1_4_3/api/org/apache/lucene/search/DateFilter.html

Read the article

How to order search results by multiple fields?

- by JustinRoR

I am using Sunspot and Will_paginate for search in my application and don't how to have my search results start out with certain ordering conditions. The model I am searching is the UserPrice model and want my :price and :purchase_date in descending order or lowest price to highest and present date to past: class UserPrice < ActiveRecord::Base attr_accessible :price, :product_name, :purchase_date belongs_to :product # Sunspot configuration searchable do text :product_name do product.name end end end class SearchController < ApplicationController def index @search = UserPrice.search do fulltext params[:search] paginate(:per_page => 5, :page => params[:page]) end @user_prices = @search.results end end Even though I don't know how, I'm not sure if I would use Sunspot or Will_paginate to sort by order of price and purchase date. How would I achieve this though? Thank you. UPDATE I try to use the order_by method but not sure how the model would look now. class SearchController < ApplicationController def index @search = UserPrice.search do fulltext params[:search] paginate(:per_page => 5, :page => params[:page]) facet(:business_retail_store_id) facet(:business_online_store_id) order_by :price, :desc order_by :purchase_date, :desc end @user_prices = @search.results end end Not sure why having the following in my controller: order_by :price, :desc order_by :purchase_date, :desc I get the error: Sunspot::UnrecognizedFieldError in SearchController#index No field configured for UserPrice with name 'price' This doesn't make sense to me since I do have these fields inside of my UserPrice model and in my database. How do I fix this?

Read the article

SQL Server 2005 FREETEXT() Perfomance Issue

- by Zenon

I have a query with about 6-7 joined tables and a FREETEXT() predicate on 6 columns of the base table in the where. Now, this query worked fine (in under 2 seconds) for the last year and practically remained unchanged (i tried old versions and the problem persists) So today, all of a sudden, the same query takes around 1-1.5 minutes. After checking the Execution Plan in SQL Server 2005, rebuilding the FULLTEXT Index of that table, reorganising the FULLTEXT index, creating the index from scratch, restarting the SQL Server Service, restarting the whole server I don't know what else to try. I temporarily switched the query to use LIKE instead until i figure this out (which takes about 6 seconds now). When I look at the query in the query performance analyser, when I compare the ´FREETEXT´query with the ´LIKE´ query, the former has 350 times as many reads (4921261 vs. 13943) and 20 times (38937 vs. 1938) the CPU usage of the latter. So it really is the ´FREETEXT´predicate that causes it to be so slow. Has anyone got any ideas on what the reason might be? Or further tests I could do?

Search Results

Search found 148 results on 6 pages for 'fulltext'.

Page 3/6 | < Previous Page | 1 2 3 4 5 6 | Next Page >

- by Tyronomo

- by Tyronomo

- by cognvision

- by 0x90

- by Sam Saffron

- by Ruchir Shah

- by Mahdi Ghiasi

- by Bharanikumar

- by Frederico

- by hithere

- by Most Valuable Yak (Rob Volk)

- by pinaldave

- by Kyle Hatlestad

- by JordanReiter

- by gilden

- by Nezdet

- by Coolcoder

- by simonsabin

- by Kyle Hatlestad

- by dan mackinlay

- by Kerry

- by Anonymous12345

- by rockinthesixstring

- by JustinRoR

- by Zenon

< Previous Page | 1 2 3 4 5 6 | Next Page >