Search Results

Search found 293 results on 12 pages for 'phrases'.

Page 1/12 | 1 2 3 4 5 6 7 8 9 10 11 12  | Next Page >

  • Need a tool to search large structure text documents for words, phrases and related phrases

    - by pitosalas
    I have to keep up with structured documents containing things such as requests for proposals, government program reports, threat models and all kinds of things like that. They are in techno-legalese as I would call them: highly structured, with section numbering and 3, 4 and 5 levels of nesting. All in English I need a more efficient way to locate those paragraphs of nuggets that matter to me. So what I’d like is kind of a local document index/repository, that would allow me to have some standing queries and easily locate sections in documents that talk about my queries. Here’s an example: I’d like to load in 10 large PDF files, each of say 100 pages. Each PDF contains English text, formatted very nicely into paragraphs and sections. I’d like to specify that I am interested in “blogging platforms”, “weaknesses in Ruby”, “localization and internationalization” Ideally then look at a list that showed the section of text, the name of the document, and other information that seemed to be related to and/or include the words and phrases I specified. I am sure something like this exists. I would call it something like document indexing, document comprehension or structured searching.

    Read the article

  • categorize a set of phrases into a set of similar phrases

    - by Dingo
    I have a few apps that generate textual tracing information (logs) to log files. The tracing information is the typical printf() style - i.e. there are a lot of log entries that are similar (same format argument to printf), but differ where the format string had parameters. What would be an algorithm (url, books, articles, ...) that will allow me to analyze the log entries and categorize them into several bins/containers, where each bin has one associated format? Essentially, what I would like is to transform the raw log entries into (formatA, arg0 ... argN) instances, where formatA is shared among many log entries. The formatA does not have to be the exact format used to generate the entry (even more so if this makes the algo simpler). Most of the literature and web-info I found deals with exact matching, a max substring matching, or a k-difference (with k known/fixed ahead of time). Also, it focuses on matching a pair of (long) strings, or a single bin output (one match among all input). My case is somewhat different, since I have to discover what represents a (good-enough) match (generally a sequence of discontinuous strings), and then categorize each input entries to one of the discovered matches. Lastly, I'm not looking for a perfect algorithm, but something simple/easy to maintain. Thanks!

    Read the article

  • Free Content For Websites - Choosing Keyword Phrases to Insert

    The way to use keyword phrases in your free website content is to start by selecting a single keyword phrase that best describes what your website stands for. You should then focus on this keyword phrase and other related synonyms. Using the tool I usually use for this exercise, you can get dozens of similar keyword phrases.

    Read the article

  • Which is better for search engines, repeated phrases or different phrases with the same meaning?

    - by George Botros
    When I'm designing an ads website I have two options: Let the advertiser to choose from some predefined lists to create the new ad. For Example: product list ( T-Shirt, Shorts, Suit, .....) Color list ( Black, Red, .....) Let the advertiser to write his own descriptive content for the product For Example "Amazing suit with a good price" I like the first Scenario but which is better for search engine optimization [SEO], repeated phrases or different phrases with the same meaning? Note : assuming each page will contain one or more ads

    Read the article

  • Emoti-phrases

    - by Tony Davis
    Surely the next radical step in the development of User-interface design is for applications to react appropriately to the rising tide of bewilderment, frustration or antipathy of the users. When an application understands that rapport is lost, it should respond accordingly. When we, for example, become confused by an unforgiving interface, shouldn't there be a way of signalling our bewilderment and having the application respond appropriately? There is surprisingly little in the current interface standards that would help. If we're getting frustrated with an unresponsive application, perhaps we could let it know of our increasing irritation by means of an "I'm getting angry and exasperated" slider. Although, by 'responding appropriately', I don't include playing a "we are experiencing unusually heavy traffic: your application usage is important to us" message, accompanied by calming muzak. When confronted with a tide of wizards, 'are you sure?' messages, or page-after-page of tiresome and barely-relevant options, how one yearns for a handy 'JFDI' (Just Flaming Do It) button. One click and the application miraculously desists with its annoying questions and just gets on with the job, using the defaults, or whatever we selected last time. Much more satisfying, and more direct to most developers and DBAs, however, would be the facility to communicate to the application via a twitter-style input field, or via parameters to command-line applications ("I don't want a wide-ranging debate with you; just open the bl**dy PDF!" or, or "Don't forget which of us has the close button"). Although to avoid too much cultural-dependence, perhaps we should take the lead from emoticons, and use a set of standardized emoti-phrases such as 'sez you', 'huh?', 'Pshaw!', or 'meh', which could be used to vent a range of feelings in any given application, whether it be SQL Server stubbornly refusing to give us the result we are expecting, or when an online-survey is getting too personal. Or a 'Lingua Glaswegia' perhaps: 'Atsabelter' ("very good") 'Atspish' ("must try harder") 'AnThenYerArsFellAff ' ("I don't quite trust these results") 'BileYerHeid' or 'ShutYerGub' ("please stop these inane questions") There would, of course, have to be an ANSI standards body to define the phrases that were acceptable. Presumably, there would be a tussle amongst the different international standards organizations. Meanwhile Oracle, Microsoft and Apple would each release non-standard extensions. Time then, surely, to plant emoti-phrases on the lot of them and develop a user-driven consensus. Send us your suggestions! The best one will win an iPod nano! Cheers! Tony.

    Read the article

  • Comparison of phrases containing the same word in Google Trends

    - by alisia123
    If I compare three phrases in google trends : house sale house white house I get the following numbers: house - 91 sale house - 3 white house - 2 The question is: Is "sale house" and "white house" already included in the number 91? It is an important question, because if it is true, than: house_except_sale_house + sale_house = 91 sale_house = 3 Which means I have to compare 88 and 3, if I compare "house" and "sale house"

    Read the article

  • How to sift idioms and set phrases apart from other common phrases using NLP techniques?

    - by hippietrail
    What techniques exist that can tell the difference betwen plain common phrases such as "to the", "and the" and set phrases and idioms which have their own lexical meanings such as "pick up", "fall in love", "red herring", "dead end"? Are there techniques which are successful even without a dictionary, statistical methods HMMs train on large corpora for instance? Or are there heuristics such as ignoring or weighting down "promiscuous" words which can co-occur with just about any word versus words which occur either alone or in a specific limited set of idiomatic phrases? If there are such heuristics, how do we take into account set phrases and verbal phrases which do incorporate promiscuous words such as "up" in "beat up", "eat up", "sit up", "think up"? UPDATE I've found an interesting paper online: Unsupervised Type and Token Identi?cation of Idiomatic Expressions

    Read the article

  • Funny statements, quotes, phrases, errors found on technical Books [closed]

    - by Felipe Fiali
    I found some funny or redundant statements on technical books I've read, I'd like to share. And I mean good, serious, technical books. Ok so starting it all: The .NET framework doesn't support teleportation From MCTS 70-536 Training kit book - .NET Framework 2.0 Application Development Foundation Teleportation in science fiction is a good example of serialization (though teleportation is not currently supported byt the .NET Framework). C# 3 is sexy From Jon Skeet's C# in Depth second Edition You may be itching to get on to the sexy stuff from C# 3 by this point, and I don’t blame you. Instantiating a class From Introduction to development II in Microsoft Dynamics AX 2009 To instantiate a class, is to create a new instance of it. Continue or break From Introduction to development II in Microsoft Dynamics AX 2009 Continue and break commands are used within all three loops to tell the execution to break or continue. These are just a few. I'll post some more later. Share some that you might have found too.

    Read the article

  • How does Amazon's Statistically Improbable Phrases work?

    - by ??iu
    How does something like Statistically Improbable Phrases work? According to amazon: Amazon.com's Statistically Improbable Phrases, or "SIPs", are the most distinctive phrases in the text of books in the Search Inside!™ program. To identify SIPs, our computers scan the text of all books in the Search Inside! program. If they find a phrase that occurs a large number of times in a particular book relative to all Search Inside! books, that phrase is a SIP in that book. SIPs are not necessarily improbable within a particular book, but they are improbable relative to all books in Search Inside!. For example, most SIPs for a book on taxes are tax related. But because we display SIPs in order of their improbability score, the first SIPs will be on tax topics that this book mentions more often than other tax books. For works of fiction, SIPs tend to be distinctive word combinations that often hint at important plot elements. For instance, for Joel's first book, the SIPs are: leaky abstractions, antialiased text, own dog food, bug count, daily builds, bug database, software schedules One interesting complication is that these are phrases of either 2 or 3 words. This makes things a little more interesting because these phrases can overlap with or contain each other.

    Read the article

  • Finding Common Phrases in SQL Server TEXT Column

    - by regex
    Short Desc: I'm curious to see if I can use SQL Analysis services or some other SQL Server service to mine some data for me that will show commonalities between SQL TEXT fields in a dataset. Long Desc I am looking at a subset of data that consists of about 10,000 rows of TEXT blobs which are used as a notes column in a issue tracking (ticketing) software. I would like to use something out of the box (without having to build something) that might be able to parse through all of the rows and find commonly used byte sequences in the "Notes" column. In other words, I want to find commonly used phrases (two to three word phrases, so 9 - 20 character sections of the TEXT blob). This will help me better determine if associate's notes contain similar phrases (troubleshooting techniques) that we could standardize in our troubleshooting process flow. Closing Note I'd really rather not build an application to do this as my method will probably not be the most efficient way to do it. Hopefully all this makes sense. Please let me know in the comments if anything needs clarification.

    Read the article

  • Finding Common Phrases in MS SQL TEXT Column

    - by regex
    Hello All, Short Desc: I'm curious to see if I can use SQL Analysis services or some other MS SQL service to mine some data for me that will show commonalities between SQL TEXT fields in a dataset. Long Desc I am looking at a subset of data that consists of about 10,000 rows of TEXT blobs which are used as a notes column in a issue tracking (ticketing) software. I would like to use something out of the box (without having to build something) that might be able to parse through all of the rows and find commonly used byte sequences in the "Notes" column. In other words, I want to find commonly used phrases (two to three word phrases, so 9 - 20 character sections of the TEXT blob). This will help me better determine if associate's notes contain similar phrases (troubleshooting techniques) that we could standardize in our troubleshooting process flow. Closing Note I'd really rather not build an application to do this as my method will probably not be the most efficient way to do it. Hopefully all this makes sense. Please let me know in the comments if anything needs clarification. Thanks in advance for your help.

    Read the article

  • mine phrases (up to 3 words) from a given text

    - by DS_web_developer
    I asked before for a simple solution to my problem (using sphinx search service) but I got nowhere... someone has kindly provided me with this code <?php /** * $Project: GeoGraph $ * $Id$ * * GeoGraph geographic photo archive project * This file copyright (C) 2005 Barry Hunter ([email protected]) * * This program is free software; you can redistribute it and/or * modify it under the terms of the GNU General Public License * as published by the Free Software Foundation; either version 2 * of the License, or (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */ /** * Provides the methods for updating the worknet tables * * @package Geograph * @author Barry Hunter <[email protected]> * @version $Revision$ */ function addTwoLetterPhrase($phrase) { global $w2; $w2[$phrase] = (isset($w2[$phrase]))?($w2[$phrase]+1):1; } function addThreeLetterPhrase($phrase) { global $w3; $w3[$phrase] = (isset($w3[$phrase]))?($w3[$phrase]+1):1; } function updateWordnet(&$db,$text,$field,$id) { global $w1,$w2,$w3; $alltext = strtolower(preg_replace('/\W+/',' ',str_replace("'",'',$text))); if (strlen($text)< 1) return; $words = preg_split('/ /',$alltext); $w1 = array(); $w2 = array(); $w3 = array(); //build a list of one word phrases foreach ($words as $word) { $w1[$word] = (isset($w1[$word]))?($w1[$word]+1):1; } //build a list of two word phrases $text = $alltext; $text = preg_replace('/(\w+) (\w+)/e','addTwoLetterPhrase("$1 $2")',$text); $text = $alltext; $text = preg_replace('/(\w+)/','',$text,1); $text = preg_replace('/(\w+) (\w+)/e','addTwoLetterPhrase("$1 $2")',$text); //build a list of three word phrases $text = $alltext; $text = preg_replace('/(\w+) (\w+) (\w+)/e','addThreeLetterPhrase("$1 $2 $3")',$text); $text = $alltext; $text = preg_replace('/(\w+)/','',$text,1); $text = preg_replace('/(\w+) (\w+) (\w+)/e','addThreeLetterPhrase("$1 $2 $3")',$text); $text = $alltext; $text = preg_replace('/(\w+) (\w+)/','',$text,1); $text = preg_replace('/(\w+) (\w+) (\w+)/e','addThreeLetterPhrase("$1 $2 $3")',$text); foreach ($w1 as $word=>$count) { $db->Execute("insert into wordnet1 set gid = $id,words = '$word',$field = $count");// ON DUPLICATE KEY UPDATE $field=$field+$count"); } foreach ($w2 as $word=>$count) { $db->Execute("insert into wordnet2 set gid = $id,words = '$word',$field = $count"); } foreach ($w3 as $word=>$count) { $db->Execute("insert into wordnet3 set gid = $id,words = '$word',$field = $count"); } } ?> It works fine and does almost exactly what I need....... except.... it is not utf8 friendly... I mean... it splits whole words into parts (on special chars) where it shouldn't! so my guess is I should use multibyte functions instead of regular preg_replace... I tried to replace preg_replace with mb_ereg_replace but it is not working as it should... at least not for 2 and 3 words phrases any ideas?

    Read the article

  • Delphi Phrase Count / Keyword Density

    - by Brad
    Does anyone know how to or have some code on counting the number of unique phrases in a document? (Single word, two word phrases, three word phrases). Thanks Example of what I'm looking for: What I mean is I have a text document, and i need to see what the most popular word phrases are. Example text I took the car to the car wash. I : 1 took : 1 the : 2 car: 2 to : 1 wash : 1 I took : 1 took the : 1 the car : 2 car to : 1 to the : 1 car wash : 1 I took the : 1 took the car : 1 the car to : 1 car to the : 1 to the car : 1 the car wash : 1 I took the car to : 1 took the car to the : 1 the car to the car : 1 car to the car wash : 1 I need the phrase, and the count that it shows up. Any help would be appreciated. The closet thing I found to this was a PHP script from http://tools.seobook.com/general/keyword-density/source.php I used to have some code for this, but I cannot find it.

    Read the article

  • Delphi Phrase Count

    - by Brad
    Does anyone know how to or have some code on counting the number of unique phrases in a document? (Single word, two word phrases, three word phrases). Thanks

    Read the article

  • Is there a phrase or word to describe an algorithim or programme is complete in that given any value for its arguments there is a predictable outcome?

    - by Mrk Mnl
    Is there a phrase to describe an algorithim or programme is complete in that given any possible value for its arguments there is a predicatable outcome? i.e. all the ramifications have been considered whatever the context? A simple example would be the below function: function returns string get_item_type(int type_no) { if(type_no < 10) return "hockey stick" else if (type_no < 20) return "bulldozer" else return "unknown" } (excuse the dismal pseudo code) No matter what number is supplied all possibiblites are catered for. My question is: is there a word to fill the blank here: "get_item_type() is ______ complete" ? (The answer is not Turing Complete - that is something quite different - but I annoyingly always think of something as "Turing Complete" when I am thinking of the above).

    Read the article

  • Is there a phrase or word to describe an algorithim or program is complete in that given any value for its arguments there is a defined outcome?

    - by Mrk Mnl
    Is there a phrase or word to describe an algorithim or programme is complete in that given any value for its arguments there is a defined outcome? i.e. all the ramifications have been considered whatever the context? A simple example would be the below function: function returns string get_item_type(int type_no) { if(type_no < 10) return "hockey stick" else if (type_no < 20) return "bulldozer" else return "unknown" } (excuse the dismal pseudo code) No matter what number is supplied all possibiblites are catered for. My question is: is there a word to fill the blank here: "get_item_type() is ______ complete" ? (The answer is not Turing Complete - that is something quite different - but I annoyingly always think of something as "Turing Complete" when I am thinking of the above).

    Read the article

  • Check keyword popularity of 2000 phrases?

    - by Mark
    I just found a list of about 2000 car manufacturers which I want to put into a drop-down list... but 2000 is probably a bit too many, so I want to filter it down to maybe the top 100 most popular cars. I figure I can just use Google search popularity to give me a rough estimate of how popular the car is... but I can't find a tool that will let me query 2000 keywords. Anyone know of one?

    Read the article

  • Lucene.Net support phrases?: What is best approach to tokenize comma-delimited data (atomically) in

    - by Pete Alvin
    I have a database with a column I wish to index that has comma-delimited names, e.g., User.FullNameList = "Helen Ready, Phil Collins, Brad Paisley" I prefer to tokenize each name atomically (name as a whole searchable entity). What is the best approach for this? Did I miss a simple option to set the tokenize delimiter? Do I have to subclass or write my own class that to roll my own tokenizer? Something else? ;) Or does Lucene.net not support phrases? Or is it smart enough to handle this use case automatically? I'm sure I'm not the first person to have to do this. Googling produced no noticeable solutions.

    Read the article

  • How to extract common / significant phrases from a series of text entries

    - by arronsky
    I have a series of text items- raw HTML from a MYSQL database. I want to find the most common phrases in these entries (not the single most common phrase, and ideally, not enforcing word-for-word matching). My example is any review on Yelp.com, that shows 3 snippets from hundreds of reviews of a given restaurant, in the format: "Try the hamburger" (in 44 reviews) e.g., the "Review Highlights" section of this page: http://www.yelp.com/biz/sushi-gen-los-angeles/ I have NLTK installed and I've played around with it a bit, but am honestly overwhelmed by the options. This seems like a rather common problem and I haven't been able to find a straightforward solution by searching here. Thanks in advance for any help.

    Read the article

  • Where can I find a list of English phrases?

    - by Marcus Adams
    I'm tasked with searching for the use of cliches and common phrases in text. The phrases are similar to the phrases you might see for the phrase puzzles on Wheel of Fortune. Here are a few examples: Safety First Too Good To be True Winning Isn't Everything I cannot find a list of phrases however. Does anybody know of such a list? Seriously, even a list of all Wheel of Fortune solutions would suffice.

    Read the article

1 2 3 4 5 6 7 8 9 10 11 12  | Next Page >