Search Results

Search found 6107 results on 245 pages for 'reserved words'.

Page 24/245 | < Previous Page | 20 21 22 23 24 25 26 27 28 29 30 31 | Next Page >

How to count the Chinese word in a file using regex in perl?

- by Ivan

I tried following perl code to count the Chinese word of a file, it seems working but not get the right thing. Any help is greatly appreciated. The Error message is Use of uninitialized value $valid in concatenation (.) or string at word_counting.pl line 21, <FILE> line 21. Total things = 125, valid words = which seems to me the problem is the file format. The "total thing" is 125 that is the string number (125 lines). The strangest part is my console displayed all the individual Chinese words correctly without any problem. The utf-8 pragma is installed. #!/usr/bin/perl -w use strict; use utf8; use Encode qw(encode); use Encode::HanExtra; my $input_file = "sample_file.txt"; my ($total, $valid); my %count; open (FILE, "< $input_file") or die "Can't open $input_file: $!"; while (<FILE>) { foreach (split) { #break $_ into words, assign each to $_ in turn $total++; next if /\W|^\d+/; #strange words skip the remainder of the loop $valid++; $count{$_}++; # count each separate word stored in a hash ## next comes here ## } } print "Total things = $total, valid words = $valid\n"; foreach my $word (sort keys %count) { print "$word \t was seen \t $count{$word} \t times.\n"; } ##---Data---- sample_file.txt ??????,???????,????.??????.????:"?????????????,??????,????????.????????,?????????, ???????????.????????,???????????,??????,??????.???:`??,???????????.'?????, ??????????."??????,??????.????.???, ????????????,????,??????,?????????,??????????????. ????????,??????,???????????,????????,????????.????,????,???????, ??????????,??????,????????.??????.

Read the article
Python unicode search not giving correct answer

- by user1318912

I am trying to search hindi words contained one line per file in file-1 and find them in lines in file-2. I have to print the line numbers with the number of words found. This is the code: import codecs hypernyms = codecs.open("hindi_hypernym.txt", "r", "utf-8").readlines() words = codecs.open("hypernyms_en2hi.txt", "r", "utf-8").readlines() count_arr = [] for counter, line in enumerate(hypernyms): count_arr.append(0) for word in words: if line.find(word) >=0: count_arr[counter] +=1 for iterator, count in enumerate(count_arr): if count>0: print iterator, ' ', count This is finding some words, but ignoring some others The input files are: File-1: ???? ??????? File-2: ???????, ????-???? ?????-???, ?????-???, ?????_???, ?????_??? ????_????, ????-????, ???????_???? ????-???? This gives output: 0 1 3 1 Clearly, it is ignoring ??????? and searching for ???? only. I have tried with other inputs as well. It only searches for one word. Any idea how to correct this?

Read the article
Algorithm to find the smallest snippet from searching a document?

- by deliciousirony

I've been going through Skiena's excellent "The Algorithm Design Manual" and got hung up on one of the exercises. The question is: "Given a search string of three words, find the smallest snippet of the document that contains all three of the search words—i.e. , the snippet with smallest number of words in it. You are given the index positions where these words in occur search strings, such as word1: (1, 4, 5), word2: (4, 9, 10), and word3: (5, 6, 15). Each of the lists are in sorted order, as above." Anything I come up with is O(n^2)... This question is in the "Sorting and Searching" chapter, so I assume there is a simple and clever way to do it. I'm trying something with graphs right now, but that seems like overkill. Ideas? Thanks

Read the article
Sum in shell script

- by Dinis Monteiro

Why can't I create a sum of total words in this script? I get the result something like: 120+130 but it isn't 250 (as I expected)! Is there any reason? #!/bin/bash while [ -z "$count" ] ; do echo -e "request :: please enter file name " echo -e "\n\tfile one : \c" read count itself=counter.sh countWords=`wc -w $count |cut -d ' ' -f 1` countLines=`wc -l $count |cut -d ' ' -f 1` countWords_=`wc -w $itself |cut -d ' ' -f 1` echo "Number of lines: " $countLines echo "Number of words: " $countWords echo "Number of words -script: " $countWords_ echo "Number of words -total " $countWords+$countWords_ done if [ ! -e $count ] ; then echo -e "error :: file one $count doesn't exist. can't proceed." read empty exit 1 fi

Read the article
Tag Cloud Data Backend

- by Waldron

I want to be able to generate tag clouds from free text that comes from any number of different sources. For clarity, I'm not talking about how to display a tag cloud once the critical tags/phrases are already discovered, I'm hoping to be able to discover the meaningful phrases themselves... preferable on a PHP/MySQL stack. If I had to do this myself, I'd start by establishing some kind of index for words/phrases that gives a "normal" frequency for any word/phrase. eg "Constantinople" occurs once in every 1,000,000 words on average (normal frequency "0.000001"). Then as I analyze a body of text, I'd find the individual words/phrases (another challenge!), find frequencies of each within the input, and measure against the expected freqeuncy. Words that have the highest ratio against expected frequency get boosted priority in the cloud. I'd like to believe someone else has already done this, WAY better than I could hope to, but I'll be damned if I can find it. Any recommendations??

Read the article
Whats the Best Practice for a Search SQL Query?

- by Marc V

I have a SQL 2008 Express database, which have following tables: CREATE TABLE Videos (VideoID bigint not null, Title varchar(100) NULL, Description varchar(MAX) NULL, isActive bit NULL ) CREATE TABLE Tags (TagID bigint not null, Tag varchar(100) NULL ) CREATE TABLE VideoTags (VideoID bigint not null, TagID bigint not null ) Now I need SQL query to search for word (i.e. Beyonce Halo Music Video) against these tables. Which videos have: For Title exact phrase will get 0.5 points For Description exact phrase will get 0.4 points For tags exact phrase will get 0.3 points For title all words will get 0.2 points For description all words will get 0.2 points For title one or more words will get 0.1 points For description one or more words will get 0.1 points And I will show these videos on basis of points. What will be the SQL Query for this? A LINQ query will be more better. If you know a better way to achieve this, please help.

Read the article
Php string handling tricks

- by Dam

Hi my question Need to get the 10 word before and 10 words after for the given text . i mean need to start the 10 words before the keyword and end with 10 word after the key word. Given text : "Twenty-three" The main trick : content having some html tags etc .. tags need to keep that tag with this content only . need to display the words from 10before - 10after content is bellow : removed Thank you

Read the article
Creating dynamic dictionary

- by Syom

i must create something like dictionary in my site, but there is one problem, i don't imagine ho to solve. the client wants the following: in the CMS he must be able to write some specification to some words or even sentences, and after it, in the site, onmouseover() of that words, i must show it's specification in popup window. for example, in the cms he writes "hello word" - "specification of hello world", and then, in the site, if i have the text many many words here hello world and another words... onmouseover of "hello world" i must show "specification of hello world". the problem, that i don't know how to solve, is how to write the functions on the text content? could you give me an idea... Thanks

Read the article
prints line number in both txtfile and list????

- by jad

i have this code which prints the line number in infile but also the linenumber in words what do i do to only print the line number of the txt file next to the words??? d = {} counter = 0 wrongwords = [] for line in infile: infile = line.split() wrongwords.extend(infile) counter += 1 for word in infile: if word not in d: d[word] = [counter] if word in d: d[word].append(counter) for stuff in wrongwords: print(stuff, d[stuff]) the output is : hello [1, 2, 7, 9] # this is printing the linenumber of the txt file hello [1] # this is printing the linenumber of the list words hello [1] what i want is: hello [1, 2, 7, 9]

Read the article
python and palindromes

- by tekknolagi

i recently wrote a method to cycle through /usr/share/dict/words and return a list of palindromes using my ispalindrome(x) method here's some of the code...what's wrong with it? it just stalls for 10 minutes and then returns a list of all the words in the file def reverse(a): return a[::-1] def ispalindrome(a): b = reverse(a) if b.lower() == a.lower(): return True else: return False wl = open('/usr/share/dict/words', 'r') wordlist = wl.readlines() wl.close() for x in wordlist: if not ispalindrome(x): wordlist.remove(x) print wordlist

Read the article
How to code an efficient blacklist filter function in php?

- by achairapart

So, I have three arrays like this: [items] => Array ( [0] => Array ( [id] => someid [title] => sometitle [author] => someauthor ... ) ... ) and also a string with comma separated words to blacklist: $blacklist = "some,words,to,blacklist"; Now I need to match these words with (as they can be one of) id, title, author and show results accordingly. I was thinking of a function like this: $pattern = '('.strtr($blacklist, ",", "|").')'; // should return (some|words|etc) foreach ($items as $item) { if ( !preg_match($pattern,$item['id']) || !preg_match($pattern,$item['title']) || !preg_match($pattern,$item['author']) ) { // show item } } and I wonder if this is the most efficient way to filter the arrays or I should use something with strpos() or filter_var with FILTER_VALIDATE_REGEXP ... Note that this function is repeated per 3 arrays. However, each array will not contain more than 50 items.

Read the article
Given a document select a relevant snippet.

- by BCS

When I ask a question here, the tool tips for the question returned by the auto search given the first little bit of the question, but a decent percentage of them don't give any text that is any more useful for understanding the question than the title. Does anyone have an idea about how to make a filter to trim out useless bits of a question? My first idea is to trim any leading sentences that contain only words in some list (for instance, stop words, plus words from the title, plus words from the SO corpus that have very weak correlation with tags, that is that are equally likely to occur in any question regardless of it's tags)

Read the article
How to check if a position inside a std string exists ?? (c++)

- by yox

Hello, i have a long string variable and i want to search in it for specific words and limit text according to thoses words. Say i have the following text : "This amazing new wearable audio solution features a working speaker embedded into the front of the shirt and can play music or sound effects appropriate for any situation. It's just like starring in your own movie" and the words : "solution" , "movie". I want to substract from the big string (like google in results page): "...new wearable audio solution features a working speaker embedded..." and "...just like starring in your own movie" for that i'm using the code : for (std::vector<string>::iterator it = words.begin(); it != words.end(); ++it) { int loc1 = (int)desc.find( *it, 0 ); if( loc1 != string::npos ) { while(desc.at(loc1-i) && i<=80){ i++; from=loc1-i; if(i==80) fromdots=true; } i=0; while(desc.at(loc1+(int)(*it).size()+i) && i<=80){ i++; to=loc1+(int)(*it).size()+i; if(i==80) todots=true; } for(int i=from;i<=to;i++){ if(fromdots) mini+="..."; mini+=desc.at(i); if(todots) mini+="..."; } } but desc.at(loc1-i) causes OutOfRange exception... I don't know how to check if that position exists without causing an exception ! Help please!

Read the article
Is there any way to configure what reCAPTCHA is actually displaying?

- by trejder

Is there any way to control, what kind of image is displayed to user in reCAPTCHA or what kind of puzzle he/she is required to solve? I noticed at least two significant changes to what reCAPTCHA is serving (and I must admit, that I don't much like these changes): For years reCAPTCHA was serving two words from scanned books and user was required to solve one of them. They were clearly readable (even those "second" ones, that could be ommitted) and with nearly no problem in solving them by a human. For past few month, I noticed a significant change at all of my sites, that are using reCAPTCHA. They started to show combination of computer-generated long numbers string and something, that looks for me as street/house number photographed in Google StreetView. They're even easier to solve, but what is most important -- it started to happen more and more often that user is obligated to solve both of them. Now, I have noticed another change/regression. Now some of my sites remain at so called "level 2" (like above) and some of them started to serve two words again ("level 1"?). And again, there are more and more situations, where solving both words is required. But, what is most important, on this "level" words are nearly impossible to solve (on my old mobile devices with 3.5'' display I need 5-6 attempts to pass on!). They're cluttered, written in some strange font, mostly in italics with a lot of black and white stains or drops on letters etc. Plus: reCAPTCHA stopped to be equal -- some of my pages are still serving "level2" while some of them are "killing" end users with a need to solve "level3". Is there anyway, I can control this -- force it to use only "level2" and on all my pages? (of course, I'm using exactly the same piece of code to serve reCAPTCHA on all my pages) Note, that I'm not asking for something like in this question. I don't want to change what reCAPTCHA shows (to disable words in favor of only numbers for example). I only want to control, which "version" of puzzles (among described above) reCAPTCHA shows and I want to make it equal on all my sites.

Read the article
Keyword sorting algorithm

- by Nai

I have over 1000 surveys, many of which contains open-ended replies. I would like to be able to 'parse' in all the words and get a ranking of the most used words (disregarding common words) to spot a trend. How can I do this? Is there a program I can use? EDIT If a 3rd party solution is not available, it would be great if we can keep the discussion to microsoft technologies only. Cheers.

Read the article
IEnumerable<string> => unique string[]

- by Maxim

Hello, i have collection of IEnumerable (sentence = string) i want to split all sentences to words (ex: .Select(t = t.Split(' ')); and after i need group this query by words to get list of unique words. Please, Help

Read the article
concatenate arbitrary long list of matches in SQL subquery

- by lordvlad

imagine 2 tables (rather stupid example, but for the sake of simplicity, here you go) words word_id letters letter word_id how can i select all words while selecting all letters that belong to a word and concatenating them to said word? it is important that the letters are returned in the order they appear in the table, as the letter may be mixed into other words, but the order is correct. |word_id| |word_id|letter| +-------+ +-------+------+ | 1| | 1| H| | 2| | 2| B| | 2| Y| | 1| I| | 2| E| should return |word_id|word| +-------+----+ | 1| HI| | 2| BYE| any way to accomplish this in pure SQL?

Read the article
Generating the permutations from a number of Characters

- by adam08

I'm working on a predictive text solution and have all the words being retrieved from a Trie based on input for a certain string of characters, i.e. "at" will give all words formed with "at" as a prefix. The problem that I have now, is that we are also supposed to return all other possibilities from pressing these 2 buttons, Button 2 and button 8 on the mobile phone, which would also give words formed with, "au, av, bt, bu, bv, ct, cu, cv" (most of which won't have any actual words. Can anyone suggest a solution and how I would go about doing this for calculating the different permutations? (at the moment, I'm prompting the user to enter the prefix (not using a GUI right now)

Read the article
My java program seems to be skipping over the try{}, executing the catch{} and then throwing a NullPointerException. What should I do?

- by Matt Bolopue

I am writing a program that calculates the number of words, syllables, and sentences in any given text file. I don't need help finding those numbers, however my program (which currently should only find the number of words in the text file) will not import the text file even when I type in the name of the file correctly. The text file is in the same folder as the source code. Instead it tells me every time that what I typed in has the wrong file extension (see my catch{}) and then proceeds to throw a null pointer. I am at a loss for what could be causing it. Any suggestions? import java.io.*; import java.util.*; public class Reading_Lvl_Calc { /** * @param args */ public static void main(String[] args) { // TODO Auto-generated method stub int words = 0; String fileName; Scanner scan; Scanner keyread = new Scanner(System.in); System.out.println("Please enter a file name (or QUIT to exit)"); fileName = keyread.nextLine(); File doc = new File(fileName); //while(scan.equals(null)){ try { scan = new Scanner(doc); } catch(Exception e) { if(fileName.substring(fileName.indexOf(".")) != ".txt") System.out.println("I'm sorry, the file \"" + fileName + "\" has an invalid file extension."); else System.out.println("I am sorry, the file \"" + fileName + " \" cannot be found.\n The file must be in the same directory as this program"); scan = null; } // } while(scan.hasNext()){ words++; scan.next(); } System.out.println(words); } }

Read the article
How to get all captures of subgroup matches with preg_match_all()?

- by hakre

Update/Note: I think what I'm probably looking for is to get the captures of a group in PHP. Referenced: PCRE regular expressions using named pattern subroutines. (Read carefully:) I have a string that contains a variable number of segments (simplified): $subject = 'AA BB DD '; // could be 'AA BB DD CC EE ' as well I would like now to match the segments and return them via the matches array: $pattern = '/^(([a-z]+) )+$/i'; $result = preg_match_all($pattern, $subject, $matches); This will only return the last match for the capture group 2: DD. Is there a way that I can retrieve all subpattern captures (AA, BB, DD) with one regex execution? Isn't preg_match_all suitable for this? This question is a generalization. Both the $subject and $pattern are simplified. Naturally with such the general list of AA, BB, .. is much more easy to extract with other functions (e.g. explode) or with a variation of the $pattern. But I'm specifically asking how to return all of the subgroup matches with the preg_...-family of functions. For a real life case imagine you have multiple (nested) level of a variant amount of subpattern matches. Example This is an example in pseudo code to describe a bit of the background. Imagine the following: Regular definitions of tokens: CHARS := [a-z]+ PUNCT := [.,!?] WS := [ ] $subject get's tokenized based on these. The tokenization is stored inside an array of tokens (type, offset, ...). That array is then transformed into a string, containing one character per token: CHARS -> "c" PUNCT -> "p" WS -> "s" So that it's now possible to run regular expressions based on tokens (and not character classes etc.) on the token stream string index. E.g. regex: (cs)?cp to express one or more group of chars followed by a punctuation. As I now can express self-defined tokens as regex, the next step was to build the grammar. This is only an example, this is sort of ABNF style: words = word | (word space)+ word word = CHARS+ space = WS punctuation = PUNCT If I now compile the grammar for words into a (token) regex I would like to have naturally all subgroup matches of each word. words = (CHARS+) | ( (CHARS+) WS )+ (CHARS+) # words resolved to tokens words = (c+)|((c+)s)+c+ # words resolved to regex I could code until this point. Then I ran into the problem that the sub-group matches did only contain their last match. So I have the option to either create an automata for the grammar on my own (which I would like to prevent to keep the grammar expressions generic) or to somewhat make preg_match working for me somehow so I can spare that. That's basically all. Probably now it's understandable why I simplified the question. Related: pcrepattern man page Get repeated matches with preg_match_all()

Read the article
please help me to solve problem

- by davit-datuashvili

first of all this is not homework and nobody tag it as homewrok i did not understand this porblem can anybody explain me?this is not english problem it is just misunderstanding what problem say Consider the problem of neatly printing a paragraph on a printer. The input text is a sequence of n words of lengths l1 , l2 , . . . , ln , measured in characters. We want to print this paragraph neatly on a number of lines that hold a maximum of M characters each. Our criterion of “neatness” is as follows. If a given line contains words i through j , where i = j , and we leave exactly one space between words, the number of extra space characters at the end of the line is M - j + i -(k=i,k< j,k++) lk , which must be nonnegative so that the words fit on the line. We wish to minimize the sum, over all lines except the last, of the cubes of the numbers of extra space characters at the ends of lines. Give a dynamic-programming algorithm to print a paragraph of n words neatly on a printer. Analyze the running time and space requirements of your algorithm.

Read the article
Help with algorithm to determine phrase occurrence in string in PHP

- by Will M.

I have an array of phrases (max 2 words) like $words = array('barack obama', 'chicago', 'united states'); and then I have a string like: $sentence = "Barack Obama is from Chicago. Barack Obama's favorite food it pizza."; I want to find/create an efficient algorithm that would return the number of occurrences of the words in the array $words in the string $sentence. In this case it would be: 'barack obama' => 2 'chicago' => 0 How can I built this?

Read the article
Word frequency tally script is too slow

- by Dave Jarvis

Background Created a script to count the frequency of words in a plain text file. The script performs the following steps: Count the frequency of words from a corpus. Retain each word in the corpus found in a dictionary. Create a comma-separated file of the frequencies. The script is at: http://pastebin.com/VAZdeKXs Problem The following lines continually cycle through the dictionary to match words: for i in $(awk '{if( $2 ) print $2}' frequency.txt); do grep -m 1 ^$i\$ dictionary.txt >> corpus-lexicon.txt; done It works, but it is slow because it is scanning the words it found to remove any that are not in the dictionary. The code performs this task by scanning the dictionary for every single word. (The -m 1 parameter stops the scan when the match is found.) Question How would you optimize the script so that the dictionary is not scanned from start to finish for every single word? The majority of the words will not be in the dictionary. Thank you!

Read the article
Memory not being returned after function python call

- by Dan

I've got a function which does a parse of a sentence by building up a big chart. For some reason, Python holds on to whatever memory was allocated during that function call. That is, I do best = translate(sentence, grammar) and somehow my memory goes up and stays up. Here is the function: from string import join from heapq import nsmallest, heappush def translate(f, g): words = f.split() chart = {} for col in range(len(words)): for row in reversed(range(0,col+1)): # get rules for this subspan rules = g[join(words[row:col+1], ' ')] # ensure there's at least one rule on the diagonal if not rules and row==col: rules=[(0.0, join(words[row:col+1]))] # pick up rules below & to the left for k in range(row,col): if (row,k) and (k+1,col) in chart: for (w1, e1) in chart[row, k]: for (w2, e2) in chart[k+1,col]: heappush(rules, (w1+w2, e1+' '+e2)) # add all rules to chart chart[row,col] = nsmallest(MAX_TRANSLATIONS, rules) (w, best) = chart[0, len(words)-1][0] return best EDIT: Using Python 2.7 on OS X. The grammar g is just a dictionary from strings to heaps, e.g.: g['et'] [(1.05, 'and'), (6.92, ', and'), (9.95, 'and ,'), (11.17, 'and to')] EDIT: If you want to run the code, try the sentence "Cela est difficile" with the following grammar: >>> g['cela'] [(8.28, 'this'), (11.21, 'it'), (11.57, 'that'), (15.26, 'this ,')] >>> g['est'] [(2.69, 'is'), (10.21, 'is ,'), (11.15, 'has'), (11.28, ', is')] >>> g['difficile'] [(2.01, 'difficult'), (10.08, 'hard'), (10.19, 'difficult ,'), (10.57, 'a difficult')]

Read the article
how to organize classes in ruby if they are literal subclasses

- by RetroNoodle

I know that title didn't make sense, Im sorry! Its hard to word what I am trying to ask. I had trouble googling it for the same reason. So this isn't even Ruby specific, but I am working in ruby and I am new to it, so bear with me. So you have a class that is a document. Inside each document, you have sentences, and each sentence has words. Words will have properties, like "noun" or a count of how many times they are used in the document, etc. I would like each of the elements, document, sentence, word be an object. Now, if you think literally - sentences are in documents, and words are in sentences. Should this be organized literally like this as well? Like inside the document class you will define and instantiate the sentence objects, and inside the sentence class you will define and instantiate the words? Or, should everything be separate and reference each other? Like the word class would sit outside the sentence class but the sentence class would be able to instantiate and work with words? This is a basic OOP question I guess, and I suppose you could argue to do it either way. What do you guys think? Each sentence in the document could be stored in a hash of sentence objects inside the document object, and each word in the sentence could be stored in a hash of word objects inside the sentence. I dont want to code myself into a corner here, thats why I am asking, plus I have wondered this before in other situations. Thank you!

Read the article

< Previous Page | 20 21 22 23 24 25 26 27 28 29 30 31 | Next Page >