Search Results

Search found 5105 results on 205 pages for 'words'.

Page 20/205 | < Previous Page | 16 17 18 19 20 21 22 23 24 25 26 27  | Next Page >

  • Using a Custom Dictionary with Microsoft's MODI

    - by ejrichards
    I am currently using Microsoft's MODI (Microsoft Office Document Imaging) to read text in an image in C#. Everything is working fine, except some of the words I want to read are not real English words. Is there any way to use a custom dictionary when using MODI or add words to the regular English dictionary that it uses?

    Read the article

  • Python slicing a string using space characters and a maximum length

    - by chrism
    I'd like to slice a string up in a similar way to .split() (so resulting in a list) but in a more intelligent way: I'd like it to split it into chunks that are up to 15 characters, but are not split mid word so: string = 'A string with words' [splitting process takes place] list = ('A string with','words') The string in this example is split between 'with' and 'words' because that's the last place you can split it and the first bit be 15 characters or less.

    Read the article

  • Ignore duplicates in regex pattern

    - by gAMBOOKa
    I have a regex pattern that searches for words in a text file. How do I ignore duplicates? For instance, take a look at this code $pattern = '/(lorem|ipsum|daboom|pahwal|ababaga)/i'; $num_found = preg_match_all( $pattern, $string, $matches ); echo "$num_found match(es) found!"; echo "Matched words: " . implode( ',', $matches[0] ); If I have more than one say lorem in the article, the output will be something like this 5 matches found! Matched words: daboom,lorem,lorem,lorem,lorem I want the pattern to only find the first occurrence, and ignore the rest, so the output should be: 2 matches found! Matched words: daboom,lorem

    Read the article

  • Flex wordwrap issue with multiple text instances

    - by Craig Myles
    Hi, I have a scenario where I want to dynamically add words of text to a container so that it forms a paragraph of text which is wrapped neatly according to the size of the parent container. Each text element will have differing formatting, and will have differing user interaction options. For example, imagine the text " has just spoken out about ". Each word will be added to the container one at a time, at run time. The username in this case would be bold, and if clicked on will trigger an event. Same with the news article. The rest of the text is just plain text which, when clicked on, would do nothing. Now, I'm using Flex 3 so I don't have access to the fancy new text formatting tools. I've implemented a solution where the words are plotted onto a canvas, but this means that the words are wrapped at a particular y position (an arbitrary value I've chosen). When the container is resized, the words still wrap at that position which leaves lots of space. I thought about adding each text element to an Array Collection and using this as a datasource for a Tile List, but Tile Lists don't support variable column widths (in my limited knowledge) so each word would use the same amount of space which isn't ideal. Does anyone know how I can plot words onto a container so that I can retain formatting, events and word wrapping at paragraph level, even if the container is resized?

    Read the article

  • Explain JAVA code

    - by MIW
    I need some help to explain the meaning from line 5 to line 9. Thanks String words = "Rain Rain go away"; String mutation1, mutation2, mutation3, mutation4; mutation1 = words.toUpperCase(); System.out.println ("** " + mutation1 + " Nursery Rhyme **"); mutation1 = words.concat ("\nCome again another day"); mutation2 = "Johnny Johnny wants to play"; mutation3 = mutation2.replace (mutation2.charAt(5), 'i'); mutation4 = mutation3.substring (7, 27); System.out.print ("\'" + mutation1 + "\n" + mutation4 + "\'\n"); 10.System.out.println ("Title length: " + words.length());

    Read the article

  • Is there stl and utf8 friendly C++ Wrapper for ICU, or other powerful unicode library

    - by artyom
    Hello, I need a good Unicode library for C++. I need Transformations in Unicode sensitive way. For example sort all strings in case insensitive way and get their first characters for index. Convert to upper and to lower various Unicode strings. Split text in reasonable position -- words that would work for Chinese and Japanese as well. Formatting numbers, dates in locale sensitive way (should be thread safe). Transparent support of utf8 (primary internal representation). As far as I know the best library is ICU. However, I can't find normal developer friendly API documentation with examples. Also as far as I see, it is not too friendly with modern C++ design, work with STL and so on. Like this std::string msg; unistring umsg.from_utf8(msg); unistring::word_iterator wi; for(wi=umsg.words().begin(),n=0;wi!=usmg.words().wi_end(),n<10;++wi,++n) ; msg=umsg.substr(umsg.words().begin(),wi).to_utf8(); cout<<_("Five 10 words are ")<<msg; Does anybody know good STL friendly ICU wrapper released under Open Source license preferred permissive like MIT or Boost, but others LGPLv2 compatible are ok as well. Is there another high quality library similar to ICU? Platform: UNIX/POSIX, Windows support is not required. Thanks, Artyom Edit: Unfortunatly I wasn't logged in so I can't make asnver accepted... I had attached the ansver by myself.

    Read the article

  • find word and score based on positions

    - by ryder1211212
    hey guys i have a textfile i have divided it into 4 parts. i want to search each part for the words that appear in each part and score that word exmaple welcome to the national basketball finals,the basketball teams here today have come a long way. without much delay lets play basketball. i will want to return national = 1 as it appears only in one part etc am working on determining text context using word position. am working with c# and not very good in text processing basically if a word appears in the 4 sections it scores 4 if a word appears in the 3 sections it scores 3 if a word appears in the 2 sections it scores 2 if a word appears in the 1 section it scores 1 thanks in advance so far i have this var s = "welcome to the national basketball finals,the basketball teams here today have come a long way. without much delay lets play basketball. "; var numberOfParts = 4; var eachPartLength = s.Length / numberOfParts; var parts = new List<string>(); var words = Regex.Split(s, @"\W").Where(w => w.Length > 0); // this splits all words, removes empty strings var wordsIndex = 0; for (int i = 0; i < numberOfParts; i++) { var sb = new StringBuilder(); while (sb.Length < eachPartLength && wordsIndex < words.Count()) { sb.AppendFormat("{0} ", words.ElementAt(wordsIndex)); wordsIndex++; } // here you have the part Response.Write("[{0}]"+ sb); parts.Add(sb.ToString()); var allwords = parts.SelectMany(p => p.Split(' ').Distinct()); var wordsInAllParts = allwords.Where(w => parts.All(p => p.Contains(w))).Distinct();

    Read the article

  • L10N: Trusted test data for Locale Specific Sorting

    - by Chris Betti
    I'm working on an internationalized database application that supports multiple locales in a single instance. When international users sort data in the applications built on top of the database, the database theoretically sorts the data using a collation appropriate to the locale associated with the data the user is viewing. I'm trying to find sorted lists of words that meet two criteria: the sorted order follows the collation rules for the locale the words listed will allow me to exercise most / all of the specific collation rules for the locale I'm having trouble finding such trusted test data. Are such sort-testing datasets currently available, and if so, what / where are they? "words.en.txt" is an example text file containing American English text: Andrew Brian Chris Zachary I am planning on loading the list of words into my database in randomized order, and checking to see if sorting the list conforms to the original input. Because I am not fluent in any language other than English, I do not know how to create sample datasets like the following sample one in French (call it "words.fr.txt"): cote côte coté côté The French prefer diacritical marks to be ordered right to left. If you sorted that using code-point order, it likely comes out like this (which is an incorrect collation): cote coté côte côté Thank you for the help, Chris

    Read the article

  • Hierarchy of meaning

    - by asldkncvas
    I am looking for a method to build a hierarchy of words. Background: I am a "amateur" natural language processing enthusiast and right now one of the problems that I am interested in is determining the hierarchy of word semantics from a group of words. For example, if I have the set which contains a "super" representation of others, i.e. [cat, dog, monkey, animal, bird, ... ] I am interested to use any technique which would allow me to extract the word 'animal' which has the most meaningful and accurate representation of the other words inside this set. Note: they are NOT the same in meaning. cat != dog != monkey != animal BUT cat is a subset of animal and dog is a subset of animal. I know by now a lot of you will be telling me to use wordnet. Well, I will try to but I am actually interested in doing a very domain specific area which WordNet doesn't apply because: 1) Most words are not found in Wordnet 2) All the words are in another language; translation is possible but is to limited effect. another example would be: [ noise reduction, focal length, flash, functionality, .. ] so functionality includes everything in this set. I have also tried crawling wikipedia pages and applying some techniques on td-idf etc but wikipedia pages doesn't really do much either. Can someone possibly enlighten me as to what direction my research should go towards? (I could use anything)

    Read the article

  • Hbase schema design -- to make sorting easy?

    - by chen
    I have 1M words in my dictionary. Whenever a user issue a query on my website, I will see if the query contains the words in my dictionary and increment the counter corresponding to them individually. Here is the example, say if a user type in "Obama is a president" and "Obama" and "president" are in my dictionary, then I should increment the counter by 1 for "Obama" and "president". And from time to time, I want to see the top 100 words (most queried words). If I use Hbase to store the counter, what schema should I use? -- I have not come up an efficient one yet. If I use word in my dictionary as row key, and "counter" as column key, then updating counter(increment) is very efficient. But it's very hard to sort and return the top 100. Anyone can give a good advice? Thanks.

    Read the article

  • Mysql Performance Question - Essentially about normalizing efficiency

    - by freqmode
    Hi there. Just a quick question about database performance. I'll outline my site purpose below as background. I'm creating a dictionary site that saves the words users define to a database. What I'm wondering is whether or not to create a words table for each user or to keep one massive words table. This site will be used for entire schools so the single words table would be massive! The database structure is as follows: A user table with: User_ID PRIMARY KEY Username First Last Password Email Country Research Standings SendInfo Donated JoinedOn LastLogin Logins Correct Attempts Admin Active And one word table with: User_ID PRIMARY KEY Word Vocab Spell Defined DefinedAttempted Spelled SpelledAttempted Sentenced SentencedAttempted So what I'm asking is , performance-wise, should I create a new table for each user when they join the site - each user could have hundreds or thousands of words over time? Or is it better to have one massive table with thousands and thousands of records and filter by User_ID. I don't think I'll perform many table joins. My gut feeling is to create a new table for each user, but I thought I'd ask for expert advice! Thanks in advance.

    Read the article

  • How to link a table to a field a in MySQL server

    - by Nek
    I have this data from a xml file: <?xml version="1.0" encoding="utf-8" ?> <words> <id>...</id> <word>...</word> <meaning>...</meaning> <translation> <ES>...</ES> <PT>...</PT> </translation> </words> This forms the table named "words", which has four fields ("id","word","meaning" and "translation"). On the other hand, the "translation" field can hold several languages like ES,PT,EN,JA,KO,etc... So I create a table ("words.translation", one field is "id" and the others ones are languages ids like "ES","PT",...). I'm sorry for this newby question, but I'd like to know a couple of things about this one-to-many relationship. How to join (or link?) this two tables in MySQL? What information does the "translation" field in the "words" table has to store? How is the sql query to get all the word information (JOIN syntax used?) Thanks for your patience.

    Read the article

  • Algorithm for sentence analysis and tokenization

    - by Andrea Nagar
    I need to analyze a document and compile statistics as to how many times each a sequence of words is used (so the analysis is not on single words but of batch of recurring words). I read that compression algorithms do something similar to what I want - creating dictionaries of blocks of text with a piece of information reporting its frequency. It should be something similar to http://www.codeproject.com/KB/recipes/Patterns.aspx Do you have anything written in C#?

    Read the article

  • How to count the Chinese word in a file using regex in perl?

    - by Ivan
    I tried following perl code to count the Chinese word of a file, it seems working but not get the right thing. Any help is greatly appreciated. The Error message is Use of uninitialized value $valid in concatenation (.) or string at word_counting.pl line 21, <FILE> line 21. Total things = 125, valid words = which seems to me the problem is the file format. The "total thing" is 125 that is the string number (125 lines). The strangest part is my console displayed all the individual Chinese words correctly without any problem. The utf-8 pragma is installed. #!/usr/bin/perl -w use strict; use utf8; use Encode qw(encode); use Encode::HanExtra; my $input_file = "sample_file.txt"; my ($total, $valid); my %count; open (FILE, "< $input_file") or die "Can't open $input_file: $!"; while (<FILE>) { foreach (split) { #break $_ into words, assign each to $_ in turn $total++; next if /\W|^\d+/; #strange words skip the remainder of the loop $valid++; $count{$_}++; # count each separate word stored in a hash ## next comes here ## } } print "Total things = $total, valid words = $valid\n"; foreach my $word (sort keys %count) { print "$word \t was seen \t $count{$word} \t times.\n"; } ##---Data---- sample_file.txt ??????,???????,????.??????.????:"?????????????,??????,????????.????????,?????????, ???????????.????????,???????????,??????,??????.???:`??,???????????.'?????, ??????????."??????,??????.????.???, ????????????,????,??????,?????????,??????????????. ????????,??????,???????????,????????,????????.????,????,???????, ??????????,??????,????????.??????.

    Read the article

  • Python unicode search not giving correct answer

    - by user1318912
    I am trying to search hindi words contained one line per file in file-1 and find them in lines in file-2. I have to print the line numbers with the number of words found. This is the code: import codecs hypernyms = codecs.open("hindi_hypernym.txt", "r", "utf-8").readlines() words = codecs.open("hypernyms_en2hi.txt", "r", "utf-8").readlines() count_arr = [] for counter, line in enumerate(hypernyms): count_arr.append(0) for word in words: if line.find(word) >=0: count_arr[counter] +=1 for iterator, count in enumerate(count_arr): if count>0: print iterator, ' ', count This is finding some words, but ignoring some others The input files are: File-1: ???? ??????? File-2: ???????, ????-???? ?????-???, ?????-???, ?????_???, ?????_??? ????_????, ????-????, ???????_???? ????-???? This gives output: 0 1 3 1 Clearly, it is ignoring ??????? and searching for ???? only. I have tried with other inputs as well. It only searches for one word. Any idea how to correct this?

    Read the article

  • Algorithm to find the smallest snippet from searching a document?

    - by deliciousirony
    I've been going through Skiena's excellent "The Algorithm Design Manual" and got hung up on one of the exercises. The question is: "Given a search string of three words, find the smallest snippet of the document that contains all three of the search words—i.e. , the snippet with smallest number of words in it. You are given the index positions where these words in occur search strings, such as word1: (1, 4, 5), word2: (4, 9, 10), and word3: (5, 6, 15). Each of the lists are in sorted order, as above." Anything I come up with is O(n^2)... This question is in the "Sorting and Searching" chapter, so I assume there is a simple and clever way to do it. I'm trying something with graphs right now, but that seems like overkill. Ideas? Thanks

    Read the article

  • Sum in shell script

    - by Dinis Monteiro
    Why can't I create a sum of total words in this script? I get the result something like: 120+130 but it isn't 250 (as I expected)! Is there any reason? #!/bin/bash while [ -z "$count" ] ; do echo -e "request :: please enter file name " echo -e "\n\tfile one : \c" read count itself=counter.sh countWords=`wc -w $count |cut -d ' ' -f 1` countLines=`wc -l $count |cut -d ' ' -f 1` countWords_=`wc -w $itself |cut -d ' ' -f 1` echo "Number of lines: " $countLines echo "Number of words: " $countWords echo "Number of words -script: " $countWords_ echo "Number of words -total " $countWords+$countWords_ done if [ ! -e $count ] ; then echo -e "error :: file one $count doesn't exist. can't proceed." read empty exit 1 fi

    Read the article

  • Tag Cloud Data Backend

    - by Waldron
    I want to be able to generate tag clouds from free text that comes from any number of different sources. For clarity, I'm not talking about how to display a tag cloud once the critical tags/phrases are already discovered, I'm hoping to be able to discover the meaningful phrases themselves... preferable on a PHP/MySQL stack. If I had to do this myself, I'd start by establishing some kind of index for words/phrases that gives a "normal" frequency for any word/phrase. eg "Constantinople" occurs once in every 1,000,000 words on average (normal frequency "0.000001"). Then as I analyze a body of text, I'd find the individual words/phrases (another challenge!), find frequencies of each within the input, and measure against the expected freqeuncy. Words that have the highest ratio against expected frequency get boosted priority in the cloud. I'd like to believe someone else has already done this, WAY better than I could hope to, but I'll be damned if I can find it. Any recommendations??

    Read the article

  • Whats the Best Practice for a Search SQL Query?

    - by Marc V
    I have a SQL 2008 Express database, which have following tables: CREATE TABLE Videos (VideoID bigint not null, Title varchar(100) NULL, Description varchar(MAX) NULL, isActive bit NULL ) CREATE TABLE Tags (TagID bigint not null, Tag varchar(100) NULL ) CREATE TABLE VideoTags (VideoID bigint not null, TagID bigint not null ) Now I need SQL query to search for word (i.e. Beyonce Halo Music Video) against these tables. Which videos have: For Title exact phrase will get 0.5 points For Description exact phrase will get 0.4 points For tags exact phrase will get 0.3 points For title all words will get 0.2 points For description all words will get 0.2 points For title one or more words will get 0.1 points For description one or more words will get 0.1 points And I will show these videos on basis of points. What will be the SQL Query for this? A LINQ query will be more better. If you know a better way to achieve this, please help.

    Read the article

  • Php string handling tricks

    - by Dam
    Hi my question Need to get the 10 word before and 10 words after for the given text . i mean need to start the 10 words before the keyword and end with 10 word after the key word. Given text : "Twenty-three" The main trick : content having some html tags etc .. tags need to keep that tag with this content only . need to display the words from 10before - 10after content is bellow : removed Thank you

    Read the article

  • Creating dynamic dictionary

    - by Syom
    i must create something like dictionary in my site, but there is one problem, i don't imagine ho to solve. the client wants the following: in the CMS he must be able to write some specification to some words or even sentences, and after it, in the site, onmouseover() of that words, i must show it's specification in popup window. for example, in the cms he writes "hello word" - "specification of hello world", and then, in the site, if i have the text many many words here hello world and another words... onmouseover of "hello world" i must show "specification of hello world". the problem, that i don't know how to solve, is how to write the functions on the text content? could you give me an idea... Thanks

    Read the article

  • prints line number in both txtfile and list????

    - by jad
    i have this code which prints the line number in infile but also the linenumber in words what do i do to only print the line number of the txt file next to the words??? d = {} counter = 0 wrongwords = [] for line in infile: infile = line.split() wrongwords.extend(infile) counter += 1 for word in infile: if word not in d: d[word] = [counter] if word in d: d[word].append(counter) for stuff in wrongwords: print(stuff, d[stuff]) the output is : hello [1, 2, 7, 9] # this is printing the linenumber of the txt file hello [1] # this is printing the linenumber of the list words hello [1] what i want is: hello [1, 2, 7, 9]

    Read the article

< Previous Page | 16 17 18 19 20 21 22 23 24 25 26 27  | Next Page >