Search Results

Search found 6107 results on 245 pages for 'reserved words'.

Page 43/245

Storing n-grams in database in < n number of tables.

- by kurige

If I were writing a piece of software that attempted to predict what word a user was going to type next using the two previous words the user had typed, I would create two tables. Like so:

    == 1-gram table ==
    Token | NextWord | Frequency
    ------+----------+-----------
    "I"   | "like"   | 15
    "I"   | "hate"   | 20

    == 2-gram table ==
    Token    | NextWord   | Frequency
    ---------+------------+-----------
    "I like" | "apples"   | 8
    "I like" | "tomatoes" | 12
    "I hate" | "tomatoes" | 20
    "I hate" | "apples"   | 2

Following this example implementation, the user types "I" and the software, using the above database, predicts that the next word the user is going to type is "hate". If the user does type "hate", then the software will predict that the next word the user is going to type is "tomatoes". However, this implementation would require a table for each additional n-gram that I choose to take into account. If I decided that I wanted to take the 5 or 6 preceding words into account when predicting the next word, then I would need 5-6 tables, and an exponential increase in space per n-gram. What would be the best way to represent this in only one or two tables, with no upper limit on the number of n-grams I can support?
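One possible way to picture a single-table layout is sketched below, using Python's built-in sqlite3 module. It is only an illustration under assumptions: the table name `ngrams`, the column names, and the back-off rule (try the longest context first, then drop the oldest word) are invented here and are not taken from the question. The context is stored as one space-joined string, so the same table can hold 1-grams, 2-grams, or longer contexts.

```python
import sqlite3

# Sample rows copied from the two tables in the question: (context, next_word, frequency).
SAMPLE_ROWS = [
    ("I", "like", 15),
    ("I", "hate", 20),
    ("I like", "apples", 8),
    ("I like", "tomatoes", 12),
    ("I hate", "tomatoes", 20),
    ("I hate", "apples", 2),
]


def build_db(rows):
    """Create an in-memory database with one table for every n-gram order."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ngrams ("
        " context TEXT NOT NULL,"      # hypothetical column: preceding words, space-joined
        " next_word TEXT NOT NULL,"
        " frequency INTEGER NOT NULL,"
        " PRIMARY KEY (context, next_word))"
    )
    conn.executemany("INSERT INTO ngrams VALUES (?, ?, ?)", rows)
    return conn


def predict(conn, previous_words, max_context=5):
    """Return the most frequent follower of the longest known context, or None."""
    words = list(previous_words[-max_context:])
    while words:
        row = conn.execute(
            "SELECT next_word FROM ngrams WHERE context = ?"
            " ORDER BY frequency DESC LIMIT 1",
            (" ".join(words),),
        ).fetchone()
        if row:
            return row[0]
        words = words[1:]  # back off: drop the oldest word and retry
    return None


conn = build_db(SAMPLE_ROWS)
print(predict(conn, ["I"]))
print(predict(conn, ["I", "hate"]))
```

The composite primary key doubles as the lookup index, so supporting a longer context only adds rows, not tables. Whether this beats a normalized word-ID design for a large corpus is a separate question that this sketch does not settle.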

Read the article
Can someone describe some DI terms to me?

- by SoBeNoFear

I'm in the process of writing a DI framework for PHP 5, and I've been trying to find the 'official' definitions of some words in relation to dependency injection. Some of these words are 'context' and 'lifecycle'. And also, what would I call the object that gets created/injected? Finally, what is the difference between components and services, and which term (if either) should I call the objects that can be injected? I've read Martin Fowler's article and looked through other DI frameworks (Phemto, Spring, Google Guice, Xyster, etc.), but I want to know what you think. Thanks!

Read the article
Strategy to structure a search index in a relational database

- by neilc

I am interested in suggestions for building an efficient and robust structure for indexing products in a new database I am building (I'm using MySQL). When a product is entered through the form, there are three parts I am interested in indexing for searching purposes:

    The product title
    The product description
    Tags

The most important is the title, followed by the tags, followed by the description. I was thinking of using the following structure:

    CREATE TABLE `searchindex` (
      `id` INT NOT NULL,
      `word` VARCHAR( 255 ) NOT NULL,
      `weighting` INT NOT NULL,
      `product_id` INT NOT NULL,
      PRIMARY KEY ( `id` )
    )

Then each time a product is created I would split apart the title, description and tags (removing common words) and award each word a weighting. Then it is trivial to select the words and corresponding products and order them by weighting. Is there a better way to do this? I am worried that this strategy would slow down over time as the database fills up.
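The splitting-and-weighting step described above could look roughly like the following sketch. The weights (title 3, tags 2, description 1), the stop-word list, and the function name `index_rows` are all assumptions made for illustration; the question only fixes the priority order of the three fields.

```python
import re
from collections import Counter

# Assumed weights: the question only says title > tags > description.
FIELD_WEIGHTS = {"title": 3, "tags": 2, "description": 1}
# Assumed, deliberately tiny stop-word list.
STOP_WORDS = {"a", "an", "and", "the", "of", "for", "with"}


def index_rows(product_id, title, description, tags):
    """Return (word, weighting, product_id) rows for one product."""
    fields = {"title": title, "tags": " ".join(tags), "description": description}
    weights = Counter()
    for field, text in fields.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            if word not in STOP_WORDS:
                # A word found in several fields accumulates their weights.
                weights[word] += FIELD_WEIGHTS[field]
    return [(word, weight, product_id) for word, weight in sorted(weights.items())]


rows = index_rows(7, "Red Kettle", "A kettle for the kitchen", ["kitchen", "red"])
for row in rows:
    print(row)
```

Collapsing each (word, product) pair into one summed row keeps the table smaller than one row per occurrence, and lookups by word would presumably need an index on the `word` column rather than only on `id`.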

Read the article
Which can handle a huge surge of queries: SQL Server 2008 Fulltext or Lucene

- by Luke101

I am creating a widget that will be installed on several websites and blogs. The widget will analyse the remote webpage's title and content, then return relevant articles/links from my website. We expect a very large amount of traffic: roughly 500K queries a day, and up from there. I need the queries to be returned very quickly, so the candidate has to be high performance, similar to Google AdSense. The remote title can be from 5 to 50 words, and for the description we will use no more than 3000 words. Which of these two do you think can handle the load?

Read the article
"Anagram solver" based on statistics rather than a dictionary/table?

- by James M.

My problem is conceptually similar to solving anagrams, except I can't just use a dictionary lookup. I am trying to find plausible words rather than real words. I have created an N-gram model (for now, N=2) based on the letters in a bunch of text. Now, given a random sequence of letters, I would like to permute them into the most likely sequence according to the transition probabilities. I thought I would need the Viterbi algorithm when I started this, but as I look deeper, the Viterbi algorithm optimizes a sequence of hidden random variables based on the observed output. I am trying to optimize the output sequence. Is there a well-known algorithm for this that I can read about? Or am I on the right track with Viterbi and I'm just not seeing how to apply it?
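To make the objective concrete, here is a brute-force sketch: score every permutation of the letters under a letter-bigram model and keep the best one. This is not Viterbi and not a claim about the best-known algorithm; it is exhaustive search, so it only works for short sequences (n! candidates). The boundary markers `^` and `$`, the add-one smoothing, and the function names are assumptions introduced for the example.

```python
import math
from collections import Counter
from itertools import permutations


def train(words):
    """Count letter bigrams, with '^' and '$' marking word start and end."""
    pair_counts = Counter()
    first_counts = Counter()
    for word in words:
        padded = "^" + word.lower() + "$"
        for a, b in zip(padded, padded[1:]):
            pair_counts[a, b] += 1
            first_counts[a] += 1
    return pair_counts, first_counts


def log_score(candidate, model, alphabet_size=27):
    """Log-probability of a candidate under the bigram model (add-one smoothing)."""
    pair_counts, first_counts = model
    padded = "^" + candidate + "$"
    return sum(
        math.log((pair_counts[a, b] + 1) / (first_counts[a] + alphabet_size))
        for a, b in zip(padded, padded[1:])
    )


def most_plausible(letters, model):
    """Try every distinct ordering of the letters and return the best-scoring one."""
    candidates = {"".join(p) for p in permutations(letters)}
    # The candidate string itself breaks ties so the result is deterministic.
    return max(candidates, key=lambda c: (log_score(c, model), c))


model = train(["the"])
print(most_plausible("hte", model))
```

For longer inputs the same scoring function could be plugged into a smarter search, since picking an ordering of fixed letters resembles a shortest-path-through-all-nodes problem more than a hidden-state decoding problem.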

Read the article
What is the best data structure and algorithm for comparing a list of strings?

- by Chiraag E Sehar

I want to find the longest possible sequence of words that match the following rules:

    Each word can be used at most once.
    All words are strings.
    Two strings sa and sb can be concatenated if the LAST two characters of sa match the first two characters of sb.

In the case of concatenation, it is performed by overlapping those characters. For example:

    sa = "torino"
    sb = "novara"
    sa concat sb = "torinovara"

For example, I have the following input file, "input.txt":

    novara
    torino
    vercelli
    ravenna
    napoli
    livorno
    messania
    noviligure
    roma

And the output for the above file, according to the above rules, should be:

    torino
    novara
    ravenna
    napoli
    livorno
    noviligure

since the longest possible concatenation is:

    torinovaravennapolivornoviligure

Can anyone please help me out with this? What would be the best data structure for this?
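One way to model the rules is as a directed graph: each word is a node, with an edge from sa to sb when the last two characters of sa equal the first two of sb. The longest sequence is then a longest simple path, which the sketch below finds by depth-first search with backtracking. It assumes the words are unique and at least two characters long, and it is exponential in the worst case, so it is a sketch for small inputs rather than a general answer. The function names are invented here.

```python
def longest_chain(words):
    """Longest sequence where each word's last two letters start the next word."""
    # Adjacency list: which words may directly follow each word.
    followers = {
        a: [b for b in words if b != a and a[-2:] == b[:2]] for a in words
    }
    best = []

    def extend(chain, used):
        nonlocal best
        if len(chain) > len(best):
            best = list(chain)
        for nxt in followers[chain[-1]]:
            if nxt not in used:
                used.add(nxt)
                chain.append(nxt)
                extend(chain, used)
                chain.pop()          # backtrack
                used.remove(nxt)

    for start in words:
        extend([start], {start})
    return best


def join_chain(chain):
    """Concatenate a chain, overlapping the two shared characters at each joint."""
    if not chain:
        return ""
    return chain[0] + "".join(word[2:] for word in chain[1:])


# Word list from the question, with "livorno" spelled as in its expected output.
WORDS = ["novara", "torino", "vercelli", "ravenna", "napoli",
         "livorno", "messania", "noviligure", "roma"]
chain = longest_chain(WORDS)
print(chain)
print(join_chain(chain))
```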

Read the article
Ideas for designing an automated content tagging system needed

- by Benjamin Smith

I am currently designing a website that, amongst other things, is required to display and organise small amounts of text content (mainly quotes, article stubs, etc.). I currently have a database with 250,000+ items and need to come up with a method of tagging each item with relevant tags, which will eventually allow users to search and browse the content easily.

A very simplistic idea I have (and one that I believe is employed by some sites that I have been looking to for inspiration, e.g. http://www.brainyquote.com/quotes/topics.html) is to simply search the database for certain words or phrases and use these words as tags for the content. This can easily be extended so that if, for example, a user wanted to show all items with a theme of love, then I would just return a list of items with words and phrases relating to this theme. This would not be hard to implement but does not provide very good results. For example, if I were to search for the month 'May' in the database with the aim of classifying the items returned as relating to the topic of Spring, then I would get back all occurrences of the word May, regardless of the semantic meaning. Another shortcoming of this method is that I believe it would be quite hard to automate the process on any large scale.

What I really require is a library that can take an item, break it down, analyse the semantic meaning and return a list of tags that would correctly classify the item. I know this is a lot to ask, and I have a feeling I will end up reverting to the aforementioned method, but I just thought I should ask if anyone knew of any pre-existing solution. I think that, as the items in the database are short, it is probably quite a hard task to extract any meaning from them; however, I may be mistaken.

Another path to possibly go down would be to use something like Amazon Mechanical Turk to outsource the task, which may produce good results but would be expensive. Eventually I would like users to be able to (and want to!) tag content and to vote for the most relevant tags, possibly using a gamification mechanic as motivation; however, this is some way down the line. A temporary fix may be the best thing if this were the route I decided to go down, as I could use the rough results I got as the starting point for a more in-depth solution.

If you've read this far, thanks for sticking with me. I know I'm spitballing, but any input would be really helpful. Thanks.

Read the article
calling a function from another function in python

- by user1040503

I have written this function that takes two strings in order to see if they are anagrams:

    def anagram_check(str_x, str_y):
        x = str_x.replace(" ", "")
        y = str_y.replace(" ", "")
        lower1 = x.lower()
        lower2 = y.lower()
        sorted1 = sorted(lower1)
        sorted2 = sorted(lower2)
        if sorted1 == sorted2:
            return True
        else:
            return False

This function works fine. The problem is that now I need to use this function in another function in order to find anagrams in a text file. I want to print a list of tuples with all the anagrams in it. This is what I have done so far:

    def anagrams_finder(words_num):
        anagrams = []
        f = open("words.txt")
        a = list(f)
        list1 = ([s.replace('\n', '') for s in a])
        list2 = ([i.lower() for i in list1])
        list3 = list2[0:words_num]  # number of words from text that need to be checked.
        for i in list3:
            ....

I tried using for loops, while loops, append... but nothing seems to work. How can I use the first function to help me with the second? Please help...
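A minimal sketch of one way the second function might call the first: compare every unordered pair of words and keep the pairs that `anagram_check` accepts. To stay self-contained it takes a list of words rather than reading words.txt, and `anagram_check` is restated in compact form; the name `anagram_pairs` is an assumption, not something from the question.

```python
from itertools import combinations


def anagram_check(str_x, str_y):
    """True when both strings contain the same letters, ignoring case and spaces."""
    def normalize(text):
        return sorted(text.replace(" ", "").lower())
    return normalize(str_x) == normalize(str_y)


def anagram_pairs(words):
    """Return a list of (word_a, word_b) tuples that are anagrams of each other."""
    return [
        (a, b)
        for a, b in combinations(words, 2)   # each unordered pair exactly once
        if a != b and anagram_check(a, b)
    ]


print(anagram_pairs(["listen", "silent", "enlist", "google"]))
```

Checking all pairs is quadratic in the number of words. For a large file, grouping words in a dictionary keyed by their sorted letters would likely scale better, at the cost of not calling `anagram_check` directly.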

Read the article
word ladder in python

- by user365523

I'm trying to create a word ladder program in Python. I'd like to generate words that are similar to a given word. In C++ or Java, I would go through each valid index in the original string, replace it with each letter in the English alphabet, and see if the result is a valid word. For example (pseudocode):

    for (int i = 0; i < word.length(); i++) {
        for (every character c in the alphabet) {
            change the letter of word at index i to be c.
            if the result is a valid word, store it in a list of similar words
        }
    }

However, this doesn't seem like a very "Python" way of doing things. How would I approach this problem in Python?
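The pseudocode maps fairly directly onto a set comprehension plus a set intersection. The sketch below assumes the valid words are already loaded into a Python set of lowercase strings; `similar_words` and the sample dictionary are names made up for the example.

```python
from string import ascii_lowercase


def similar_words(word, dictionary):
    """Words in `dictionary` that differ from `word` in exactly one position."""
    candidates = {
        word[:i] + letter + word[i + 1:]
        for i in range(len(word))
        for letter in ascii_lowercase
    }
    candidates.discard(word)  # replacing a letter with itself gives the word back
    return sorted(candidates & dictionary)


print(similar_words("cat", {"cat", "cot", "bat", "dog", "cab"}))
```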

Read the article
How can I convert German characters during XML read and PHP write into mysql?

- by kitenski

Morning, I am inputting data from an XML file into my database, but have an issue with German words (that are in the XML by mistake). For example, the word für appears in my XML as fÃ¼r and thus appears the same in my database. I know I could do a simple search/replace for that exact phrase, but I was wondering if there is a smarter way to do it, as I can't predict whether any other German words may one day appear in the XML.

ADDING SOME MORE DETAIL

The XML source says: and in my PHP I have

    $domString = utf8_encode($dom->saveXML($element));

If I look into the XML file before I start reading it, it has

    <title>
    <![CDATA[ CoPilot Live v8 Europa für Android 8.0.0.644 ]]>
    </title>

Thanks. Greg
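The pattern "fÃ¼r" is what the UTF-8 bytes of "für" look like when they are read as Latin-1 and encoded to UTF-8 a second time. That would be consistent with utf8_encode (which converts ISO-8859-1 to UTF-8) being applied to text that is already UTF-8, though the excerpt alone does not prove that is the cause. The Python sketch below only demonstrates the round trip; `mojibake` and `repair` are illustrative names.

```python
def mojibake(text):
    """Simulate double encoding: UTF-8 bytes misread as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


def repair(text):
    """Undo one round of double encoding; return the text unchanged if it does not apply."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


broken = mojibake("für")
print(broken)          # the garbled form seen in the database
print(repair(broken))  # the original word again
```

If double encoding is indeed the cause, removing the extra conversion at the source is likely a cleaner fix than repairing strings afterwards, since it covers any word rather than only the ones already spotted.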

Read the article
Javascript / jQuery Exec turns up Null

- by Matrym

How do I skip over this next line if it turns out to be null? Currently, it (sometimes) "breaks" and prevents the script from continuing.

    var title = (/<title>(.*?)<\/title>/m).exec(response)[1];

    $.get(url, function(response){
        var title = (/<title>(.*?)<\/title>/m).exec(response)[1];
        if (title == null || title == undefined){
            return false;
        }
        var words = title.split(' ');
        $.each(words, function(index, value){
            $link.highlight(value + " ");
            $link.highlight(" " + value);
        });
    });

Read the article
How to handle right to left languages in Flash (pre version 10)?

- by Maan Ashgar

Hello, We are currently working with Flex creating a web application. We are having trouble taking Arabic text from the user and displaying it correctly (like in a chat feature). While presumably Flash 10 will solve this problem, we don't want to force our users to upgrade. Flash flips the order of the sentence's words, so if I wrote something like "Hello World" in the text field, it will appear as "World Hello" in the chat area. Is there a standard way to work with right-to-left languages in Flash? We currently flip the order of the words with a function, but things get messed up when using English or special characters in the chat, like :) or :D.

Read the article
What's the best way to match a query to a set of keywords?

- by Ryan Detzel

Pretty much what you would assume Google does. Advertisers come in and bid on keywords, let's say "ipod", "ipod nano", "ipod 60GB", "used ipod", etc. Then we have a query, "I want to buy an ipod nano" or "best place to buy used ipods". What kind of algorithms and systems are used to match those queries to the keyword set? I would imagine that some of those keyword sets are huge: 100k keywords, each made up of one or more actual words. On top of that, queries can be 1-n words as well. Any thoughts, or links to Wikipedia where I can start reading? From what I know already, I would use some stemmed hash on disk (CDB?) and a Bloom filter to check whether I should even go to disk.
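As a rough illustration of the lookup idea, the sketch below normalizes each word, generates every contiguous word n-gram of the query, and checks each one against a hash of normalized keywords. An in-memory dict stands in for the on-disk hash and Bloom filter mentioned above, and the "stemmer" is a deliberately naive plural stripper; both are assumptions for the example, as are the function names.

```python
def normalize(word):
    """Lowercase and strip a trailing plural 's' (a crude stand-in for real stemming)."""
    word = word.lower()
    return word[:-1] if len(word) > 3 and word.endswith("s") else word


def matching_keywords(query, keywords, max_words=3):
    """Return the advertiser keywords that appear as word n-grams in the query."""
    # Normalized phrase -> original keyword, so lookups are a single hash probe.
    index = {" ".join(normalize(w) for w in k.split()): k for k in keywords}
    tokens = [normalize(w) for w in query.split()]
    found = []
    for size in range(1, max_words + 1):
        for start in range(len(tokens) - size + 1):
            phrase = " ".join(tokens[start:start + size])
            if phrase in index and index[phrase] not in found:
                found.append(index[phrase])
    return found


KEYWORDS = ["ipod", "ipod nano", "ipod 60GB", "used ipod"]
print(matching_keywords("I want to buy an ipod nano", KEYWORDS))
print(matching_keywords("best place to buy used ipods", KEYWORDS))
```

A query of n words produces at most n times max_words probes regardless of how many keywords exist, which is what makes a hash (optionally fronted by a Bloom filter) attractive at the 100k-keyword scale.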

Read the article
Split string on non-alphanumerics in PHP? Is it possible with php's native function?

- by Jehanzeb.Malik

I was trying to split a string on non-alphanumeric characters or, simply put, I want to split words. The approach that immediately came to my mind is to use regular expressions. Example:

    $string = 'php_php-php php';
    $splitArr = preg_split('/[^a-z0-9]/i', $string);

But there are two problems that I see with this approach. It is not a native PHP function, and is totally dependent on the PCRE library running on the server. An equally important problem is: what if I have punctuation in a word? Example:

    $string = "U.S.A-men's-vote";
    $splitArr = preg_split('/[^a-z0-9]/i', $string);

Now this will split the string as [{U}{S}{A}{men}{s}{vote}], but I want it as [{U.S.A}{men's}{vote}]. So my question is: how can we split them according to words? Is there a possibility to do it with a native PHP function, or in some other way where we are not dependent? Regards
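One way to get the desired grouping is to match words instead of splitting on separators: a word is a run of alphanumerics, optionally continued by a single period or apostrophe followed by more alphanumerics. The sketch below shows that pattern with Python's re module. The pattern itself uses only basic regex syntax, so it would presumably carry over to a match-all call in another regex engine, but that is an assumption and not something tested here.

```python
import re

# Alphanumeric run, optionally continued by "." or "'" plus another alphanumeric run.
WORD = re.compile(r"[A-Za-z0-9]+(?:[.'][A-Za-z0-9]+)*")


def split_words(text):
    """Return the words in `text`, keeping inner periods and apostrophes."""
    return WORD.findall(text)


print(split_words("U.S.A-men's-vote"))
print(split_words("php_php-php php"))
```

Because the punctuation must be followed by another alphanumeric run, a trailing period as in "vote." is left out of the word rather than attached to it.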

Read the article
Tag Suggestion system, approaches and ideas

- by Galois

Hi guys! -- I am working on an (auto) tag suggestion system (NOT tag autocomplete). Let's say I want to suggest tags for a given question, like here on SO (although SO's tagging system is autocomplete). My main idea is to get the intersection between the tags_set and the given question.split()_set. (In Python the set intersection is efficient enough.) Also, in order to make it a little more accurate, I might use word distance to count very close words as 'the same', i.e. movie == movies. For now I am not thinking about using any collaborative filtering technique (looking at the tags of similar questions and so on), because I believe that since the question text is pretty short (compared with a blog article or a paper) it is not worth the effort. So I was wondering if you have any other (more) efficient approaches to suggest. Any ideas, especially from people who have done something like this before, are more than welcome.
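The intersection idea can be sketched in a few lines. Here a naive plural stripper stands in for the word-distance step (so movie and movies compare equal); a real edit-distance check would be a different technique. The tokenizing regex and the function names are assumptions made for the example.

```python
import re


def singular(word):
    """Strip a trailing plural 's' (a crude stand-in for word-distance matching)."""
    return word[:-1] if len(word) > 3 and word.endswith("s") else word


def suggest_tags(question, known_tags):
    """Known tags whose normalized form appears among the question's words."""
    words = {singular(w) for w in re.findall(r"[a-z0-9+#]+", question.lower())}
    return sorted(tag for tag in known_tags if singular(tag.lower()) in words)


print(suggest_tags("Which movies use Python for rendering?", {"movie", "python", "java"}))
```

This only handles single-word tags. Multi-word tags would need phrase matching on top of the set lookup.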

Read the article
Recursion Interview Questions [closed]

- by halivingston

Given a string, "ABC", print all permutations.
Given a dollar bill, list the possible ways it can be summed up using .25, .10, .05, etc.
Given a phone number (123-456), print out all its word counterparts, like (ADG-XYZ).

    A B C D
    E F G H
    I J K L
    M N O P

In the above 2D matrix, print all possible words (just literally all words, and sure, we could check whether each exists in a dictionary). I think the base case here is reaching the same i, j position again. Any others you can think of?
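For reference, the first two questions have short recursive formulations, sketched below. Amounts are handled in cents to avoid floating-point error, and the coin sets are passed in explicitly because the question leaves them open ("etc."). The function names are invented here.

```python
def permutations(text):
    """All orderings of the characters in `text` (repeats if letters repeat)."""
    if len(text) <= 1:
        return [text]                      # base case: nothing left to reorder
    result = []
    for i, ch in enumerate(text):
        # Fix one character in front, permute the rest.
        for rest in permutations(text[:i] + text[i + 1:]):
            result.append(ch + rest)
    return result


def count_change(amount_cents, coins):
    """Number of ways to make `amount_cents` from the given coin values."""
    if amount_cents == 0:
        return 1                           # base case: exact change found
    if amount_cents < 0 or not coins:
        return 0                           # base case: overshot or out of coins
    use_first = count_change(amount_cents - coins[0], coins)
    skip_first = count_change(amount_cents, coins[1:])
    return use_first + skip_first


print(permutations("ABC"))
print(count_change(100, (25, 10, 5)))
```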

Read the article
bitshift large strings for encoding QR Codes

- by icekreaman

As an example, suppose a QR Code data stream contains 55 data words (each one byte in length) and 15 error correction words (again one byte). The data stream begins with a 12 bit header and ends with four 0 bits. So, 12 + 4 bits of header/footer and 15 bytes of error correction, leaves me 53 bytes to hold 53 alphanumeric characters. The 53 bytes of data and 15 bytes of ec are supplied in a string of length 68 (str68). The problem seems simple enough - concatenate 2 bytes of (right-shifted) header data with str68 and then left shift the entire 70 bytes by 4 bits. This is the first time in many years of programming that I have ever needed to do something like this, I am a c and bit shifting noob, so please be gentle... I have done a little investigation and so far have not been able to figure out how to bitshift 70 bytes of data; any help would be greatly appreciated. Larger QR codes can hold 2000 bytes of data...
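The 4-bit shift can be done one byte at a time: each output byte takes the low nibble of the current byte as its high nibble, and the high nibble of the next byte as its low nibble. The sketch below is written in Python for brevity, but the same per-byte loop should carry over to C with an unsigned char buffer. The function name is an assumption.

```python
def shift_left_4(data):
    """Shift a whole byte string left by 4 bits, filling the end with zero bits."""
    out = bytearray(len(data))
    for i, byte in enumerate(data):
        # The next byte's high nibble slides into this byte's low nibble.
        next_byte = data[i + 1] if i + 1 < len(data) else 0
        out[i] = ((byte << 4) & 0xFF) | (next_byte >> 4)
    return bytes(out)


print(shift_left_4(bytes([0x01, 0x23, 0x45])).hex())
```

The four bits shifted out of the first byte are discarded, which fits the case described above where the leading header nibble is padding.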

Read the article
working validation hint, working word counter but not working together

- by Sriyani Rathnayaka

I added a word counter to my form's textarea. It is something like this:

    <div>
        <label>About you:</label>
        <textarea id="qualification" class="textarea hint_needed" rows="4" cols="30" ></textarea>
        <span class="hint">explain about you</span>
        <script type="text/javascript">
            $("textarea").textareaCounter();
        </script>
    </div>

My problem is that when I add textareaCounter() like this, my validation hint stops working. When I remove the counter function, the validation hint works. This is the jQuery for the hint message:

    $(".hint").css({ "display":"none" });
    $("input.hint_needed, select.hint_needed, textarea.hint_needed, radio.hint_needed").on("mouseenter", function() {
        $(this).next(".hint").css({ "display":"inline" });
    }).on("mouseleave", function() {
        $(this).next(".hint").css({ "display":"none" });
    });

This is for the word counter:

    (function($){
        $.fn.textareaCounter = function(options) {
            // setting the defaults
            // $("textarea").textareaCounter({ limit: 100 });
            var defaults = { limit: 150 };
            var options = $.extend(defaults, options);

            // and the plugin begins
            return this.each(function() {
                var obj, text, wordcount, limited;
                obj = $("#experience");
                obj.after('<span style="font-weight: bold; color:#6a6a6a; clear: both; margin: 3px 0 0 150px; float: left; overflow: hidden;" id="counter-text">Max. '+options.limit+' words</span>');
                obj.keyup(function() {
                    text = obj.val();
                    if(text === "") {
                        wordcount = 0;
                    } else {
                        wordcount = $.trim(text).split(" ").length;
                    }
                    if(wordcount > options.limit) {
                        $("#counter-text").html('<span style="color: #DD0000;">0 words left</span>');
                        limited = $.trim(text).split(" ", options.limit);
                        limited = limited.join(" ");
                        $(this).val(limited);
                    } else {
                        $("#counter-text").html((options.limit - wordcount)+' words left');
                    }
                });
            });
        };
    })(jQuery);

Can anybody tell me what the problem is? Thank you.

Read the article
Data on the Frequency of Edit Operations Required to Correct a Misspelt Word

- by gvkv

Does anybody know of any data that relates to the frequency of the types of mistakes that people make when they misspell a word? I'm not referring to the words themselves, but to the errors that are made by the typist. For example, I personally make transposition errors the most, followed by deletion errors (that is, not including a letter I should), substitution errors and, lastly, insertion errors. However, it would not surprise me to find out that typing a wrong letter (a substitution error, e.g., xat instead of cat) is more frequent than not including a letter. My purpose is to be able to make best guesses at correcting a word when I only have the original user's input. The idea is that if one type of error is more frequent than others, then it's more likely that correcting a word via that type of operation is correct. I don't object to using a database of commonly misspelt words, but I prefer an algorithmic solution to depending on a corpus, especially if it might be faster.
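If a list of (typed, intended) pairs were available, the four error types named above could be tallied with a small classifier like the sketch below. It only recognizes a single edit and labels everything else "other"; the function name and the labels are assumptions, and no real frequency data is implied.

```python
from collections import Counter


def classify_edit(typed, intended):
    """Name the single edit that turns `intended` into `typed`, if there is one."""
    if typed == intended:
        return "none"
    if len(typed) == len(intended):
        diffs = [i for i in range(len(typed)) if typed[i] != intended[i]]
        if len(diffs) == 1:
            return "substitution"
        if (len(diffs) == 2 and diffs[1] == diffs[0] + 1
                and typed[diffs[0]] == intended[diffs[1]]
                and typed[diffs[1]] == intended[diffs[0]]):
            return "transposition"
    if len(typed) + 1 == len(intended):
        # The typist left one letter out.
        if any(intended[:i] + intended[i + 1:] == typed for i in range(len(intended))):
            return "deletion"
    if len(typed) == len(intended) + 1:
        # The typist added one extra letter.
        if any(typed[:i] + typed[i + 1:] == intended for i in range(len(typed))):
            return "insertion"
    return "other"


pairs = [("xat", "cat"), ("teh", "the"), ("ct", "cat"), ("caat", "cat")]
print(Counter(classify_edit(typed, intended) for typed, intended in pairs))
```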

Read the article
Problem With Inserts of multibyte (converted to utf-8) strings in the mysql tables of utf_unicode_ci encoding

- by user381595

http://domainsoutlook.com/sandbox/keyword/?s=http://bhaskar.com is a raw example of my keyword density analyser. Every keyword shows up properly, with no problems in Unicode conversion etc. Now, when I add these words to a column of a database table, the words show up garbled. http://domainsoutlook.com/b/site/bhaskar.com.html For example, on this front-end page there is a keyword that is shown as blank but still occurs on the website 8 times. (It isn't empty in the database, though.) I have checked and there is no problem with mysql_real_escape_string, because the output stays the same before and after the word has gone through mysql_real_escape_string. Another problem is that I wanted to fix my URLs for the Arabic language. They should be showing up as /word-{1st letter of the word}/{whole word}.html but they are showing up as /word-{whole word}/{1st letter of the word}.html. I really need answers to these two questions.

Read the article
Strange sql query result from Mysql and from PHP mysqli_query!

- by qinHaiXiang

This is the query as echoed from the PHP web page:

    SELECT DISTINCT FT.file_type_name AS type,
           FT.file_type_en AS tp,
           FT.file_type_id AS fti,
           MATCH(keywords) AGAINST ('words <2' IN BOOLEAN MODE ) AS score
    FROM movie AS M, file_type AS FT
    WHERE MATCH (keywords) AGAINST ('words <2' IN BOOLEAN MODE )
      AND M.type_cn = FT.file_type_id
    HAVING score >=1
    ORDER BY FT.file_type_order;

When I run the above query in the MySQL tool HeidiSQL, I get only two rows, with scores of 1.66666 and 2. If I remove the HAVING clause, I get three rows, one of which has a score less than 1. But when I run the same query through PHP's mysqli_query(), I get all three rows, and the score that was less than 1 becomes 1. What is the problem? Any tips would be appreciated. Thank you very much!!

Read the article
How to split up a long list using \n

- by pypy

Here is a long string that I convert to a list so I can manipulate it, and then join it back together. I am having some trouble getting an iterator to go through the list so that when the iterator reaches, let us say, every 5th object, it inserts a '\n' right there. Here is an example:

    string = "Hello my name is Josh I like pizza and python I need this string to be really really long"
    string = string.split()
    # do the magic here
    string = ' '.join(string)
    print(string)

Output:

    Hello my name is Josh
    I like pizza and python
    I need this string to
    be really really long

Any idea how I can achieve this? I tried using:

    for words in string:
        if words % 5 == 0:
            string.append('\n')

but it doesn't work. What am I missing?
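The loop in the attempt applies `%` to each word, which is a string, so Python treats it as string formatting and raises a TypeError instead of doing modulo arithmetic; it also appends to the list while iterating over it. One way around both problems is to slice the word list into groups of five and join in two steps, as in this sketch (`wrap_every` is an illustrative name):

```python
def wrap_every(text, words_per_line=5):
    """Put a newline after every `words_per_line` words."""
    words = text.split()
    lines = [
        " ".join(words[i:i + words_per_line])
        for i in range(0, len(words), words_per_line)
    ]
    return "\n".join(lines)


string = ("Hello my name is Josh I like pizza and python "
          "I need this string to be really really long")
print(wrap_every(string))
```

Joining each group with spaces first and the groups with newlines second avoids the stray space that appears next to the newline when '\n' is inserted as a list element and everything is joined with ' '.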

Read the article
converting ID to column name and also replacing NULL with last known value.

- by stackoverflowuser

TABLE_A

    Rev | ChangedBy
    ----+-----------
    1   | A
    2   | B
    3   | C

TABLE_B

    Rev | Words         | ID
    ----+---------------+----
    1   | description_1 | 52
    1   | history_1     | 54
    2   | description_2 | 52
    3   | history_2     | 54

The Words column datatype is ntext.

TABLE_C

    ID | Name
    ---+-------------
    52 | Description
    54 | History

OUTPUT

    Rev | ChangedBy | Description   | History
    ----+-----------+---------------+-----------
    1   | A         | description_1 | history_1
    2   | B         | description_2 | history_1
    3   | C         | description_2 | history_2

The Description and History columns will have the previous known values if they don't have a value for that Rev no. For example, since Description does not have an entry in TABLE_B for Rev no. 3, the last known value, description_2, appears in that column for Rev no. 3 in the output.
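The required logic (pivot the ID rows into named columns, then carry the last known value forward) is sketched below in plain Python to make the expected behaviour explicit. It is not a SQL answer; the tables are represented as lists of tuples, and the function name is an assumption.

```python
def pivot_with_fill(table_a, table_b, table_c):
    """Pivot TABLE_B's Words into one column per TABLE_C name, forward-filling gaps."""
    names = dict(table_c)                          # ID -> column name, in TABLE_C order
    last_known = {name: None for name in names.values()}
    output = []
    for rev, changed_by in sorted(table_a):        # process revisions in ascending order
        for b_rev, words, field_id in table_b:
            if b_rev == rev:
                last_known[names[field_id]] = words
        # Columns without a new value keep whatever the previous revision had.
        output.append((rev, changed_by) + tuple(last_known[n] for n in names.values()))
    return output


TABLE_A = [(1, "A"), (2, "B"), (3, "C")]
TABLE_B = [(1, "description_1", 52), (1, "history_1", 54),
           (2, "description_2", 52), (3, "history_2", 54)]
TABLE_C = [(52, "Description"), (54, "History")]

for row in pivot_with_fill(TABLE_A, TABLE_B, TABLE_C):
    print(row)
```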

Read the article
Cannot search my company how-to blog site any longer... I can only search My Sites and users...

- by Worldunix

I have a how-to company blog site that I post to for my clients to access for help. For some reason it has stopped letting anyone search on it. I can search for My Sites or users. But when you use the drop-down to search This Site: "blog site name", you get the following reply:

    No results matching your search were found.
    Check your spelling. Are the words in your query spelled correctly?
    Try using synonyms. Maybe what you're looking for uses slightly different words.
    Make your search more general. Try more general terms in place of specific ones.
    Try your search in a different scope. Different scopes can have different results.

I have tried the following commands from the index server:

    net stop osearch
    net start osearch
    iisreset /noforce

But I am still not able to search the local blog site; I can only search for users and sites. Please help. Don

Read the article
How do I do proximity search in Oracle right?

- by hko19

Oracle's NEAR operator for full text search returns a score based on the proximity of two or more query terms. For example:

    near((dog, bite), 6)

matches if 'dog' and 'bite' occur within 6 words of each other. What if I'd like it to match if either 'dog' or 'cat' or any other type of animal occurs within 6 words of the word 'bite'? I tried:

    near(((dog OR cat OR animal), bite), 6)

but I got:

    NEAR operand not a phrase, equivalence or another NEAR expression

Rather than expanding all possible combinations into multiple NEAR expressions and OR-ing them together, what is the proper way to write such a query?

Read the article
