Search Results

Search found 13324 results on 533 pages for 'stop words'.


  • How to sift idioms and set phrases apart from other common phrases using NLP techniques?

    - by hippietrail
    What techniques exist that can tell the difference between plain common phrases such as "to the" and "and the", and set phrases and idioms which have their own lexical meanings, such as "pick up", "fall in love", "red herring", "dead end"? Are there techniques which are successful even without a dictionary (statistical methods such as HMMs trained on large corpora, for instance)? Or are there heuristics such as ignoring or down-weighting "promiscuous" words, which can co-occur with just about any word, versus words which occur either alone or in a specific limited set of idiomatic phrases? If there are such heuristics, how do we take into account set phrases and verbal phrases which do incorporate promiscuous words, such as "up" in "beat up", "eat up", "sit up", "think up"? UPDATE: I've found an interesting paper online: Unsupervised Type and Token Identification of Idiomatic Expressions.
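
    One standard statistical baseline for this, sketched by the editor rather than taken from the post: score adjacent word pairs by pointwise mutual information over a corpus. Promiscuous words like "the" score low because they co-occur with nearly everything, while set phrases like "red herring" score high:

        import math
        from collections import Counter

        def pmi_scores(tokens):
            # PMI(x, y) = log( p(x, y) / (p(x) * p(y)) ) for adjacent pairs.
            unigrams = Counter(tokens)
            bigrams = Counter(zip(tokens, tokens[1:]))
            n = len(tokens)
            scores = {}
            for (x, y), count in bigrams.items():
                p_xy = count / (n - 1)
                p_x = unigrams[x] / n
                p_y = unigrams[y] / n
                scores[(x, y)] = math.log(p_xy / (p_x * p_y))
            return scores

    PMI alone over-rewards rare pairs, which is why the collocation literature often combines it with frequency cutoffs or log-likelihood ratios.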

    Read the article

  • problems with unpickling an 80 megabyte file in Python

    - by tipu
    I am using the pickle module to read and write large amounts of data to a file. After writing an 80 megabyte pickled file, I load it in a SocketServer using:

        class MyTCPHandler(SocketServer.BaseRequestHandler):
            def handle(self):
                print("in handle")
                words_file_handler = open('/home/tipu/Dropbox/dev/workspace/search/words.db', 'rb')
                words = pickle.load(words_file_handler)
                tweets = shelve.open('/home/tipu/Dropbox/dev/workspace/search/tweets.db', 'r')
                results_per_page = 25
                query_details = self.request.recv(1024).strip()
                query_details = eval(query_details)
                query = query_details["query"]
                page = int(query_details["page"]) - 1
                return_ = []
                booleanquery = BooleanQuery(MyTCPHandler.words)
                if query.find("(") > -1:
                    result = booleanquery.processAdvancedQuery(query)
                else:
                    result = booleanquery.processQuery(query)
                result = list(result)
                i = 0
                for tweet_id in result and i < 25:
                    #return_.append(MyTCPHandler.tweets[str(tweet_id)])
                    return_.append(tweet_id)
                    i += 1
                self.request.send(str(return_))

    However, the file never seems to load after the pickle.load line, and it eventually halts the connection attempt. Is there anything I can do to speed this up?
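
    A likely fix (an editor's suggestion, not from the post) is to deserialize the pickle once at server startup rather than inside handle(), so each connection reuses the in-memory object instead of re-reading 80 MB per request:

        import pickle

        # Load once, at import/startup time; the path mirrors the post.
        with open('/home/tipu/Dropbox/dev/workspace/search/words.db', 'rb') as f:
            WORDS = pickle.load(f)

        def lookup(key):
            # Handlers call this; no per-request unpickling.
            # Assumes the pickled object behaves like a dict.
            return WORDS.get(key)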

    Read the article

  • Alternative design for a synonyms table?

    - by Majid
    I am working on an app which is to suggest alternative words/phrases for input text. I have doubts about what might be a good design for the synonyms table. Design considerations:

    - The number of synonyms is variable; e.g. "football" has one synonym (soccer), but "in particular" has two (particularly, specifically).
    - If "football" is a synonym of "soccer", the relation exists in the opposite direction as well.
    - Our goal is to query a word and find its synonyms.
    - We want to keep the table small and make adding new words easy.

    What comes to my mind is a two-column design with column A = word and column B = a delimited list of synonyms. Is there any better alternative? What about using two tables, one for words and the other for relations?
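
    A sketch of the two-table variant (editor's illustration, using sqlite3 only for brevity; the table and column names are invented): one row per word plus a symmetric relation table, so the lookup works from either direction:

        import sqlite3

        conn = sqlite3.connect(":memory:")
        conn.executescript("""
            CREATE TABLE words (id INTEGER PRIMARY KEY, word TEXT UNIQUE);
            CREATE TABLE synonyms (word_id INTEGER, synonym_id INTEGER,
                                   PRIMARY KEY (word_id, synonym_id));
        """)

        def add_synonym(a, b):
            # Store the relation in both directions so one indexed
            # lookup answers a query for either word.
            ids = []
            for w in (a, b):
                conn.execute("INSERT OR IGNORE INTO words (word) VALUES (?)", (w,))
                ids.append(conn.execute("SELECT id FROM words WHERE word = ?",
                                        (w,)).fetchone()[0])
            conn.execute("INSERT OR IGNORE INTO synonyms VALUES (?, ?)", (ids[0], ids[1]))
            conn.execute("INSERT OR IGNORE INTO synonyms VALUES (?, ?)", (ids[1], ids[0]))

        def synonyms_of(word):
            return [row[0] for row in conn.execute(
                """SELECT w2.word FROM words w1
                   JOIN synonyms s ON s.word_id = w1.id
                   JOIN words w2 ON w2.id = s.synonym_id
                   WHERE w1.word = ?""", (word,))]

    Unlike the delimited-list column, this keeps the data normalized: adding one word never requires rewriting another row.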

    Read the article

  • Efficient data structure design

    - by Sway
    Hi there, I need to match a series of user input words against a large dictionary of words (to ensure the entered value exists). So if the user entered "orange" it should match an entry "orange" in the dictionary. Now the catch is that the user can also enter a wildcard or series of wildcard characters, like say "or__ge", which would also match "orange". The key requirements are:

    - This should be as fast as possible.
    - It should use the smallest amount of memory possible.

    If the size of the word list was small I could use a string containing all the words and use regular expressions. However, given that the word list could contain potentially hundreds of thousands of entries, I'm assuming this wouldn't work. So would some sort of 'tree' be the way to go for this? Any thoughts or suggestions on this would be totally appreciated! Thanks in advance, Matt
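
    For what it's worth, a minimal editor's sketch of the trie idea (in Python, with '_' standing in for the wildcard): the wildcard is handled by branching into every child at that position, so only matching prefixes are ever explored:

        def build_trie(words):
            root = {}
            for w in words:
                node = root
                for ch in w:
                    node = node.setdefault(ch, {})
                node["$"] = True  # end-of-word marker
            return root

        def matches(node, pattern, i=0):
            # '_' is the wildcard: try every child branch at this position.
            if i == len(pattern):
                return "$" in node
            ch = pattern[i]
            if ch == "_":
                return any(matches(child, pattern, i + 1)
                           for key, child in node.items() if key != "$")
            return ch in node and matches(node[ch], pattern, i + 1)

        trie = build_trie(["orange", "organ"])
        # matches(trie, "or__ge") -> True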

    Read the article

  • combining dynamic text with regular expressions in php

    - by pfunc
    I am experimenting with finding popular keywords using curl, PHP and regular expressions. I have an array of non-specific nouns that I am matching against my keyword search. So I am looking for words like "the", "and", "that" etc. and taking them out of the keyword search. So I have an array of words like so:

        $wordArr = [the, and, at,....];

    and then running something like:

        && preg_match('(\bmyword\w*\b)', $key) == false

    How do I combine these two so it loops through the array, finding out if any of the words in the array match the regular expression? I guess I could just do a for loop, but thought maybe I could use in_array($wordArr, $key) or something like that.

    Read the article

  • python sending incomplete data over socket

    - by tipu
    I have this socket server script:

        import SocketServer
        import shelve
        import zlib

        class MyTCPHandler(SocketServer.BaseRequestHandler):
            def handle(self):
                self.words = shelve.open('/home/tipu/Dropbox/dev/workspace/search/words.db', 'r')
                self.tweets = shelve.open('/home/tipu/Dropbox/dev/workspace/search/tweets.db', 'r')
                param = self.request.recv(1024).strip()
                try:
                    result = str(self.words[param])
                except KeyError:
                    result = "set()"
                self.request.send(str(result))

        if __name__ == "__main__":
            HOST, PORT = "localhost", 50007
            SocketServer.TCPServer.allow_reuse_address = True
            server = SocketServer.TCPServer((HOST, PORT), MyTCPHandler)
            server.serve_forever()

    And this receiver:

        from django.http import HttpResponse
        from django.template import Context, loader
        import shelve
        import zlib
        import socket

        def index(req, param = ''):
            HOST = 'localhost'
            PORT = 50007
            s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            s.connect((HOST, PORT))
            s.send(param)
            data = zlib.decompress(s.recv(131072))
            s.close()
            print 'Received', repr(data)
            t = loader.get_template('index.html')
            c = Context({ 'foo' : data })
            return HttpResponse(t.render(c))

    I am sending strings to the receiver that are in the hundreds of kilobytes. I end up only receiving a portion of each one. Is there a way that I can fix that so that the whole string is sent?
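
    The usual culprit (editor's note, not from the post) is that a single recv() call returns only whatever bytes have arrived so far. A minimal sketch of the standard fix, looping until the server closes the connection:

        import socket

        def recv_all(sock, bufsize=4096):
            # Accumulate chunks until the sender closes the connection;
            # one recv() is never guaranteed to return the full payload.
            chunks = []
            while True:
                chunk = sock.recv(bufsize)
                if not chunk:
                    break
                chunks.append(chunk)
            return b"".join(chunks)

    If the server keeps the connection open, a length prefix or delimiter is needed instead, so the client knows when the message ends.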

    Read the article

  • Writing a post search algorithm.

    - by MdaG
    I'm trying to write a free text search algorithm for finding specific posts on a wall (a similar kind of wall as Facebook uses). A user is supposed to be able to write some words in a search field and get hits on posts that contain the words, with the best match on top and the other posts in decreasing order according to match score. I'm using the edit distance (Levenshtein) e(x, y) = e to calculate the score for each post when compared to the query word x and post word y, according to:

        score(x, y) = 2^(2 - e) * (1 - min(e, |x|) / |x|)

    Each word in a post contributes to the total score for that specific post. This approach seems to work well when the posts are of roughly the same size, but sometimes certain large posts manage to rack up score solely by having a lot of words in them, while in practice not being relevant to the query. Am I approaching this problem in the wrong way, or is there some way to normalize the score that I haven't thought of?
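
    One common normalization (an editor's suggestion, not from the post) is to damp the accumulated total by post length, so long posts can't win on word volume alone, for example:

        import math

        def normalized_score(raw_score, post_word_count):
            # Dividing by log(length) penalizes sheer volume without
            # crushing legitimately long, relevant posts.
            return raw_score / math.log(post_word_count + math.e)

    Dividing by the raw word count is the harsher alternative; which damping works best is an empirical question for the corpus at hand.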

    Read the article

  • regex pattern to match only strings that don't contain spaces PHP

    - by Jamex
    Hi, I want to match the word/pattern that is contained in the variable, but only match against the words that don't have white spaces. Please give suggestions.

        $var = 'look';
        $array = ('look', 'greatlook', 'lookgreat', 'look great', 'badlook', 'look bad', 'look ', ' look');

    Matches: look, greatlook, lookgreat, badlook. Non-matches: "look great", "look bad", "look " (trailing space(s)), " look" (leading space(s)). The syntax of the functions below is OK, but the first matches everything:

        $match = preg_grep ("/$var/", $array);
        $match = preg_grep ("/^$var/", $array);  (matches words with 'look' at the start)

    but when I include the [^\s], it gives an error:

        $match = preg_grep ("/$var[^\s]/", $array);
        Parse error: syntax error, unexpected '^', expecting T_STRING or T_VARIABLE

    TIA

    Read the article

  • js regexp problem

    - by Alexander
    I have a searching system that splits the keyword into chunks and searches for it in a string like this:

        var regexp_school = new RegExp("(?=.*" + split_keywords[0] + ")(?=.*" + split_keywords[1] + ")(?=.*" + split_keywords[2] + ").*", "i");

    I would like to modify this so that I only search for it at the beginning of the words. For example, if the string is "Bbe be eb ebb beb" and the keyword is "be eb", then I want only these to hit: "be", "ebb", "eb". In other words, I want to combine the above regexp with this one:

        var regexp_school = new RegExp("^" + split_keywords[0], "i");

    But I'm not sure what the syntax would look like. I'm also using the split function to split the keywords, but I don't want to set a length since I don't know how many words there are in the keyword string.

        split_keywords = school_keyword.split(" ", 3);

    If I leave the 3 out, will it have dynamic length or just a length of 1? I tried doing alert(split_keywords.lenght); but didn't get the desired response.

    Read the article

  • Threads to make video out of images

    - by masood
    Updates: I think/suspect that ImageIO is not thread safe: it is shared by all threads, and the read() call might use resources that are also shared. Thus it will give the performance of a single thread no matter how many threads are used. If that is correct, what is the solution (in practical code)? A single request-and-response model at one time does not utilize the full network/internet bandwidth, thus resulting in low performance (the benchmark is half speed utilization or even lower).

    This is to make a video out of an IP cam that gives a new image on each request: http://149.5.43.10:8001/snapshot.jpg It has a delay of 3 - 8 seconds no matter what I do. I changed the thread count and thread time intervals, and debugged the code with System.out.println statements to see if the threads work. All seems normal. Any help? Please show some practical code; you may modify mine. This code (JavaScript) works with a much smoother frame rate and maximum bandwidth usage:

        <!DOCTYPE html>
        <html>
        <head>
        <script type="text/javascript">
        (function(){
            var img="/*url*/";
            var interval=50;
            var pointer=0;
            function showImg(image,idx)
            {
                if(idx<=pointer) return;
                document.body.replaceChild(image,document.getElementsByTagName("img")[0]);
                pointer=idx;
                preload();
            }
            function preload()
            {
                var cache=null,idx=0;
                for(var i=0;i<5;i++)
                {
                    idx=Date.now()+interval*(i+1);
                    cache=new Image();
                    cache.onload=(function(ele,idx){return function(){showImg(ele,idx);};})(cache,idx);
                    cache.src=img+"?"+idx;
                }
            }
            window.onload=function(){
                document.getElementsByTagName("img")[0].onload=preload;
                document.getElementsByTagName("img")[0].src="/*initial url*/";
            };
        })();
        </script>
        </head>
        <body>
        <img />
        </body>
        </html>

    but the later code (Java) doesn't; it has the same 3 to 8 second gap:

        package camba;

        import java.applet.Applet;
        import java.awt.Button;
        import java.awt.Graphics;
        import java.awt.Image;
        import java.awt.Label;
        import java.awt.Panel;
        import java.awt.TextField;
        import java.awt.event.ActionEvent;
        import java.awt.event.ActionListener;
        import java.net.URL;
        import java.security.Timestamp;
        import java.util.Date;
        import java.util.concurrent.TimeUnit;
        import java.util.concurrent.atomic.AtomicBoolean;
        import javax.imageio.ImageIO;

        public class Camba extends Applet implements ActionListener {
            Image img;
            TextField textField;
            Label label;
            Button start, stop;
            boolean terminate = false;
            long viewTime;

            public void init() {
                label = new Label("please enter camera URL ");
                add(label);
                textField = new TextField(30);
                add(textField);
                start = new Button("Start");
                add(start);
                start.addActionListener(this);
                stop = new Button("Stop");
                add(stop);
                stop.addActionListener(this);
            }

            public void actionPerformed(ActionEvent e) {
                Button source = (Button) e.getSource();
                if (source.getLabel() == "Start") {
                    for (int i = 0; i < 7; i++) {
                        myThread(50 * i);
                    }
                    System.out.println("start...");
                }
                if (source.getLabel() == "Stop") {
                    terminate = true;
                    System.out.println("stop...");
                }
            }

            public void paint(Graphics g) {
                update(g);
            }

            public void update(Graphics g) {
                try {
                    viewTime = System.currentTimeMillis();
                    g.drawImage(img, 100, 100, this);
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }

            public void myThread(final int sleepTime) {
                new Thread(new Runnable() {
                    public void run() {
                        while (!terminate) {
                            try {
                                TimeUnit.MILLISECONDS.sleep(sleepTime);
                            } catch (InterruptedException ex) {
                                ex.printStackTrace();
                            }
                            long requestTime = 0;
                            Image tempImage = null;
                            try {
                                URL pic = null;
                                requestTime = System.currentTimeMillis();
                                pic = new URL(getDocumentBase(), textField.getText());
                                tempImage = ImageIO.read(pic);
                            } catch (Exception e) {
                                e.printStackTrace();
                            }
                            if (requestTime >= /*last view time*/ viewTime) {
                                img = tempImage;
                                Camba.this.repaint();
                            }
                        }
                    }
                }).start();
                System.out.println("thread started...");
            }
        }

    Read the article

  • Q on filestream and streamreader unicode

    - by habbo95
    Hi all, I have a big problem and so far nobody has helped me. Firstly, I want to open an XXX.vmg file (this extension comes from Nokia PC Suite), read it, then write it into a RichTextBox. I wrote the code; there is no error, but there is also no result in the RichTextBox. Here is my code:

        FileStream file = new FileStream("c:\\XXX.vmg", FileMode.OpenOrCreate, FileAccess.Read);
        StreamWriter sw = new StreamWriter(fileW);
        StreamReader sr = new StreamReader(file);
        string s1 = sr.ReadToEnd();
        string[] words = s1.Split(' ');
        for (int i = 0; i < words.length; i++)
            richTextBox1.Text += Envirment.NewLine + words[i];

    The output in the RichTextBox is just blank lines.

    Read the article

  • Why doesn't Python's `re.split()` split on zero-length matches?

    - by Tim Pietzcker
    One particular quirk of the (otherwise quite powerful) re module in Python is that re.split() will never split a string on a zero-length match, for example if I want to split a string along word boundaries:

        >>> re.split(r"\s+|\b", "Split along words, preserve punctuation!")
        ['Split', 'along', 'words,', 'preserve', 'punctuation!']

    instead of

        ['', 'Split', 'along', 'words', ',', 'preserve', 'punctuation', '!']

    Why does it have this limitation? Is it by design? Do other regex flavors behave like this?
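
    A workaround sketch (editor's addition, not from the post) is to invert the problem and match tokens rather than split on zero-width boundaries; note also that Python 3.7 later lifted this restriction on re.split():

        import re

        # Match runs of word characters or single punctuation marks instead
        # of splitting on \b, which older re.split() silently ignores.
        tokens = re.findall(r"\w+|[^\w\s]", "Split along words, preserve punctuation!")
        # ['Split', 'along', 'words', ',', 'preserve', 'punctuation', '!']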

    Read the article

  • ruby parametrized regular expression

    - by astropanic
    I have a string like "{some|words|are|here}" or "{another|set|of|words}". So in general the string consists of an opening curly bracket, words delimited by a pipe, and a closing curly bracket. What is the most efficient way to get the selected word of that string? I would like to do something like this:

        @my_string = "{this|is|a|test|case}"
        @my_string.get_column(0) # => "this"
        @my_string.get_column(2) # => "is"
        @my_string.get_column(4) # => "case"

    What should the method get_column contain?

    Read the article

  • Complexity in using Binary search and Trie

    - by user121196
    Given a large list of alphabetically sorted words in a file, I need to write a program that, given a word x, determines if x is in the list. Preprocessing is OK since I will be calling this function many times over different inputs. Priorities: 1. speed, 2. memory. I already know I can use (where n is the number of words and m is the average length of the words):

    1. a trie: time is O(log(n)), space (best case) is O(log(n*m)), space (worst case) is O(n*m);
    2. loading the complete list into memory, then binary search: time is O(log(n)), space is O(n*m).

    I'm not sure about the complexity of the trie, so please correct me if it is wrong. Also, are there other good approaches?
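
    As a point of reference, the in-memory binary search amounts to a few lines with the standard library (editor's sketch):

        import bisect

        def contains(sorted_words, x):
            # Binary search over the sorted list: O(log n) comparisons.
            i = bisect.bisect_left(sorted_words, x)
            return i < len(sorted_words) and sorted_words[i] == x

    For the trie, lookup time is governed by the word length rather than the list size, which is worth re-checking against the O(log(n)) figure above.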

    Read the article

  • C++ Word-Number to int

    - by Andrew
    I'm developing a program that makes basic calculations using words instead of numbers. E.g. five + two would output seven. The program becomes more complex, taking input such as two_hundred_one + five_thousand_six (201 + 5006). Through operator overloading methods, I split each number and assign it to its own array index: two would be [0], hundred is [1], and one is [2]. Then the array recycles for 5006. My problem is, to perform the actual calculation, I need to convert the words stored in the array to actual integers. I have const string arrays such as this as a library of the words:

        const string units[] = { "", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine" };
        const string teens[] = { "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen", "seventeen", "eighteen", "nineteen" };
        const string tens[] = { "", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety" };

    If my 'token' array has "two hundred one" stored in indexes 0, 1, and 2, I'm not sure what the best way to convert these to ints would involve.
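
    A sketch of the conversion logic (editor's illustration in Python for brevity; the same table-driven approach ports directly to C++): map each word to a value, multiply the running group on "hundred", and flush it on "thousand":

        VALUES = {w: i for i, w in enumerate(
            ["zero", "one", "two", "three", "four", "five", "six", "seven",
             "eight", "nine"])}
        VALUES.update({w: 10 + i for i, w in enumerate(
            ["ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
             "sixteen", "seventeen", "eighteen", "nineteen"])})
        VALUES.update({w: 10 * i for i, w in enumerate(
            ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
             "eighty", "ninety"]) if w})

        def words_to_int(tokens):
            total, group = 0, 0
            for t in tokens:
                if t == "hundred":
                    group *= 100      # "two hundred" -> 2 * 100
                elif t == "thousand":
                    total += group * 1000
                    group = 0
                else:
                    group += VALUES[t]
            return total + group

        # words_to_int(["two", "hundred", "one"]) == 201
        # words_to_int(["five", "thousand", "six"]) == 5006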

    Read the article

  • MySQL search design

    - by neil
    I'm designing a MySQL database, and I'd like some input on an efficient way to store blog/article data for searching. Right now, I've made a separate column that stores the content to be searched: no duplicate words, no words shorter than four letters, and no words that are too common. So, essentially, it's a list of keywords from the original article. Also searched would be a list of tags, and the title field. I'm not quite sure how MySQL indexes fulltext columns, so would storing the data like that be ineffective, or redundant somehow? A lot of the articles are on the same topic, so would the score be hurt by so many of the rows having similar keywords? Also, for this project, solutions like Sphinx, Lucene or Google Custom Search can't be used -- only PHP & MySQL. Thanks!

    Read the article

  • Getting the last element of a Postgres array, declaratively

    - by Wojciech Kaczmarek
    How do I obtain the last element of an array in Postgres? I need to do it declaratively, as I want to use it as an ORDER BY criterion. I wouldn't want to create a special PL/pgSQL function for it; the fewer changes to the database the better in this case. In fact, what I want to do is to sort by the last word of a specific column containing multiple words. Changing the model is not an option here. In other words, I want to push Ruby's sort_by {|x| x.split[-1]} down to the database level. I can split a value into an array of words with Postgres's string_to_array or regexp_split_to_array functions; then how do I get its last element?
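
    If it helps, a hedged sketch (editor's addition; "posts" and "title" are invented names): Postgres allows subscripting the result of string_to_array directly, and array_upper gives the last index, so the whole thing fits in the ORDER BY clause:

        # The SQL is the substance here; it is shown as a Python string only
        # to keep one language across the examples on this page.
        query = """
            SELECT *
            FROM posts
            ORDER BY (string_to_array(title, ' '))[
                array_upper(string_to_array(title, ' '), 1)
            ]
        """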

    Read the article

  • C++ string array from ifstream

    - by David Beck
    I have a program that needs to read an array of strings from a file. The array must be C type strings (char * or char[]). Using the following code, I get a bad access error:

        for (i = 0; i < MAX_WORDS && !inputFile.eof(); i++) {
            inputFile >> words[i];
        }

    words is declared as:

        char *words[MAX_WORDS];

    Read the article

  • autocomplete-like feature with a python dict

    - by tipu
    In PHP, I had this line:

        matches = preg_grep('/^for/', array_keys($hash));

    What it would do is grab the words fork, form etc. that are in $hash. In Python, I have a dict with 400,000 words. Its keys are words I'd like to present in an autocomplete-like feature (the values in this case are meaningless). How would I be able to return the keys from my dictionary that match the input? For example (as used earlier), if I have

        my_dic = {"fork" : True, "form" : True, "fold" : True, "fame" : True}

    and I get some input "for", it'll return a list of "fork", "form", "fold".
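
    A minimal sketch (editor's addition): a linear scan with str.startswith covers this; even over 400,000 keys it is usually fast enough, and a sorted key list plus bisect can speed it up further:

        def complete(prefix, words):
            # Return every key that begins with the typed prefix.
            return [w for w in words if w.startswith(prefix)]

        my_dic = {"fork": True, "form": True, "fold": True, "fame": True}
        # complete("for", my_dic) -> ['fork', 'form']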

    Read the article

  • Java - Can I have faster performance for this loop?

    - by Brad
    I am reading a book and deleting a number of words from it. My problem is that the process takes a long time, and I want to improve its performance (less time). Example:

        Vector<String> pages = new Vector<String>(); // Contains about 1500 pages, each page has about 1000 words.
        Vector<String> wordsToDelete = new Vector<String>(); // Contains about 50000 words.

        for (String page : pages) {
            String pageInLowCase = page.toLowerCase();
            for (String wordToDelete : wordsToDelete) {
                if (pageInLowCase.contains(wordToDelete))
                    page = page.replaceAll("(?i)\\b" + wordToDelete + "\\b", "");
            }
            // Do some stuff with the final page that does not take much time.
        }

    This code takes around 3 minutes to execute. If I skip the replaceAll(...) loop I can save more than 2 minutes. So is there a way to do the same loop with faster performance?
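
    The standard speedup (an editor's suggestion, not from the post) is to compile a single alternation pattern once, instead of calling replaceAll per word per page. Sketched in Python for brevity; the same idea applies with a precompiled java.util.regex.Pattern:

        import re

        words_to_delete = ["foo", "bar", "baz"]  # stand-ins for the real 50,000 words

        # One case-insensitive pattern, built and compiled once;
        # re.escape guards any regex metacharacters in the words.
        pattern = re.compile(
            r"\b(?:" + "|".join(map(re.escape, words_to_delete)) + r")\b",
            re.IGNORECASE)

        def clean(page):
            # Single pass over the page instead of 50,000 scans.
            return pattern.sub("", page)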

    Read the article

  • Why does this MSDN example for Func<> delegate have a superfluous Select() call?

    - by Dan
    The MSDN gives this code example in the article on the Func Generic Delegate:

        Func<String, int, bool> predicate = (str, index) => str.Length == index;
        String[] words = { "orange", "apple", "Article", "elephant", "star", "and" };
        IEnumerable<String> aWords = words.Where(predicate).Select(str => str);
        foreach (String word in aWords)
            Console.WriteLine(word);

    I understand what all this is doing. What I don't understand is the Select(str => str) bit. Surely that's not needed? If you leave it out and just have

        IEnumerable<String> aWords = words.Where(predicate);

    then you still get an IEnumerable back that contains the same results, and the code prints the same thing. Am I missing something, or is the example misleading?

    Read the article

  • fastest way to perform string search in general and in python

    - by Rkz
    My task is to search for a string or a pattern in a list of documents that are very short (say 200 characters long). However, say there are 1 million such documents. What is the most efficient way to perform this search? I was thinking of tokenizing each document and putting the words in a hashtable, with words as keys and document numbers as values, thereby creating a bag of words. Then perform the word search and retrieve the list of documents that contained this word. From what I can see, this operation will take O(n) operations. Is there any other way? Maybe without using hash-tables? Also, is there a Python library or third party package that can perform efficient searches?
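
    The hashtable idea in the post is essentially an inverted index; a minimal editor's sketch, mapping each word to the set of document ids that contain it:

        from collections import defaultdict

        def build_index(docs):
            index = defaultdict(set)
            for doc_id, text in enumerate(docs):
                for word in text.lower().split():
                    index[word].add(doc_id)
            return index

        docs = ["the quick brown fox", "a quick search example"]
        index = build_index(docs)
        # index["quick"] -> {0, 1}

    Building the index is O(total words) once; each word query afterwards is an O(1) dictionary lookup plus the size of the result set.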

    Read the article

  • Java - Counting occurrences of words from a huge text file

    - by Naveen
    I have a text file of size 115 MB. It consists of about 20 million words. I have to use the file as a word collection and search it for the occurrence of each user-given word. I am using this process as a small part of my project. I need a method for finding the number of occurrences of given words in a fast and correct manner, since I may use it in iterations. I am in need of suggestions about any API I can make use of, or some other way that performs the task more quickly. Any recommendations are appreciated.
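
    For scale, a sketch of the precompute-once approach (editor's addition; "corpus.txt" is a placeholder path): count every word in a single pass, after which each user query is a dictionary lookup rather than a rescan of the 115 MB file. The equivalent in Java is a HashMap<String, Integer> filled the same way:

        from collections import Counter

        def build_counts(path):
            counts = Counter()
            with open(path, encoding="utf-8") as f:
                for line in f:
                    counts.update(line.lower().split())
            return counts

        # counts = build_counts("corpus.txt"); counts["word"] -> occurrences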

    Read the article

  • How to get my checking MIME type script to work? (PHP)

    - by ggfan
    For this script, it checks to see if the file is a Microsoft Word doc or PPT. I am not sure why this isn't running, because it works for image MIME types and text/plain. I am using PHP 5.3.1, so it should have all the MIME types installed already, right? I am uploading Word and PowerPoint 2007 files.

        //Does the file have the right MIME type?
        if ($_FILES['userfile']['type'] !='application/msword') {
            echo 'Problem: file is not words doc.';
            exit;
        }

    Read the article
