Search Results

Search found 6107 results on 245 pages for 'reserved words'.


  • problems with unpickling an 80 megabyte file in python

    - by tipu
    I am using the pickle module to read and write large amounts of data to a file. After writing an 80 megabyte pickled file, I load it in a SocketServer:

        class MyTCPHandler(SocketServer.BaseRequestHandler):
            def handle(self):
                print("in handle")
                words_file_handler = open('/home/tipu/Dropbox/dev/workspace/search/words.db', 'rb')
                words = pickle.load(words_file_handler)
                tweets = shelve.open('/home/tipu/Dropbox/dev/workspace/search/tweets.db', 'r')
                results_per_page = 25
                query_details = self.request.recv(1024).strip()
                query_details = eval(query_details)
                query = query_details["query"]
                page = int(query_details["page"]) - 1
                return_ = []
                booleanquery = BooleanQuery(MyTCPHandler.words)
                if query.find("(") > -1:
                    result = booleanquery.processAdvancedQuery(query)
                else:
                    result = booleanquery.processQuery(query)
                result = list(result)
                i = 0
                for tweet_id in result and i < 25:
                    #return_.append(MyTCPHandler.tweets[str(tweet_id)])
                    return_.append(tweet_id)
                    i += 1
                self.request.send(str(return_))

    However, the file never seems to finish loading at the pickle.load line, and it eventually halts the connection attempt. Is there anything I can do to speed this up?
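
    One approach worth sketching, given that the handler above already references a class-level MyTCPHandler.words: load the 80 MB pickle once at import time instead of on every connection, and on Python 2 use the C-implemented cPickle module, which is typically much faster than pickle. The paths come from the question; treat this as a sketch, not a drop-in fix.

        try:
            import cPickle as pickle  # C implementation, much faster on Python 2
        except ImportError:
            import pickle
        import SocketServer

        # Done once, when the module is imported -- not per request.
        with open('/home/tipu/Dropbox/dev/workspace/search/words.db', 'rb') as f:
            WORDS = pickle.load(f)

        class MyTCPHandler(SocketServer.BaseRequestHandler):
            words = WORDS  # matches the MyTCPHandler.words reference above
            def handle(self):
                pass  # query handling as in the question

    Writing the file with a binary protocol (pickle.dump(obj, f, 2)) also loads noticeably faster than the default text protocol 0.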

    Read the article

  • Alternative design for a synonyms table?

    - by Majid
    I am working on an app which suggests alternative words/phrases for input text. I have doubts about what might be a good design for the synonyms table. Design considerations:

    - the number of synonyms is variable, i.e. "football" has one synonym (soccer), but "in particular" has two (particularly, specifically)
    - if football is a synonym of soccer, the relation exists in the opposite direction as well
    - our goal is to query a word and find its synonyms
    - we want to keep the table small and make adding new words easy

    What comes to my mind is a two-column design with column A = word and column B = a delimited list of synonyms. Is there any better alternative? What about using two tables, one for words and the other for relations?
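
    For comparison, here is a minimal sketch of the two-table idea, using SQLite from Python purely for illustration (all table and column names are invented for the example). Grouping synonyms instead of pairing them keeps each word stored once and makes the relation symmetric by construction:

        import sqlite3

        conn = sqlite3.connect(':memory:')
        conn.executescript('''
            CREATE TABLE words (id INTEGER PRIMARY KEY, word TEXT UNIQUE);
            -- Words sharing a group_id are synonyms of each other.
            CREATE TABLE word_groups (word_id INTEGER, group_id INTEGER);
        ''')

        def synonyms(word):
            rows = conn.execute('''
                SELECT DISTINCT w2.word
                FROM words w1
                JOIN word_groups g1 ON g1.word_id = w1.id
                JOIN word_groups g2 ON g2.group_id = g1.group_id
                JOIN words w2 ON w2.id = g2.word_id
                WHERE w1.word = ? AND w2.word <> ?''', (word, word))
            return [r[0] for r in rows]

    Adding a new word means one row in words and one row in word_groups, and the symmetric lookup needs no duplicated pairs.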

    Read the article

  • Efficient data structure design

    - by Sway
    Hi there, I need to match a series of user-entered words against a large dictionary of words (to ensure the entered value exists). So if the user entered "orange" it should match an entry "orange" in the dictionary. Now the catch is that the user can also enter a wildcard or a series of wildcard characters, say "or__ge", which would also match "orange". The key requirements are:

    - this should be as fast as possible
    - use the smallest amount of memory to achieve it

    If the size of the word list were small I could use a string containing all the words and use regular expressions. However, given that the word list could contain potentially hundreds of thousands of entries, I'm assuming this wouldn't work. So would some sort of 'tree' be the way to go for this? Any thoughts or suggestions on this would be totally appreciated! Thanks in advance, Matt
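
    One way the tree idea could look, sketched in Python with nested dicts as trie nodes and '_' as the wildcard (as in the "or__ge" example): a literal character follows one branch, a wildcard tries every branch, so memory stays proportional to the dictionary and lookup cost to the word length times the wildcard fan-out.

        END = object()  # sentinel marking end-of-word

        def insert(root, word):
            node = root
            for ch in word:
                node = node.setdefault(ch, {})
            node[END] = True

        def lookup(node, pattern):
            if not pattern:
                return END in node
            ch, rest = pattern[0], pattern[1:]
            if ch == '_':  # wildcard: try every child branch
                return any(lookup(child, rest)
                           for key, child in node.items() if key is not END)
            return ch in node and lookup(node[ch], rest)

        root = {}
        for w in ('orange', 'organ', 'range'):
            insert(root, w)
        print(lookup(root, 'or__ge'))  # True  (matches 'orange')
        print(lookup(root, 'r_nge'))   # True  (matches 'range')
        print(lookup(root, 'or__g'))   # False (no five-letter or..g entry)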

    Read the article

  • combining dynamic text with regular expressions in php

    - by pfunc
    I am experimenting with finding popular keywords using cURL, PHP and regular expressions. I have an array of non-specific words that I match the keyword search against, words like "the", "and", "that" etc., so I can take them out of the keyword search. So I have an array of words like so:

        $wordArr = array('the', 'and', 'at', ...);

    and then I am running something like:

        && preg_match('(\bmyword\w*\b)', $key) == false

    How do I combine these two so it loops through the array, finding out if any of the words in the array match the regular expression? I guess I could just do a for loop, but thought maybe I could use in_array($wordArr, $key) or something like that.
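
    Rather than looping at all, the stopwords could be joined into a single alternation and tested with one match per keyword. A sketch of the idea in Python (in PHP the same pattern could be built with implode('|', $wordArr) inside one preg_match call):

        import re

        stopwords = ['the', 'and', 'at']  # stand-in for the full list
        # Builds \b(?:the|and|at)\w*\b -- one compiled pattern instead of a loop.
        pattern = re.compile(r'\b(?:' + '|'.join(map(re.escape, stopwords)) + r')\w*\b')

        def is_noise(key):
            return pattern.search(key) is not None

        print(is_noise('and'))     # True
        print(is_noise('orange'))  # False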

    Read the article

  • python sending incomplete data over socket

    - by tipu
    I have this socket server script:

        import SocketServer
        import shelve
        import zlib

        class MyTCPHandler(SocketServer.BaseRequestHandler):
            def handle(self):
                self.words = shelve.open('/home/tipu/Dropbox/dev/workspace/search/words.db', 'r')
                self.tweets = shelve.open('/home/tipu/Dropbox/dev/workspace/search/tweets.db', 'r')
                param = self.request.recv(1024).strip()
                try:
                    result = str(self.words[param])
                except KeyError:
                    result = "set()"
                self.request.send(str(result))

        if __name__ == "__main__":
            HOST, PORT = "localhost", 50007
            SocketServer.TCPServer.allow_reuse_address = True
            server = SocketServer.TCPServer((HOST, PORT), MyTCPHandler)
            server.serve_forever()

    And this receiver:

        from django.http import HttpResponse
        from django.template import Context, loader
        import shelve
        import zlib
        import socket

        def index(req, param = ''):
            HOST = 'localhost'
            PORT = 50007
            s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            s.connect((HOST, PORT))
            s.send(param)
            data = zlib.decompress(s.recv(131072))
            s.close()
            print 'Received', repr(data)
            t = loader.get_template('index.html')
            c = Context({ 'foo' : data })
            return HttpResponse(t.render(c))

    I am sending strings to the receiver that are in the hundreds of kilobytes. I end up only receiving a portion of each one. Is there a way I can fix that so that the whole string is sent?
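
    For what it's worth, a single recv() call is never guaranteed to return everything the peer sent; TCP delivers a byte stream in arbitrary chunks. Since the SocketServer above closes the connection once handle() returns, one minimal fix on the receiving side is to read until recv() returns an empty string (note, too, that this server never zlib-compresses the reply that the view tries to decompress):

        def recv_all(sock, bufsize=4096):
            """Read from sock until the peer closes the connection."""
            chunks = []
            while True:
                chunk = sock.recv(bufsize)
                if not chunk:  # '' means the server closed the socket
                    break
                chunks.append(chunk)
            return ''.join(chunks)

        # in the view, replacing the single s.recv(131072) call:
        # data = zlib.decompress(recv_all(s))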

    Read the article

  • Writing a post search algorithm.

    - by MdaG
    I'm trying to write a free-text search algorithm for finding specific posts on a wall (a similar kind of wall to the one Facebook uses). A user is supposed to be able to write some words in a search field and get hits on posts that contain those words, with the best match on top and the other posts in decreasing order according to match score. I'm using the edit distance (Levenshtein) e(x, y) = e to calculate the score for each post when compared to the query word x and post word y, according to:

        score(x, y) = 2^(2 - e) * (1 - min(e, |x|) / |x|)

    Each word in a post contributes to the total score for that specific post. This approach seems to work well when the posts are of roughly the same size, but sometimes certain large posts manage to rack up a score solely by having a lot of words in them, while in practice not being relevant to the query. Am I approaching this problem in the wrong way, or is there some way to normalize the score that I haven't thought of?
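
    Two common normalizations, sketched below with invented helper names: score each query word by its best-matching post word instead of summing every word's contribution, and damp the total by a sublinear function of post length so sheer volume stops paying off. This is a sketch of the idea, not a tuned formula:

        import math

        def post_score(query_words, post_words, word_score):
            # word_score(x, y) is the question's 2^(2-e) * (1 - min(e,|x|)/|x|) term
            total = sum(max(word_score(x, y) for y in post_words)
                        for x in query_words)
            # Dividing by log(length) keeps long posts from winning on volume alone.
            return total / math.log(len(post_words) + 2)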

    Read the article

  • regex pattern to match only strings that don't contain spaces PHP

    - by Jamex
    Hi, I want to match the word/pattern contained in the variable, but only against the words that don't contain whitespace. Please give suggestions.

        $var = 'look';
        $array = array('look', 'greatlook', 'lookgreat', 'look great',
                       'badlook', 'look bad', 'look ', ' look');

    Matching words: look, greatlook, lookgreat, badlook. Non-matches: look great, look bad, look (trailing space(s)), (space(s)) look. The syntax of the functions below is OK, but they match everything:

        $match = preg_grep("/$var/", $array);
        $match = preg_grep("/^$var/", $array);  // matches words with 'look' at the start

    but when I include the [^\s], it gives an error:

        $match = preg_grep("/$var[^\s]/", $array);
        // Parse error: syntax error, unexpected '^', expecting T_STRING or T_VARIABLE

    TIA
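
    The parse error itself comes from PHP reading $var[ inside a double-quoted string as the start of an array offset; writing {$var} removes the ambiguity. As for the pattern, what the question describes amounts to "the whole string contains the word and no whitespace anywhere", which can be anchored as ^\S*look\S*$ -- checked here in Python, where the regex syntax is the same:

        import re

        var = 'look'
        words = ['look', 'greatlook', 'lookgreat', 'look great',
                 'badlook', 'look bad', 'look ', ' look']
        pattern = re.compile(r'^\S*' + re.escape(var) + r'\S*$')
        print([w for w in words if pattern.match(w)])
        # ['look', 'greatlook', 'lookgreat', 'badlook']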

    Read the article

  • js regexp problem

    - by Alexander
    I have a searching system that splits the keyword into chunks and searches for them in a string like this:

        var regexp_school = new RegExp("(?=.*" + split_keywords[0] + ")(?=.*" +
            split_keywords[1] + ")(?=.*" + split_keywords[2] + ").*", "i");

    I would like to modify this so that I only search for the chunks at the beginning of words. For example, if the string is "Bbe be eb ebb beb" and the keyword is "be eb", then I want only these to hit: "be ebb eb". In other words, I want to combine the above regexp with this one:

        var regexp_school = new RegExp("^" + split_keywords[0], "i");

    but I'm not sure what the syntax would look like. I'm also using the split function to split the keywords, but I don't want to set a limit since I don't know how many words there are in the keyword string:

        split_keywords = school_keyword.split(" ", 3);

    If I leave the 3 out, will it have dynamic length or just length 1? I tried alert(split_keywords.lenght); but didn't get the desired response.
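
    For the beginning-of-word requirement, the usual trick is a \b word boundary inside each lookahead -- "(?=.*\\b" + chunk + ")" in a JavaScript string, with the backslash doubled. A quick check of the resulting pattern, run in Python here since the regex syntax is identical:

        import re

        pattern = re.compile(r'(?=.*\bbe)(?=.*\beb).*', re.I)
        print(bool(pattern.match('Bbe be eb ebb beb')))  # True: words starting 'be' and 'eb' exist
        print(bool(pattern.match('Bbe abe aeb')))        # False: no word starts with either chunk

    (And leaving the limit out of split(" ") returns all the chunks, not just one.)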

    Read the article

  • Q on filestream and streamreader unicode

    - by habbo95
    Hi all, I have a big problem and so far nobody has been able to help me. I want to open an XXX.vmg file (this extension comes from Nokia PC Suite), read it, and then write its contents into a RichTextBox. The code runs without errors, but there is also no result in the RichTextBox; the output is just blank lines. Here is my code:

        FileStream file = new FileStream("c:\\XXX.vmg", FileMode.OpenOrCreate, FileAccess.Read);
        StreamReader sr = new StreamReader(file);
        string s1 = sr.ReadToEnd();
        string[] words = s1.Split(' ');
        for (int i = 0; i < words.Length; i++)
            richTextBox1.Text += Environment.NewLine + words[i];

    Read the article

  • Why doesn't Python's `re.split()` split on zero-length matches?

    - by Tim Pietzcker
    One particular quirk of the (otherwise quite powerful) re module in Python is that re.split() will never split a string on a zero-length match, for example if I want to split a string along word boundaries:

        >>> re.split(r"\s+|\b", "Split along words, preserve punctuation!")
        ['Split', 'along', 'words,', 'preserve', 'punctuation!']

    instead of

        ['', 'Split', 'along', 'words', ',', 'preserve', 'punctuation', '!']

    Why does it have this limitation? Is it by design? Do other regex flavors behave like this?
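
    A note for readers on newer interpreters: this limitation was lifted in Python 3.7, where re.split() can split on zero-length matches, so the example above now produces exactly the desired output:

        >>> import re
        >>> re.split(r"\s+|\b", "Split along words, preserve punctuation!")
        ['', 'Split', 'along', 'words', ',', 'preserve', 'punctuation', '!']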

    Read the article

  • ruby parametrized regular expression

    - by astropanic
    I have a string like "{some|words|are|here}" or "{another|set|of|words}". So in general the string consists of an opening curly bracket, words delimited by pipes, and a closing curly bracket. What is the most efficient way to get the selected word of that string? I would like to do something like this:

        @my_string = "{this|is|a|test|case}"
        @my_string.get_column(0) # => "this"
        @my_string.get_column(2) # => "is"
        @my_string.get_column(4) # => "case"

    What should the method get_column contain?
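
    The parsing itself is just strip-the-braces-and-split; for illustration, the same operation in Python (in Ruby the equivalent composition would use string slicing, split('|') and array indexing):

        def get_column(s, n):
            # "{this|is|a|test|case}" -> drop the braces, split on '|', index in
            return s.strip('{}').split('|')[n]

        print(get_column("{this|is|a|test|case}", 0))  # "this"
        print(get_column("{this|is|a|test|case}", 4))  # "case"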

    Read the article

  • Complexity in using Binary search and Trie

    - by user121196
    Given a large list of alphabetically sorted words in a file, I need to write a program that, given a word x, determines if x is in the list. Preprocessing is OK since I will be calling this function many times over different inputs. Priorities: 1. speed, 2. memory. I already know I can use (n is the number of words, m is the average length of the words):

        1. a trie: time is O(log(n)), space (best case) is O(log(n*m)), space (worst case) is O(n*m)
        2. loading the complete list into memory, then binary search: time is O(log(n)), space is O(n*m)

    I'm not sure about the complexities for the trie, please correct me if they are wrong. Also, are there other good approaches?
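
    For scale, the binary-search option is only a few lines with the standard library's bisect module -- a sketch, with an assumed words.txt as the input file. Each string comparison costs up to O(m), so a lookup is O(m log n); if memory allows, a plain set(words) gives O(m) average-time lookups instead:

        import bisect

        with open('words.txt') as f:  # assumed filename
            words = [line.strip() for line in f]  # already sorted, per the question

        def contains(x):
            i = bisect.bisect_left(words, x)
            return i < len(words) and words[i] == x

        print(contains('zebra'))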

    Read the article

  • C++ Word-Number to int

    - by Andrew
    I'm developing a program that makes basic calculations using words instead of numbers, e.g. five + two would output seven. The program becomes more complex, taking input such as two_hundred_one + five_thousand_six (201 + 5006). Through operator overloading methods, I split each number and assign each word to its own array index: two would be [0], hundred is [1], and one is [2]. Then the array recycles for 5006. My problem is that to perform the actual calculation, I need to convert the words stored in the array to actual integers. I have const string arrays such as these as a library of the words:

        const string units[] = { "", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine" };
        const string teens[] = { "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen", "seventeen", "eighteen", "nineteen" };
        const string tens[]  = { "", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety" };

    If my token array has "two", "hundred", "one" stored in indexes 0, 1 and 2, I'm not sure what the best way to convert these to ints would involve.
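
    The usual approach is a map from word to value plus two accumulators: multiplier words ("hundred", "thousand") scale or flush the running value. Sketched in Python for brevity -- the same logic ports directly to C++ with a std::map<std::string, int> (teens omitted from the table to keep the sketch short):

        VALUES = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
                  "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
                  "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
                  "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90}

        def words_to_int(tokens):
            total, current = 0, 0
            for t in tokens:
                if t == "hundred":
                    current *= 100           # "two hundred" -> 200
                elif t == "thousand":
                    total += current * 1000  # flush: "five thousand" -> 5000
                    current = 0
                else:
                    current += VALUES[t]
            return total + current

        print(words_to_int(["two", "hundred", "one"]))    # 201
        print(words_to_int(["five", "thousand", "six"]))  # 5006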

    Read the article

  • Mysql search design

    - by neil
    I'm designing a MySQL database, and I'd like some input on an efficient way to store blog/article data for searching. Right now, I've made a separate column that stores the content to be searched: no duplicate words, no words shorter than four letters, and no words that are too common. So, essentially, it's a list of keywords from the original article. Also searched would be a list of tags and the title field. I'm not quite sure how MySQL indexes fulltext columns, so would storing the data like that be ineffective, or somehow redundant? A lot of the articles are on the same topic, so would the score be hurt by so many rows having similar keywords? Also, for this project, solutions like Sphinx, Lucene or Google Custom Search can't be used -- only PHP & MySQL. Thanks!

    Read the article

  • Getting the last element of a Postgres array, declaratively

    - by Wojciech Kaczmarek
    How do I obtain the last element of an array in Postgres? I need to do it declaratively, as I want to use it as an ORDER BY criterion. I wouldn't want to create a special PL/pgSQL function for it; the fewer changes to the database the better in this case. In fact, what I want to do is sort by the last word of a specific column containing multiple words. Changing the model is not an option here. In other words, I want to push Ruby's sort_by {|x| x.split[-1]} down to the database level. I can split a value into an array of words with Postgres' string_to_array or regexp_split_to_array functions; then how do I get the last element?
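
    For reference, Postgres arrays can be subscripted with their upper bound inline, which keeps the whole thing declarative; a sketch of the ORDER BY expression, shown inside a Python/psycopg2 call only for context (the posts table, title column, and connection string are invented for the example):

        import psycopg2

        conn = psycopg2.connect('dbname=mydb')  # assumed connection string
        cur = conn.cursor()
        # (string_to_array(...))[array_upper(string_to_array(...), 1)] is the last word.
        cur.execute("""
            SELECT title FROM posts
            ORDER BY (string_to_array(title, ' '))
                     [array_upper(string_to_array(title, ' '), 1)]
        """)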

    Read the article

  • C++ string array from ifstream

    - by David Beck
    I have a program that needs to read an array of strings from a file. The array must hold C-type strings (char * or char[]). Using the following code, I get a bad access error:

        for (i = 0; i < MAX_WORDS && !inputFile.eof(); i++) {
            inputFile >> words[i];
        }

    words is declared as:

        char *words[MAX_WORDS];

    Read the article

  • autocomplete-like feature with a python dict

    - by tipu
    In PHP, I had this line:

        $matches = preg_grep('/^for/', array_keys($hash));

    What it would do is grab the words (fork, form etc.) that are in $hash. In Python, I have a dict with 400,000 words. Its keys are words I'd like to present in an autocomplete-like feature (the values in this case are meaningless). How would I be able to return the keys from my dictionary that match the input? For example (as used earlier), if I have

        my_dict = {"fork": True, "form": True, "fold": True, "fame": True}

    and I get the input "for", it'll return a list of "fork", "form", "fold".
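
    A direct translation is a linear scan over the keys; for 400,000 words queried per keystroke, keeping a sorted list and slicing it with bisect avoids touching every key. Both are sketched below (note that a strict prefix match returns only "fork" and "form" for this input):

        import bisect

        my_dict = {"fork": True, "form": True, "fold": True, "fame": True}

        # Linear scan -- O(number of keys) per query:
        print([k for k in my_dict if k.startswith("for")])  # ['fork', 'form'] in some order

        # Sorted list + binary search -- O(log n + matches) per query:
        words = sorted(my_dict)

        def complete(prefix):
            lo = bisect.bisect_left(words, prefix)
            hi = bisect.bisect_right(words, prefix + u'\uffff')
            return words[lo:hi]

        print(complete("for"))  # ['fork', 'form']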

    Read the article

  • Java - Can i have a faster performance for this loop ?

    - by Brad
    I am reading a book and deleting a number of words from it. My problem is that the process takes a long time, and I want to improve its performance (less time). Example:

        Vector<String> pages = new Vector<String>();          // about 1500 pages, each with about 1000 words
        Vector<String> wordsToDelete = new Vector<String>();  // about 50000 words

        for (String page : pages) {
            String pageInLowCase = page.toLowerCase();
            for (String wordToDelete : wordsToDelete) {
                if (pageInLowCase.contains(wordToDelete))
                    page = page.replaceAll("(?i)\\b" + wordToDelete + "\\b", "");
            }
            // Do some stuff with the final page; that part does not take much time.
        }

    This code takes around 3 minutes to execute. If I skip the replaceAll(...) loop I can save more than 2 minutes. So is there a way to do the same thing with faster performance?
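
    One common speedup is to compile a single alternation of all the words once, so each page gets one regex pass instead of 50,000 replaceAll calls. The idea, sketched in Python (in Java it would be one precompiled java.util.regex.Pattern applied per page):

        import re

        words_to_delete = ["foo", "bar", "baz"]  # stand-ins for the 50,000 words
        # One pattern: \b(?:foo|bar|baz)\b, case-insensitive, compiled once.
        pattern = re.compile(r'\b(?:' + '|'.join(map(re.escape, words_to_delete)) + r')\b',
                             re.IGNORECASE)

        def clean(page):
            return pattern.sub('', page)

        print(clean("Foo went to the BAR"))  # " went to the "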

    Read the article

  • Why does this MSDN example for Func<> delegate have a superfluous Select() call?

    - by Dan
    The MSDN gives this code example in the article on the Func generic delegate:

        Func<String, int, bool> predicate = (str, index) => str.Length == index;
        String[] words = { "orange", "apple", "Article", "elephant", "star", "and" };
        IEnumerable<String> aWords = words.Where(predicate).Select(str => str);
        foreach (String word in aWords)
            Console.WriteLine(word);

    I understand what all this is doing. What I don't understand is the Select(str => str) bit. Surely that's not needed? If you leave it out and just have

        IEnumerable<String> aWords = words.Where(predicate);

    then you still get back an IEnumerable containing the same results, and the code prints the same thing. Am I missing something, or is the example misleading?

    Read the article

  • fastest way to perform string search in general and in python

    - by Rkz
    My task is to search for a string or a pattern in a list of documents that are very short (say 200 characters long). However, say there are 1 million documents of that size. What is the most efficient way to perform this search? I was thinking of tokenizing each document and putting the words in a hashtable with the word as key and the document number as value, thereby creating a bag of words. Then I would perform the word search and retrieve the list of documents that contained the word. From what I can see, this operation will take O(n) operations. Is there any other way? Maybe without using hash tables? Also, is there a Python library or third-party package that can perform efficient searches?
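
    A minimal version of that inverted index with just the standard library (building it is linear in the total number of words; a later single-word query is one dict lookup plus the size of its posting set):

        from collections import defaultdict

        docs = ["the quick brown fox", "quick tips", "lazy dog"]  # stand-in corpus

        index = defaultdict(set)
        for doc_id, text in enumerate(docs):
            for word in text.lower().split():
                index[word].add(doc_id)

        print(sorted(index["quick"]))  # [0, 1]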

    Read the article

  • Java - counting occurrences of a word in a huge text file

    - by Naveen
    I have a text file of 115 MB. It consists of about 20 million words. I have to use the file as a word collection and search for the occurrences of each user-given word in it. I am using this process as a small part of my project. I need a method for finding the number of occurrences of given words quickly and correctly, since I may use it in iterations. I need suggestions for any API I can make use of, or some other way that performs the task faster. Any recommendations are appreciated.
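
    Since the same collection is queried repeatedly, the standard approach is to count every word once up front and answer queries from the resulting map; a sketch in Python (in Java the equivalent would be a HashMap<String, Integer> built in one pass):

        from collections import Counter

        counts = Counter()
        with open('words.txt') as f:  # assumed filename for the 115 MB file
            for line in f:
                counts.update(line.lower().split())

        print(counts['example'])  # occurrences of 'example'; 0 if absent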

    Read the article

  • How to get my MIME type checking script to work? (PHP)

    - by ggfan
    For this script, I check that the uploaded file is a Microsoft Word document or PowerPoint presentation. I am not sure why this isn't working, because it works for image MIME types and text/plain. I am using PHP 5.3.1, so it should have all the MIME types installed already, right? I am uploading Word and PowerPoint 2007 files.

        // Does the file have the right MIME type?
        if ($_FILES['userfile']['type'] != 'application/msword') {
            echo 'Problem: file is not a Word doc.';
            exit;
        }

    Read the article

  • Solaris ldap Authentication

    - by Tman
    Hi everyone. I've been having trouble getting my Solaris 10 server to authenticate against an eDirectory server. I've managed to set up my Linux (RHEL, SLES) servers to authenticate against the LDAP server, and that works fine. Here are my configuration files.

    ldapclient list:

        NS_LDAP_FILE_VERSION= 2.0
        NS_LDAP_BINDDN= cn=proxyuser,o=AEDev
        NS_LDAP_BINDPASSWD= {NS1}ecfa88f3a945c22222233
        NS_LDAP_SERVERS= 192.168.0.19
        NS_LDAP_SEARCH_BASEDN= ou=auth,o=AEDev
        NS_LDAP_AUTH= simple
        NS_LDAP_SEARCH_SCOPE= sub
        NS_LDAP_CACHETTL= 0
        NS_LDAP_CREDENTIAL_LEVEL= anonymous
        NS_LDAP_SERVICE_SEARCH_DESC= group:ou=Groups,ou=auth,o=AEDev
        NS_LDAP_SERVICE_SEARCH_DESC= shadow:ou=users,ou=auth,o=AEDev?sub?objectClass=shadowAccount
        NS_LDAP_SERVICE_SEARCH_DESC= passwd:ou=auth,o=AEDev?sub?objectClass=posixAccount
        NS_LDAP_BIND_TIME= 10
        NS_LDAP_SERVICE_AUTH_METHOD= pam_ldap:simple

    getent passwd works fine:

        root:x:0:0:Super-User:/:/sbin/sh
        daemon:x:1:1::/:
        bin:x:2:2::/usr/bin:
        sys:x:3:3::/:
        adm:x:4:4:Admin:/var/adm:
        lp:x:71:8:Line Printer Admin:/usr/spool/lp:
        uucp:x:5:5:uucp Admin:/usr/lib/uucp:
        nuucp:x:9:9:uucp Admin:/var/spool/uucppublic:/usr/lib/uucp/uucico
        smmsp:x:25:25:SendMail Message Submission Program:/:
        listen:x:37:4:Network Admin:/usr/net/nls:
        gdm:x:50:50:GDM Reserved UID:/:
        webservd:x:80:80:WebServer Reserved UID:/:
        postgres:x:90:90:PostgreSQL Reserved UID:/:/usr/bin/pfksh
        svctag:x:95:12:Service Tag UID:/:
        nobody:x:60001:60001:NFS Anonymous Access User:/:
        noaccess:x:60002:60002:No Access User:/:
        nobody4:x:65534:65534:SunOS 4.x NFS Anonymous Access User:/:
        tlla:x:2012:100::/home/tlla:
        test:x:2011:100::/home/test:
        thato:x:2010:100::/home/thato:

    pam.conf:

        login   auth sufficient  pam_unix_auth.so.1  #server_policy
        login   auth sufficient  /usr/lib/security/pam_ldap.so.1  try_first_pass
        login   auth required    pam_dial_auth.so.1
        rlogin  auth sufficient  pam_rhosts_auth.so.1
        rlogin  auth requisite   pam_authtok_get.so.1
        rlogin  auth required    pam_dhkeys.so.1
        rlogin  auth required    pam_unix_cred.so.1
        rlogin  auth sufficient  pam_unix_auth.so.1
        rlogin  auth sufficient  /usr/lib/security/pam_ldap.so.1  try_first_pass
        rsh     auth sufficient  pam_rhosts_auth.so.1
        rsh     auth required    pam_unix_cred.so.1
        rsh     auth sufficient  pam_unix_auth.so.1  #server_policy
        rsh     auth sufficient  /usr/lib/security/pam_ldap.so.1  try_first_pass
        other   auth requisite   pam_authtok_get.so.1
        other   auth required    pam_dhkeys.so.1
        other   auth required    pam_unix_cred.so.1
        other   auth sufficient  pam_unix_auth.so.1
        other   auth sufficient  /usr/lib/security/pam_ldap.so.1  try_first_pass
        passwd  auth required    pam_passwd_auth.so.1
        passwd  auth sufficient  pam_unix_auth.so.1
        ssh     account sufficient  pam_unix.so.1
        ssh     account sufficient  /usr/lib/security/pam_ldap.so.1  try_first_pass
        other   account requisite   pam_roles.so.1
        other   account sufficient  pam_unix_account.so.1
        other   account sufficient  /usr/lib/security/pam_ldap.so.1  try_first_pass
        other   password required   pam_dhkeys.so.1
        other   password requisite  pam_authtok_get.so.1
        other   password requisite  pam_authtok_check.so.1
        other   password required   pam_authtok_store.so.1
        other   password sufficient pam_unix.so.1
        other   password sufficient /usr/lib/security/pam_ldap.so.1  try_first_pass

    Local authentication works, but LDAP authentication doesn't.

    Read the article

  • How to install windows 7 from scratch on a disk which already contains partitions

    - by rangalo
    Hi, I have the following partitions on a 1 TB disk:

        14 GB   unknown  recovery partition
        100 MB  NTFS     System Reserved partition for Windows 7
        448 GB  NTFS     Windows 7 system partition
        468 GB  NTFS     data partition for Windows 7

    Now, because of the problems mentioned in my other question here, I got a brand new Windows 7 CD and want to install from scratch after deleting all the extra partitions. But the Windows 7 installer doesn't give me such options: it refuses to touch the 14 GB recovery partition and the 100 MB partition reserved by the previous Windows 7. Any ideas? Note: because it is a dynamic disk, most of the freely available tools refuse to delete the partitions on it. Regards.
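
    If wiping the whole disk is acceptable, the installer's own command prompt may be the simplest route: Shift+F10 during Windows setup opens a console, and diskpart can clean a dynamic disk that the graphical installer refuses to touch. A sketch of the session -- double-check the disk number against the list disk output first, since clean erases everything on that disk:

        diskpart
        list disk
        select disk 0   (assuming the 1 TB disk shows up as disk 0)
        clean
        exit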

    Read the article
