Search Results

Search found 13602 results on 545 pages for 'python decorators'.

Page 159/545 | < Previous Page | 155 156 157 158 159 160 161 162 163 164 165 166  | Next Page >

  • Efficient file buffering & scanning methods for large files in python

    - by eblume
    The description of the problem I am having is a bit complicated, and I will err on the side of providing more complete information. For the impatient, here is the briefest way I can summarize it: What is the fastest (least execution time) way to split a text file in to ALL (overlapping) substrings of size N (bound N, eg 36) while throwing out newline characters. I am writing a module which parses files in the FASTA ascii-based genome format. These files comprise what is known as the 'hg18' human reference genome, which you can download from the UCSC genome browser (go slugs!) if you like. As you will notice, the genome files are composed of chr[1..22].fa and chr[XY].fa, as well as a set of other small files which are not used in this module. Several modules already exist for parsing FASTA files, such as BioPython's SeqIO. (Sorry, I'd post a link, but I don't have the points to do so yet.) Unfortunately, every module I've been able to find doesn't do the specific operation I am trying to do. My module needs to split the genome data ('CAGTACGTCAGACTATACGGAGCTA' could be a line, for instance) in to every single overlapping N-length substring. Let me give an example using a very small file (the actual chromosome files are between 355 and 20 million characters long) and N=8 import cStringIO example_file = cStringIO.StringIO("""\ header CAGTcag TFgcACF """) for read in parse(example_file): ... print read ... CAGTCAGTF AGTCAGTFG GTCAGTFGC TCAGTFGCA CAGTFGCAC AGTFGCACF The function that I found had the absolute best performance from the methods I could think of is this: def parse(file): size = 8 # of course in my code this is a function argument file.readline() # skip past the header buffer = '' for line in file: buffer += line.rstrip().upper() while len(buffer) = size: yield buffer[:size] buffer = buffer[1:] This works, but unfortunately it still takes about 1.5 hours (see note below) to parse the human genome this way. Perhaps this is the very best I am going to see with this method (a complete code refactor might be in order, but I'd like to avoid it as this approach has some very specific advantages in other areas of the code), but I thought I would turn this over to the community. Thanks! Note, this time includes a lot of extra calculation, such as computing the opposing strand read and doing hashtable lookups on a hash of approximately 5G in size. Post-answer conclusion: It turns out that using fileobj.read() and then manipulating the resulting string (string.replace(), etc.) took relatively little time and memory compared to the remainder of the program, and so I used that approach. Thanks everyone!

    Read the article

  • Sorting Python list based on the length of the string

    - by prosseek
    I want to sort a list of strings based on the string length. I tried to use sort as follows, but it doesn't seem to give me correct result. xs = ['dddd','a','bb','ccc'] print xs xs.sort(lambda x,y: len(x) < len(y)) print xs ['dddd', 'a', 'bb', 'ccc'] ['dddd', 'a', 'bb', 'ccc'] What might be wrong?

    Read the article

  • How to print a dictionary in python c api function

    - by dizgam
    PyObject* dict = PyDict_New(); PyDict_SetItem(dict, key, value); PyDict_GetItem(dict, key); Bus error if i use getitem function otherwise not. So Want to confirm that the dictionary has the same values which i have set. Other than using PyDict_GetItem function, Is there any other method to print the values of the dictionary?

    Read the article

  • Organizing a random list of objects in Python.

    - by Saebin
    So I have a list that I want to convert to a list that contains a list for each group of objects. ie ['objA.attr1', 'objC', 'objA.attr55', 'objB.attr4'] would return [['objA.attr1', 'objA.attr55'], ['objC'], ['objB.attr4']] currently this is what I use: givenList = ['a.attr1', 'b', 'a.attr55', 'c.attr4'] trgList = [] objNames = [] for val in givenList: obj = val.split('.')[0] if obj in objNames: id = objNames.index(obj) trgList[id].append(val) else: objNames.append(obj) trgList.append([val]) #print trgList It seems to run a decent speed when the original list has around 100,000 ids... but I am curious if there is a better way to do this. Order of the objects or attributes does not matter. Any ideas?

    Read the article

  • Exporting dates properly formatted on Google Appengine in Python

    - by Chris M
    I think this is right but google appengine seems to get to a certain point and cop-out; Firstly is this code actually right; and secondly is there away to skip the record if it cant output (like an ignore errors and continue)? class TrackerExporter(bulkloader.Exporter): def __init__(self): bulkloader.Exporter.__init__(self, 'SearchRec', [('__key__', lambda key:key.name(), None), ('WebSite', str, None), ('DateStamp', lambda x: datetime.datetime.strptime(x, '%d-%m-%Y').date(), None), ('IP', str, None), ('UserAgent', str, None)]) Thanks

    Read the article

  • Python module being reloaded for each request with django and mod_wsgi

    - by Vishal
    I have a variable in init of a module which get loaded from the database and takes about 15 seconds. For django development server everything is working fine but looks like with apache2 and mod_wsgi the module is loaded with every request (taking 15 seconds). Any idea about this behavior? Update: I have enabled daemon mode in mod wsgi, looks like its not reloading the modules now! needs more testing and I will update.

    Read the article

  • how to send some data to the Thread module on python and google-map-engine

    - by zjm1126
    from google.appengine.ext import db class Log(db.Model): content = db.StringProperty(multiline=True) class MyThread(threading.Thread): def run(self,request): #logs_query = Log.all().order('-date') #logs = logs_query.fetch(3) log=Log() log.content=request.POST.get('content',None) log.put() def Log(request): thr = MyThread() thr.start(request) return HttpResponse('') error is : Exception in thread Thread-1: Traceback (most recent call last): File "D:\Python25\lib\threading.py", line 486, in __bootstrap_inner self.run() File "D:\zjm_code\helloworld\views.py", line 33, in run log.content=request.POST.get('content',None) NameError: global name 'request' is not defined

    Read the article

  • Python and classes

    - by Artyom
    Hello, i have 2 classes. How i call first.TQ in Second ? Without creating object First in Second. class First: def __init__(self): self.str = "" def TQ(self): pass def main(self): T = Second(self.str) # Called here class Second(): def __init__(self): list = {u"RANDINT":first.TQ} # List of funcs maybe called in first ..... ..... return data

    Read the article

  • python: problem with dictionary get method default value

    - by goutham
    I'm having a new problem here .. CODE 1: try: urlParams += "%s=%s&"%(val['name'], data.get(val['name'], serverInfo_D.get(val['name']))) except KeyError: print "expected parameter not provided - "+val["name"]+" is missing" exit(0) CODE 2: try: urlParams += "%s=%s&"%(val['name'], data.get(val['name'], serverInfo_D[val['name']])) except KeyError: print "expected parameter not provided - "+val["name"]+" is missing" exit(0) see the diffrence in serverInfo_D[val['name']] & serverInfo_D.get(val['name']) code 2 fails but code 1 works the data serverInfo_D:{'user': 'usr', 'pass': 'pass'} data: {'par1': 9995, 'extraparam1': 22} val: {'par1','user','pass','extraparam1'} exception are raised for for data dict .. and all code in for loop which iterates over val

    Read the article

  • I want to design a html form in python

    - by VaIbHaV-JaIn
    when user will enter details in the text box on the html from <h1>Please enter new password</h1> <form method="POST" enctype="application/json action="uid"> Password<input name="passwd"type="password" /><br> Retype Password<input name="repasswd" type="password" /><br> <input type="Submit" /> </form> </body> i want to post the data in json format through http post request and also i want to set content-type = application/json

    Read the article

  • Rectangle Rotation in Python/Pygame

    - by mramazingguy
    Hey I'm trying to rotate a rectangle around its center and when I try to rotate the rectangle, it moves up and to the left at the same time. Does anyone have any ideas on how to fix this? def rotatePoint(self, angle, point, origin): sinT = sin(radians(angle)) cosT = cos(radians(angle)) return (origin[0] + (cosT * (point[0] - origin[0]) - sinT * (point[1] - origin[1])), origin[1] + (sinT * (point[0] - origin[0]) + cosT * (point[1] - origin[1]))) def rotateRect(self, degrees): center = (self.collideRect.centerx, self.collideRect.centery) self.collideRect.topleft = self.rotatePoint(degrees, self.collideRect.topleft, center) self.collideRect.topright = self.rotatePoint(degrees, self.collideRect.topright, center) self.collideRect.bottomleft = self.rotatePoint(degrees, self.collideRect.bottomleft, center) self.collideRect.bottomright = self.rotatePoint(degrees, self.collideRect.bottomright, center)

    Read the article

  • Implement loops for python 3

    - by Alex
    Implement this loop: total up the product of the numbers from 1 to x. Implement this loop: total up the product of the numbers from a to b. Implement this loop: total up the sum of the numbers from a to b. Implement this loop: total up the sum of the numbers from 1 to x. Implement this loop: count the number of characters in a string s. i'm very lost on implementing loops these are just some examples that i am having trouble with-- if someone could help me understand how to do them that would be awesome

    Read the article

  • Python string formatting too slow

    - by wich
    I use the following code to log a map, it is fast when it only contains zeroes, but as soon as there is actual data in the map it becomes unbearably slow... Is there any way to do this faster? log_file = open('testfile', 'w') for i, x in ((i, start + i * interval) for i in range(length)): log_file.write('%-5d %8.3f %13g %13g %13g %13g %13g %13g\n' % (i, x, map[0][i], map[1][i], map[2][i], map[3][i], map[4][i], map[5][i]))

    Read the article

  • filtering elements from list of lists in Python?

    - by user248237
    I want to filter elements from a list of lists, and iterate over the elements of each element using a lambda. For example, given the list: a = [[1,2,3],[4,5,6]] suppose that I want to keep only elements where the sum of the list is greater than N. I tried writing: filter(lambda x, y, z: x + y + z >= N, a) but I get the error: <lambda>() takes exactly 3 arguments (1 given) How can I iterate while assigning values of each element to x, y, and z? Something like zip, but for arbitrarily long lists. thanks, p.s. I know I can write this using: filter(lambda x: sum(x)..., a) but that's not the point, imagine that these were not numbers but arbitrary elements and I wanted to assign their values to variable names.

    Read the article

  • Extract anything that looks like links from large amount of data in python

    - by Riz
    Hi, I have around 5 GB of html data which I want to process to find links to a set of websites and perform some additional filtering. Right now I use simple regexp for each site and iterate over them, searching for matches. In my case links can be outside of "a" tags and be not well formed in many ways(like "\n" in the middle of link) so I try to grab as much "links" as I can and check them later in other scripts(so no BeatifulSoup\lxml\etc). The problem is that my script is pretty slow, so I am thinking about any ways to speed it up. I am writing a set of test to check different approaches, but hope to get some advices :) Right now I am thinking about getting all links without filtering first(maybe using C module or standalone app, which doesn't use regexp but simple search to get start and end of every link) and then using regexp to match ones I need.

    Read the article

  • python feedparser with yahoo weather rss

    - by mudder
    I'm trying to use feedparser to get some data from yahoos weather rss. It looks like feed parser strips out the yweather namespace data: http://weather.yahooapis.com/forecastrss?w=24260013&u=c <yweather:condition text="Fair" code="34" temp="23" date="Wed, 19 May 2010 5:55 pm EDT" /> looks like feedparser is completely ignoring that. is there away to get it?

    Read the article

  • Dynamically adding @property in python

    - by rz
    I know that I can dynamically add an instance method to an object by doing something like: import types def my_method(self): # logic of method # ... # instance is some instance of some class instance.my_method = types.MethodType(my_method, instance) Later on I can call instance.my_method() and self will be bound correctly and everything works. Now, my question: how to do the exact same thing to obtain the behavior that decorating the new method with @property would give? I would guess something like: instance.my_method = types.MethodType(my_method, instance) instance.my_method = property(instance.my_method) But, doing that instance.my_method returns a property object.

    Read the article

  • Python 3.1 - Memory Error during sampling of a large list

    - by jimy
    The input list can be more than 1 million numbers. When I run the following code with smaller 'repeats', its fine; def sample(x): length = 1000000 new_array = random.sample((list(x)),length) return (new_array) def repeat_sample(x): i = 0 repeats = 100 list_of_samples = [] for i in range(repeats): list_of_samples.append(sample(x)) return(list_of_samples) repeat_sample(large_array) However, using high repeats such as the 100 above, results in MemoryError. Traceback is as follows; Traceback (most recent call last): File "C:\Python31\rnd.py", line 221, in <module> STORED_REPEAT_SAMPLE = repeat_sample(STORED_ARRAY) File "C:\Python31\rnd.py", line 129, in repeat_sample list_of_samples.append(sample(x)) File "C:\Python31\rnd.py", line 121, in sample new_array = random.sample((list(x)),length) File "C:\Python31\lib\random.py", line 309, in sample result = [None] * k MemoryError I am assuming I'm running out of memory. I do not know how to get around this problem. Thank you for your time!

    Read the article

  • Efficient way in Python to remove an element from a comma-separated string

    - by ensnare
    I'm looking for the most efficient way to add an element to a comma-separated string while maintaining alphabetical order for the words: For example: string = 'Apples, Bananas, Grapes, Oranges' subtraction = 'Bananas' result = 'Apples, Grapes, Oranges' Also, a way to do this but while maintaining IDs: string = '1:Apples, 4:Bananas, 6:Grapes, 23:Oranges' subtraction = '4:Bananas' result = '1:Apples, 6:Grapes, 23:Oranges' Sample code is greatly appreciated. Thank you so much.

    Read the article

  • Python: find <title>

    - by Peter
    I have this: response = urllib2.urlopen(url) html = response.read() begin = html.find('<title>') end = html.find('</title>',begin) title = html[begin+len('<title>'):end].strip() if the url = http://www.google.com then the title have no problem as "Google", but if the url = "http://www.britishcouncil.org/learning-english-gateway" then the title become "<!doctype html public "-//W3C//DTD HTML 4.0 Transitional//EN"> <HTML> <HEAD> <base href="http://www.britishcouncil.org/" /> <META http-equiv="Content-Type" Content="text/html;charset=utf-8"> <meta name="WT.sp" content="Learning;Home Page Smart View" /> <meta name="WT.cg_n" content="Learn English Gateway" /> <META NAME="DCS.dcsuri" CONTENT="/learning-english-gateway.htm">..." What is actually happening, why I couldn't return the "title"?

    Read the article

  • Python: How best to parse a simple grammar?

    - by Rosarch
    Ok, so I've asked a bunch of smaller questions about this project, but I still don't have much confidence in the designs I'm coming up with, so I'm going to ask a question on a broader scale. I am parsing pre-requisite descriptions for a course catalog. The descriptions almost always follow a certain form, which makes me think I can parse most of them. From the text, I would like to generate a graph of course pre-requisite relationships. (That part will be easy, after I have parsed the data.) Some sample inputs and outputs: "CS 2110" => ("CS", 2110) # 0 "CS 2110 and INFO 3300" => [("CS", 2110), ("INFO", 3300)] # 1 "CS 2110, INFO 3300" => [("CS", 2110), ("INFO", 3300)] # 1 "CS 2110, 3300, 3140" => [("CS", 2110), ("CS", 3300), ("CS", 3140)] # 1 "CS 2110 or INFO 3300" => [[("CS", 2110)], [("INFO", 3300)]] # 2 "MATH 2210, 2230, 2310, or 2940" => [[("MATH", 2210), ("MATH", 2230), ("MATH", 2310)], [("MATH", 2940)]] # 3 If the entire description is just a course, it is output directly. If the courses are conjoined ("and"), they are all output in the same list If the course are disjoined ("or"), they are in separate lists Here, we have both "and" and "or". One caveat that makes it easier: it appears that the nesting of "and"/"or" phrases is never greater than as shown in example 3. What is the best way to do this? I started with PLY, but I couldn't figure out how to resolve the reduce/reduce conflicts. The advantage of PLY is that it's easy to manipulate what each parse rule generates: def p_course(p): 'course : DEPT_CODE COURSE_NUMBER' p[0] = (p[1], int(p[2])) With PyParse, it's less clear how to modify the output of parseString(). I was considering building upon @Alex Martelli's idea of keeping state in an object and building up the output from that, but I'm not sure exactly how that is best done. def addCourse(self, str, location, tokens): self.result.append((tokens[0][0], tokens[0][1])) def makeCourseList(self, str, location, tokens): dept = tokens[0][0] new_tokens = [(dept, tokens[0][1])] new_tokens.extend((dept, tok) for tok in tokens[1:]) self.result.append(new_tokens) For instance, to handle "or" cases: def __init__(self): self.result = [] # ... self.statement = (course_data + Optional(OR_CONJ + course_data)).setParseAction(self.disjunctionCourses) def disjunctionCourses(self, str, location, tokens): if len(tokens) == 1: return tokens print "disjunction tokens: %s" % tokens How does disjunctionCourses() know which smaller phrases to disjoin? All it gets is tokens, but what's been parsed so far is stored in result, so how can the function tell which data in result corresponds to which elements of token? I guess I could search through the tokens, then find an element of result with the same data, but that feel convoluted... What's a better way to approach this problem?

    Read the article

< Previous Page | 155 156 157 158 159 160 161 162 163 164 165 166  | Next Page >