Search Results

Search found 13596 results on 544 pages for 'mechanize python'.

Page 159/544 | < Previous Page | 155 156 157 158 159 160 161 162 163 164 165 166  | Next Page >

  • Efficient file buffering & scanning methods for large files in python

    - by eblume
    The description of the problem I am having is a bit complicated, and I will err on the side of providing more complete information. For the impatient, here is the briefest way I can summarize it: What is the fastest (least execution time) way to split a text file in to ALL (overlapping) substrings of size N (bound N, eg 36) while throwing out newline characters. I am writing a module which parses files in the FASTA ascii-based genome format. These files comprise what is known as the 'hg18' human reference genome, which you can download from the UCSC genome browser (go slugs!) if you like. As you will notice, the genome files are composed of chr[1..22].fa and chr[XY].fa, as well as a set of other small files which are not used in this module. Several modules already exist for parsing FASTA files, such as BioPython's SeqIO. (Sorry, I'd post a link, but I don't have the points to do so yet.) Unfortunately, every module I've been able to find doesn't do the specific operation I am trying to do. My module needs to split the genome data ('CAGTACGTCAGACTATACGGAGCTA' could be a line, for instance) in to every single overlapping N-length substring. Let me give an example using a very small file (the actual chromosome files are between 355 and 20 million characters long) and N=8 import cStringIO example_file = cStringIO.StringIO("""\ header CAGTcag TFgcACF """) for read in parse(example_file): ... print read ... CAGTCAGTF AGTCAGTFG GTCAGTFGC TCAGTFGCA CAGTFGCAC AGTFGCACF The function that I found had the absolute best performance from the methods I could think of is this: def parse(file): size = 8 # of course in my code this is a function argument file.readline() # skip past the header buffer = '' for line in file: buffer += line.rstrip().upper() while len(buffer) = size: yield buffer[:size] buffer = buffer[1:] This works, but unfortunately it still takes about 1.5 hours (see note below) to parse the human genome this way. Perhaps this is the very best I am going to see with this method (a complete code refactor might be in order, but I'd like to avoid it as this approach has some very specific advantages in other areas of the code), but I thought I would turn this over to the community. Thanks! Note, this time includes a lot of extra calculation, such as computing the opposing strand read and doing hashtable lookups on a hash of approximately 5G in size. Post-answer conclusion: It turns out that using fileobj.read() and then manipulating the resulting string (string.replace(), etc.) took relatively little time and memory compared to the remainder of the program, and so I used that approach. Thanks everyone!

    Read the article

  • python logparse search specific text

    - by krisdigitx
    hi, I am using this function in my code to return the strings i want from reading the log file, I want to grep the "exim" process and return the results, but running the code gives no error, but the output is limited to three lines, how can i just get the output only related to exim process.. #output: {'date': '13', 'process': 'syslogd', 'time': '06:27:33', 'month': 'May'} {'date': '13', 'process': 'exim[23168]:', 'time': '06:27:33', 'month': 'May'} {'May': ['syslogd']} #function: def generate_log_report(logfile): report_dict = {} for line in logfile: line_dict = dictify_logline(line) print line_dict try: month = line_dict['month'] date = line_dict['date'] time = line_dict['time'] #process = line_dict['process'] if "exim" in line_dict['process']: process = line_dict['process'] break else: process = line_dict['process'] except ValueError: continue report_dict.setdefault(month, []).append(process) return report_dict

    Read the article

  • Restart logging to a new file (Python)

    - by compie
    I'm using the following code to initialize logging in my application. logger = logging.getLogger() logger.setLevel(logging.DEBUG) # log to a file directory = '/reserved/DYPE/logfiles' now = datetime.now().strftime("%Y%m%d_%H%M%S") filename = os.path.join(directory, 'dype_%s.log' % now) file_handler = logging.FileHandler(filename) file_handler.setLevel(logging.DEBUG) formatter = logging.Formatter("%(asctime)s %(filename)s, %(lineno)d, %(funcName)s: %(message)s") file_handler.setFormatter(formatter) logger.addHandler(file_handler) # log to the console console_handler = logging.StreamHandler() level = logging.INFO console_handler.setLevel(level) logger.addHandler(console_handler) logging.debug('logging initialized') How can I close the current logging file and restart logging to a new file? Note: I don't want to use RotatingFileHandler, because I want full control over all the filenames and the moment of rotation.

    Read the article

  • Python recursion with list returns None

    - by newman
    def foo(a): a.append(1) if len(a) > 10: print a return a else: foo(a) Why this recursive function returns None (see transcript below)? I can't quite understand what I am doing wrong. In [263]: x = [] In [264]: y = foo(x) [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1] In [265]: print y None

    Read the article

  • Replacing python docstrings

    - by tomaz
    I have written a epytext to reST markup converter, and now I want to convert all the docstrings in my entire library from epytext to reST format. Is there a smart way to read the all the docstrings in a module and write back the replacements? ps: ast module perhaps?

    Read the article

  • Python and classes

    - by Artyom
    Hello, i have 2 classes. How i call first.TQ in Second ? Without creating object First in Second. class First: def __init__(self): self.str = "" def TQ(self): pass def main(self): T = Second(self.str) # Called here class Second(): def __init__(self): list = {u"RANDINT":first.TQ} # List of funcs maybe called in first ..... ..... return data

    Read the article

  • Python string formatting too slow

    - by wich
    I use the following code to log a map, it is fast when it only contains zeroes, but as soon as there is actual data in the map it becomes unbearably slow... Is there any way to do this faster? log_file = open('testfile', 'w') for i, x in ((i, start + i * interval) for i in range(length)): log_file.write('%-5d %8.3f %13g %13g %13g %13g %13g %13g\n' % (i, x, map[0][i], map[1][i], map[2][i], map[3][i], map[4][i], map[5][i]))

    Read the article

  • Find last match with python regular expression

    - by SDD
    I wanto to match the last occurence of a simple pattern in a string, e.g. list = re.findall(r"\w+ AAAA \w+", "foo bar AAAA foo2 AAAA bar2) print "last match: ", list[len(list)-1] however, if the string is very long, a huge list of matches is generated. Is there a more direct way to match the second occurence of "AAAA" or should I use this workaround?

    Read the article

  • Regular expressions in python unicode

    - by Remy
    I need to remove all the html tags from a given webpage data. I tried this using regular expressions: import urllib2 import re page = urllib2.urlopen("http://www.frugalrules.com") from bs4 import BeautifulSoup, NavigableString, Comment soup = BeautifulSoup(page) link = soup.find('link', type='application/rss+xml') print link['href'] rss = urllib2.urlopen(link['href']).read() souprss = BeautifulSoup(rss) description_tag = souprss.find_all('description') content_tag = souprss.find_all('content:encoded') print re.sub('<[^>]*>', '', content_tag) But the syntax of the re.sub is: re.sub(pattern, repl, string, count=0) So, I modified the code as (instead of the print statement above): for row in content_tag: print re.sub(ur"<[^>]*>",'',row,re.UNICODE But it gives the following error: Traceback (most recent call last): File "C:\beautifulsoup4-4.3.2\collocation.py", line 20, in <module> print re.sub(ur"<[^>]*>",'',row,re.UNICODE) File "C:\Python27\lib\re.py", line 151, in sub return _compile(pattern, flags).sub(repl, string, count) TypeError: expected string or buffer What am I doing wrong?

    Read the article

  • (Python) Converting a dictionary to a list?

    - by Daria Egelhoff
    So I have this dictionary: ScoreDict = {"Blue": {'R1': 89, 'R2': 80}, "Brown": {'R1': 61, 'R2': 77}, "Purple": {'R1': 60, 'R2': 98}, "Green": {'R1': 74, 'R2': 91}, "Red": {'R1': 87, 'Lon': 74}} Is there any way how I can convert this dictionary into a list like this: ScoreList = [['Blue', 89, 80], ['Brown', 61, 77], ['Purple', 60, 98], ['Green', 74, 91], ['Red', 87, 74]] I'm not too familiar with dictionaries, so I really need some help here. Thanks in advance!

    Read the article

  • Python unicode search not giving correct answer

    - by user1318912
    I am trying to search hindi words contained one line per file in file-1 and find them in lines in file-2. I have to print the line numbers with the number of words found. This is the code: import codecs hypernyms = codecs.open("hindi_hypernym.txt", "r", "utf-8").readlines() words = codecs.open("hypernyms_en2hi.txt", "r", "utf-8").readlines() count_arr = [] for counter, line in enumerate(hypernyms): count_arr.append(0) for word in words: if line.find(word) >=0: count_arr[counter] +=1 for iterator, count in enumerate(count_arr): if count>0: print iterator, ' ', count This is finding some words, but ignoring some others The input files are: File-1: ???? ??????? File-2: ???????, ????-???? ?????-???, ?????-???, ?????_???, ?????_??? ????_????, ????-????, ???????_???? ????-???? This gives output: 0 1 3 1 Clearly, it is ignoring ??????? and searching for ???? only. I have tried with other inputs as well. It only searches for one word. Any idea how to correct this?

    Read the article

  • Get the last '/' or '\\' character in Python

    - by wowus
    If I have a string that looks like either ./A/B/c.d OR .\A\B\c.d How do I get just the "./A/B/" part? The direction of the slashes can be the same as they are passed. This problem kinda boils down to: How do I get the last of a specific character in a string? Basically, I want the path of a file without the file part of it.

    Read the article

  • Python: Taking an array and break it into subarrays based on some criteria

    - by randombits
    I have an array of files. I'd like to be able to break that array down into one array with multiple subarrays, each subarray contains files that were created on the same day. So right now if the array contains files from March 1 - March 31, I'd like to have an array with 31 subarrays (assuming there is at least 1 file for each day). In the long run, I'm trying to find the file from each day with the latest creation/modification time. If there is a way to bundle that into the iterations that are required above to save some CPU cycles, that would be even more ideal. Then I'd have one flat array with 31 files, one for each day, for the latest file created on each individual day.

    Read the article

  • Custom keys for Google App Engine models (Python)

    - by Cameron
    First off, I'm relatively new to Google App Engine, so I'm probably doing something silly. Say I've got a model Foo: class Foo(db.Model): name = db.StringProperty() I want to use name as a unique key for every Foo object. How is this done? When I want to get a specific Foo object, I currently query the datastore for all Foo objects with the target unique name, but queries are slow (plus it's a pain to ensure that name is unique when each new Foo is created). There's got to be a better way to do this! Thanks.

    Read the article

  • Dynamic Operator Overloading on dict classes in Python

    - by Ishpeck
    I have a class that dynamically overloads basic arithmetic operators like so... import operator class IshyNum: def __init__(self, n): self.num=n self.buildArith() def arithmetic(self, other, o): return o(self.num, other) def buildArith(self): map(lambda o: setattr(self, "__%s__"%o,lambda f: self.arithmetic(f, getattr(operator, o))), ["add", "sub", "mul", "div"]) if __name__=="__main__": number=IshyNum(5) print number+5 print number/2 print number*3 print number-3 But if I change the class to inherit from the dictionary (class IshyNum(dict):) it doesn't work. I need to explicitly def __add__(self, other) or whatever in order for this to work. Why?

    Read the article

  • Rectangle Rotation in Python/Pygame

    - by mramazingguy
    Hey I'm trying to rotate a rectangle around its center and when I try to rotate the rectangle, it moves up and to the left at the same time. Does anyone have any ideas on how to fix this? def rotatePoint(self, angle, point, origin): sinT = sin(radians(angle)) cosT = cos(radians(angle)) return (origin[0] + (cosT * (point[0] - origin[0]) - sinT * (point[1] - origin[1])), origin[1] + (sinT * (point[0] - origin[0]) + cosT * (point[1] - origin[1]))) def rotateRect(self, degrees): center = (self.collideRect.centerx, self.collideRect.centery) self.collideRect.topleft = self.rotatePoint(degrees, self.collideRect.topleft, center) self.collideRect.topright = self.rotatePoint(degrees, self.collideRect.topright, center) self.collideRect.bottomleft = self.rotatePoint(degrees, self.collideRect.bottomleft, center) self.collideRect.bottomright = self.rotatePoint(degrees, self.collideRect.bottomright, center)

    Read the article

  • Dynamic variable name in python

    - by PhilGo20
    I'd like to call a query with a field name filter that I wont know before run time... Not sure how to construct the variable name ...Or maybe I am tired. field_name = funct() locations = Locations.objects.filter(field_name__lte=arg1) where if funct() returns name would equal to locations = Locations.objects.filter(name__lte=arg1) Not sure how to do that ...

    Read the article

  • Python: Unpack arbitary length bits for database storage

    - by sberry2A
    I have a binary data format consisting of 18,000+ packed int64s, ints, shorts, bytes and chars. The data is packed to minimize it's size, so they don't always use byte sized chunks. For example, a number whose min and max value are 31, 32 respectively might be stored with a single bit where the actual value is bitvalue + min, so 0 is 31 and 1 is 32. I am looking for the most efficient way to unpack all of these for subsequent processing and database storage. Right now I am able to read any value by using either struct.unpack, or BitBuffer. I use struct.unpack for any data that starts on a bit where (bit-offset % 8 == 0 and data-length % 8 == 0) and I use BitBuffer for anything else. I know the offset and size of every packed piece of data, so what is going to be the fasted way to completely unpack them? Many thanks.

    Read the article

  • Using __str__ representation for printing objects in containers in Python

    - by BobDobbs
    I've noticed that when an instance with an overloaded str method is passed to the print() function as an argument, it prints as intended. However, when passing a container that contains one of those instances to print(), it uses the repr method instead. That is to say, print(x) displays the correct string representation of x, and print(x, y) works correctly, but print([x]) or print((x, y)) prints the repr representation instead. First off, why does this happen? Secondly, is there a way to correct that behavior of print() in this circumstance?

    Read the article

  • python - from matrix to dictionary in single line

    - by Sanich
    matrix is a list of lists. I've to return a dictionary of the form {i:(l1[i],l2[i],...,lm[i])} Where the key i is matched with a tuple the i'th elements from each list. Say matrix=[[1,2,3,4],[9,8,7,6],[4,8,2,6]] so the line: >>> dict([(i,tuple(matrix[k][i] for k in xrange(len(matrix)))) for i in xrange(len(matrix[0]))]) does the job pretty well and outputs: {0: (1, 9, 4), 1: (2, 8, 8), 2: (3, 7, 2), 3: (4, 6, 6)} but fails if the matrix is empty: matrix=[]. The output should be: {} How can i deal with this?

    Read the article

  • Python - Compress Ascii String

    - by n0idea
    I'm looking for a way to compress an ascii-based string, any help? I need also need to decompress it. I tried zlib but with no help. What can I do to compress the string into lesser length? code: def compress(request): if request.POST: data = request.POST.get('input') if is_ascii(data): result = zlib.compress(data) return render_to_response('index.html', {'result': result, 'input':data}, context_instance = RequestContext(request)) else: result = "Error, the string is not ascii-based" return render_to_response('index.html', {'result':result}, context_instance = RequestContext(request)) else: return render_to_response('index.html', {}, context_instance = RequestContext(request))

    Read the article

  • improve my python program to fetch the desire rows by using if condition

    - by user2560507
    unique.txt file contains: 2 columns with columns separated by tab. total.txt file contains: 3 columns each column separated by tab. I take each row from unique.txt file and find that in total.txt file. If present then extract entire row from total.txt and save it in new output file. ###Total.txt column a column b column c interaction1 mitochondria_205000_225000 mitochondria_195000_215000 interaction2 mitochondria_345000_365000 mitochondria_335000_355000 interaction3 mitochondria_345000_365000 mitochondria_5000_25000 interaction4 chloroplast_115000_128207 chloroplast_35000_55000 interaction5 chloroplast_115000_128207 chloroplast_15000_35000 interaction15 2_10515000_10535000 2_10505000_10525000 ###Unique.txt column a column b mitochondria_205000_225000 mitochondria_195000_215000 mitochondria_345000_365000 mitochondria_335000_355000 mitochondria_345000_365000 mitochondria_5000_25000 chloroplast_115000_128207 chloroplast_35000_55000 chloroplast_115000_128207 chloroplast_15000_35000 mitochondria_185000_205000 mitochondria_25000_45000 2_16595000_16615000 2_16585000_16605000 4_2785000_2805000 4_2775000_2795000 4_11395000_11415000 4_11385000_11405000 4_2875000_2895000 4_2865000_2885000 4_13745000_13765000 4_13735000_13755000 My program: file=open('total.txt') file2 = open('unique.txt') all_content=file.readlines() all_content2=file2.readlines() store_id_lines = [] ff = open('match.dat', 'w') for i in range(len(all_content)): line=all_content[i].split('\t') seq=line[1]+'\t'+line[2] for j in range(len(all_content2)): if all_content2[j]==seq: ff.write(seq) break Problem: but istide of giving desire output (values of those 1st column that fulfile the if condition). i nead somthing like if jth of unique.txt == ith of total.txt then write ith row of total.txt into new file.

    Read the article

< Previous Page | 155 156 157 158 159 160 161 162 163 164 165 166  | Next Page >