Search Results

Search found 13534 results on 542 pages for 'python 3 4'.

Page 378/542 | < Previous Page | 374 375 376 377 378 379 380 381 382 383 384 385  | Next Page >

  • Detect if 2 HTML fragments have identical hierarchical structure

    - by sergzach
    An example of fragments that have identical hierarchical structure:

        (1) <div> <span>It's a message</span> </div>
        (2) <div> <span class='bold'>This is a new text</span> </div>

    An example of fragments that have different structure:

        (1) <div> <span><b>It's a message</b></span> </div>
        (2) <div> <span>This is a new text</span> </div>

    So, fragments with a similar structure correspond to one hierarchical tree (the same tag names, the same hierarchical structure). How can I detect if 2 elements (html fragments) have the same structure simply with lxml? I have a function that does not work properly for some more difficult cases (than the example):

        def _is_equal( el1, el2 ):
            # input: 2 elements with possible equal structure and tag names
            # e.g. root = lxml.html.fromstring( buf )
            # el1 = root[ 0 ]
            # el2 = root[ 1 ]
            # move from top to bottom, compare elements
            result = False
            if el1.tag == el2.tag:
                # has no children
                if len( el1 ) == len( el2 ):
                    if len( el1 ) == 0:
                        return True
                    else:
                        # iterate one of them, for example el1
                        i = 0
                        for child1 in el1:
                            child2 = el2[ i ]
                            is_equal2 = _is_equal( child1, child2 )
                            if not is_equal2:
                                return False
                        return True
                else:
                    return False
            else:
                return False

    The code fails to detect that 2 divs with class='tovar2' have an identical structure:

        <body>
        <div class="tovar2">
          <h2 class="new"> <a href="http://modnyedeti-krsk.ru/magazin/product/333193003"> ?????? ?/? </a> </h2>
          <ul class="art"> <li> ???????: <span>1759</span> </li> </ul>
          <div> <div class="wrap" style="width:180px;"> <div class="new"> <img src="shop_files/new-t.png" alt=""> </div> <a class="highslide" href="http://modnyedeti-krsk.ru/d/459730/d/820.jpg" onclick="return hs.expand(this)"> <img src="shop_files/fr_5.gif" style="background:url(/d/459730/d/548470803_5.jpg) 50% 50% no-repeat scroll;" alt="?????? ?/?" height="160" width="180"> </a> </div> </div>
          <form action="" onsubmit="return addProductForm(17094601,333193003,3150.00,this,false);"> <ul class="bott "> <li class="price">????:<br> <span> <b> 3 150 </b> ???. </span> </li> <li class="amount">???-??:<br><input class="number" onclick="this.select()" value="1" name="product_amount" type="text"> </li> <li class="buy"><input value="" type="submit"> </li> </ul> </form>
        </div>
        <div class="tovar2">
          <h2 class="new"> <a href="http://modnyedeti-krsk.ru/magazin/product/333124803">?????? ?/?</a> </h2>
          <ul class="art"> <li> ???????: <span>1759</span> </li> </ul>
          <div> <div class="wrap" style="width:180px;"> <div class="new"> <img src="shop_files/new-t.png" alt=""> </div> <a class="highslide" href="http://modnyedeti-krsk.ru/d/459730/d/820.jpg" onclick="return hs.expand(this)"> <img src="shop_files/fr_5.gif" style="background:url(/d/459730/d/548470803_5.jpg) 50% 50% no-repeat scroll;" alt="?????? ?/?" height="160" width="180"> </a> </div> </div>
          <form action="" onsubmit="return addProductForm(17094601,333124803,3150.00,this,false);"> <ul class="bott "> <li class="price">????:<br> <span> <b>3 150</b> ???. </span> </li> <li class="amount">???-??:<br><input class="number" onclick="this.select()" value="1" name="product_amount" type="text"> </li> <li class="buy"> <input value="" type="submit"> </li> </ul> </form>
        </div>
        </body>
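
    One thing visible in the function above is that `i` is never incremented, so every child of el1 is compared with el2[ 0 ]; that alone would make the comparison fail as soon as the second children differ from the first. A minimal sketch of a pairwise comparison follows. It uses the standard library's xml.etree.ElementTree as a stand-in (an assumption made here so the example runs without lxml); lxml elements expose the same `.tag`, `len()` and iteration interface, but that substitution has not been tested against the page in the question. The name `same_structure` is hypothetical.

```python
import xml.etree.ElementTree as ET


def same_structure(el1, el2):
    """Return True if both elements have the same tag and, recursively,
    the same sequence of child structures. Attributes and text are ignored."""
    if el1.tag != el2.tag or len(el1) != len(el2):
        return False
    # zip pairs the i-th child of el1 with the i-th child of el2,
    # which is the step the hand-written index in the question misses.
    return all(same_structure(c1, c2) for c1, c2 in zip(el1, el2))


# The fragments from the question (well-formed, so ElementTree can parse them).
a = ET.fromstring("<div><span>It's a message</span></div>")
b = ET.fromstring("<div><span class='bold'>This is a new text</span></div>")
c = ET.fromstring("<div><span><b>It's a message</b></span></div>")

print(same_structure(a, b))  # identical structure
print(same_structure(c, b))  # different structure
```

    Real-world HTML that is not well-formed XML would still need an HTML parser such as lxml.html to build the tree first.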

    Read the article

  • Extend argparse to write set names in the help text for optional argument choices and define those sets once at the end

    - by Kent
    Example of the problem

    If I have a list of valid option strings which is shared between several arguments, the list is written in multiple places in the help string, making it harder to read:

        def main():
            elements = ['a', 'b', 'c', 'd', 'e', 'f']
            parser = argparse.ArgumentParser()
            parser.add_argument(
                '-i',
                nargs='*',
                choices=elements,
                default=elements,
                help='Space separated list of case sensitive element names.')
            parser.add_argument(
                '-e',
                nargs='*',
                choices=elements,
                default=[],
                help='Space separated list of case sensitive element names to '
                     'exclude from processing')
            parser.parse_args()

    When running the above function with the command line argument --help it shows:

        usage: arguments.py [-h] [-i [{a,b,c,d,e,f} [{a,b,c,d,e,f} ...]]] [-e [{a,b,c,d,e,f} [{a,b,c,d,e,f} ...]]]

        optional arguments:
          -h, --help show this help message and exit
          -i [{a,b,c,d,e,f} [{a,b,c,d,e,f} ...]]
              Space separated list of case sensitive element names.
          -e [{a,b,c,d,e,f} [{a,b,c,d,e,f} ...]]
              Space separated list of case sensitive element names to exclude from processing

    What would be nice

    It would be nice if one could define an option list name, and in the help output write the option list name in multiple places and define it last of all. In theory it would work like this:

        def main_optionlist():
            elements = ['a', 'b', 'c', 'd', 'e', 'f']
            # Two instances of OptionList are equal if and only if they
            # have the same name (ALFA in this case)
            ol = OptionList('ALFA', elements)
            parser = argparse.ArgumentParser()
            parser.add_argument(
                '-i',
                nargs='*',
                choices=ol,
                default=ol,
                help='Space separated list of case sensitive element names.')
            parser.add_argument(
                '-e',
                nargs='*',
                choices=ol,
                default=[],
                help='Space separated list of case sensitive element names to '
                     'exclude from processing')
            parser.parse_args()

    And when running the above function with the command line argument --help it would show something similar to:

        usage: arguments.py [-h] [-i [ALFA [ALFA ...]]] [-e [ALFA [ALFA ...]]]

        optional arguments:
          -h, --help show this help message and exit
          -i [ALFA [ALFA ...]]
              Space separated list of case sensitive element names.
          -e [ALFA [ALFA ...]]
              Space separated list of case sensitive element names to exclude from processing

        sets in optional arguments:
          ALFA {a,b,c,d,e,f}

    Question

    I need to:

        - Replace the {'l', 'i', 's', 't', 's'} shown with the option name, in the optional arguments.
        - At the end of the help text show a section explaining which elements each option name consists of.

    So I ask: Is this possible using argparse? Which classes would I have to inherit from and which methods would I need to override? I have tried looking at the source for argparse, but as this modification feels pretty advanced I don't know how to get going.
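
    Something close to the desired output may be reachable without subclassing, using two existing argparse features: `metavar` (which replaces the choices list in the usage and option lines) and `epilog` (free text printed after the options). The sketch below is only an approximation of the layout asked for, not the hypothetical OptionList class from the question; the set name ALFA and the epilog wording are taken from the question, and `build_parser` is a made-up helper name.

```python
import argparse


def build_parser():
    elements = ['a', 'b', 'c', 'd', 'e', 'f']
    set_name = 'ALFA'  # display name for the shared choice set
    parser = argparse.ArgumentParser(
        prog='arguments.py',
        # Keep the epilog's line breaks instead of re-wrapping them.
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog='sets in optional arguments:\n  %s  {%s}'
               % (set_name, ','.join(elements)),
    )
    parser.add_argument(
        '-i', nargs='*', choices=elements, default=elements,
        metavar=set_name,  # shown instead of {a,b,c,d,e,f}
        help='Space separated list of case sensitive element names.')
    parser.add_argument(
        '-e', nargs='*', choices=elements, default=[],
        metavar=set_name,
        help='Space separated list of case sensitive element names to '
             'exclude from processing')
    return parser


parser = build_parser()
help_text = parser.format_help()
args = parser.parse_args(['-i', 'a', 'b'])
print(help_text)
```

    The choices are still validated at parse time; only their presentation in the help text changes. Grouping several named sets automatically would need more work than this sketch shows.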

    Read the article

  • Pass errors in Django using HttpResponseRedirect

    - by JPC
    I know that HttpResponseRedirect only takes one parameter, a URL. But there are cases when I want to redirect with an error message to display. I was reading this post: How to pass information using an http redirect (in Django) and there were a lot of good suggestions. I don't really want to use a library whose workings I don't understand. I don't want to rely on messages which, according to the Django docs, is going to be removed. I thought about using sessions. I also like the idea of passing it in a URL, something like:

        return HttpResponseRedirect('/someurl/?error=1')

    and then having some map from error code to message. Is it good practice to have a global map-like structure which hard codes these error messages, or is there a better way? Or should I just use a session?

    EDIT: I got it working using a session. Is it good practice to put things like this in the session?
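
    The "map from error code to message" idea can be sketched independently of Django with the standard library. Everything named below (ERROR_MESSAGES, the codes, the message texts, `redirect_url`, `message_for`) is hypothetical and only illustrates the shape of the approach; in a Django view the resulting URL would be what gets passed to HttpResponseRedirect.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical code-to-message table, defined once at module level.
ERROR_MESSAGES = {
    '1': 'Your session has expired. Please log in again.',
    '2': 'You do not have permission to view that page.',
}


def redirect_url(base, error_code):
    """Build the redirect target, carrying only the error code."""
    return '%s?%s' % (base, urlencode({'error': error_code}))


def message_for(url):
    """Look up the message for the code in the URL, or None if absent/unknown."""
    query = parse_qs(urlsplit(url).query)
    code = query.get('error', [None])[0]
    return ERROR_MESSAGES.get(code)


target = redirect_url('/someurl/', 1)
print(target)
print(message_for(target))
```

    One argument for sending a code rather than the text itself is that a visitor editing the URL can only select among messages that already exist in the table.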

    Read the article

  • Optimization of Function with Dictionary and Zip()

    - by eWizardII
    Hello, I have the following function:

        def filetxt():
            word_freq = {}
            lvl1 = []
            lvl2 = []
            total_t = 0
            users = 0
            text = []
            for l in range(0,500):
                # Open File
                if os.path.exists("C:/Twitter/json/user_" + str(l) + ".json") == True:
                    with open("C:/Twitter/json/user_" + str(l) + ".json", "r") as f:
                        text_f = json.load(f)
                        users = users + 1
                        for i in range(len(text_f)):
                            text.append(text_f[str(i)]['text'])
                            total_t = total_t + 1
                else:
                    pass
            # Filter
            occ = 0
            import string
            for i in range(len(text)):
                s = text[i] # Sample string
                a = re.findall(r'(RT)',s)
                b = re.findall(r'(@)',s)
                occ = len(a) + len(b) + occ
                s = s.encode('utf-8')
                out = s.translate(string.maketrans("",""), string.punctuation)
                # Create Wordlist/Dictionary
                word_list = text[i].lower().split(None)
                for word in word_list:
                    word_freq[word] = word_freq.get(word, 0) + 1
                keys = word_freq.keys()
                numbo = range(1,len(keys)+1)
                WList = ', '.join(keys)
                NList = str(numbo).strip('[]')
                WList = WList.split(", ")
                NList = NList.split(", ")
                W2N = dict(zip(WList, NList))
                for k in range (0,len(word_list)):
                    word_list[k] = W2N[word_list[k]]
                for i in range (0,len(word_list)-1):
                    lvl1.append(word_list[i])
                    lvl2.append(word_list[i+1])

    I have used the profiler and found that the greatest CPU time seems to be spent on the zip() function and the join and split parts of the code. Is there anything I have overlooked that would let me clean up the code and make it more optimized, since the greatest lag seems to be in how I am working with the dictionaries and the zip() function? Any help would be appreciated, thanks!
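
    If the word-to-number table is rebuilt for every tweet, as the excerpt suggests, the join/split/zip round trip is repeated over the whole vocabulary each time. A sketch of one alternative is to assign each word an id the first time it is seen and reuse it afterwards. Note the differences from the original, which are assumptions made here: ids are integers rather than strings, they follow first-seen order rather than dictionary key order, and the function name `index_words` and its return shape are invented for the example.

```python
from collections import Counter


def index_words(texts):
    """Count words and build consecutive-word id pairs in a single pass."""
    word_freq = Counter()
    word_to_id = {}
    lvl1, lvl2 = [], []
    for text in texts:
        words = text.lower().split()
        word_freq.update(words)
        ids = []
        for word in words:
            # Assign the next free id only when a word is new.
            if word not in word_to_id:
                word_to_id[word] = len(word_to_id) + 1
            ids.append(word_to_id[word])
        # Pair every word id with the id of the word that follows it.
        lvl1.extend(ids[:-1])
        lvl2.extend(ids[1:])
    return word_freq, word_to_id, lvl1, lvl2


freq, mapping, lvl1, lvl2 = index_words(["a b a", "b c"])
print(mapping, lvl1, lvl2)
```

    Each word costs one dictionary lookup, so the work grows with the number of words rather than with the vocabulary size times the number of tweets.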

    Read the article

  • Why does SQLAlchemy with psycopg2 use_native_unicode have poor performance?

    - by Bob Dover
    I'm having a difficult time figuring out why a simple SELECT query is taking such a long time with sqlalchemy using raw SQL (I'm getting 14600 rows/sec, but when running the same query through psycopg2 without sqlalchemy, I'm getting 38421 rows/sec). After some poking around, I realized that toggling sqlalchemy's use_native_unicode parameter in the create_engine call actually makes a huge difference.

    This query takes 0.5 secs to retrieve 7300 rows:

        from sqlalchemy import create_engine
        engine = create_engine("postgresql+psycopg2://localhost...", use_native_unicode=True)
        r = engine.execute("SELECT * FROM logtable")
        fetched_results = r.fetchall()

    This query takes 0.19 secs to retrieve the same 7300 rows:

        engine = create_engine("postgresql+psycopg2://localhost...", use_native_unicode=False)
        r = engine.execute("SELECT * FROM logtable")
        fetched_results = r.fetchall()

    The only difference between the 2 queries is use_native_unicode. But sqlalchemy's own docs state that it is better to keep use_native_unicode=True (http://docs.sqlalchemy.org/en/latest/dialects/postgresql.html). Does anyone know why use_native_unicode is making such a big performance difference? And what are the ramifications of turning off use_native_unicode?

    Read the article

  • Declare models elsewhere than in "models.py"

    - by sebpiq
    Hi! I have an application that splits models into different files. Actually the folder looks like:

        >myapp
            __init__.py
            models.py
            >hooks
                ...
                ...

    myapp doesn't care about what's in the hooks folder, except that there are models, and that they have to be declared somehow. So, I put this in myapp.__init__.py:

        from django.conf import settings
        for hook in settings.HOOKS :
            try :
                __import__(hook)
            except ImportError as e :
                print "Got import err !", e
        #where HOOKS = ("myapp.hooks.a_super_hook1", ...)

    The problem is that it doesn't work when I run syncdb (and throws some strange "Got import err !"... strange considering that it's related to another module of my program that I don't even import anywhere :/ )! So I tried successively:

    1)
        for hook in settings.HOOKS :
            try :
                exec ("from %s import *" % hook)

    This doesn't work either: syncdb doesn't install the models in hooks.

    2)
        from myapp.hooks.a_super_hook1 import *

    This works.

    3)
        exec("from myapp.hooks.a_super_hook1 import *")

    This works too.

    So I checked that in test 1) the statement executed is the same as in tests 2) and 3), and it is exactly the same... Any idea?

    Read the article

  • Will this SQL screw up

    - by Joshua
    I'm sure everyone knows the joys of concurrency when it comes to threading. Imagine the following scenario on every page load on a noobily set up MySQL db:

        UPDATE stats SET visits = (visits+1)

    If a thousand users load the page at the same time, will the count screw up? Is this that table locking/row locking crap? Which one does MySQL use?

    Read the article

  • How to get the related_name of a many-to-many-field?

    - by amann
    I am trying to get the related_name of a many-to-many field. The m2m field is located between the models "Group" and "Lection" and is declared in the group model as follows:

        lections = models.ManyToManyField(Lection, blank=True)

    The field looks like this:

        <django.db.models.fields.related.ManyToManyField object at 0x012AD690>

    The print of field.__dict__ is:

        {'_choices': [],
         '_m2m_column_cache': 'group_id',
         '_m2m_name_cache': 'group',
         '_m2m_reverse_column_cache': 'lection_id',
         '_m2m_reverse_name_cache': 'lection',
         '_unique': False,
         'attname': 'lections',
         'auto_created': False,
         'blank': True,
         'column': 'lections',
         'creation_counter': 71,
         'db_column': None,
         'db_index': False,
         'db_table': None,
         'db_tablespace': '',
         'default': <class django.db.models.fields.NOT_PROVIDED at 0x00FC8780>,
         'editable': True,
         'error_messages': {'blank': <django.utils.functional.__proxy__ object at 0x00FC7B50>,
                            'invalid_choice': <django.utils.functional.__proxy__ object at 0x00FC7A50>,
                            'null': <django.utils.functional.__proxy__ object at 0x00FC7A70>},
         'help_text': <django.utils.functional.__proxy__ object at 0x012AD6F0>,
         'm2m_column_name': <function _curried at 0x012A88F0>,
         'm2m_db_table': <function _curried at 0x012A8AF0>,
         'm2m_field_name': <function _curried at 0x012A8970>,
         'm2m_reverse_field_name': <function _curried at 0x012A89B0>,
         'm2m_reverse_name': <function _curried at 0x012A8930>,
         'max_length': None,
         'name': 'lections',
         'null': False,
         'primary_key': False,
         'rel': <django.db.models.fields.related.ManyToManyRel object at 0x012AD6B0>,
         'related': <RelatedObject: mymodel:group related to lections>,
         'related_query_name': <function _curried at 0x012A8670>,
         'serialize': True,
         'unique_for_date': None,
         'unique_for_month': None,
         'unique_for_year': None,
         'validators': [],
         'verbose_name': 'lections'}

    Now the field should be accessed via a lection instance. So this is done by lection.group_set. But I need to access it dynamically, so I need to get the related_name attribute from somewhere. Here in the documentation, there is a note that it is possible to access ManyToManyField.related_name, but this doesn't work for me somehow. Help would be much appreciated. Thanks in advance.

    Read the article

  • Writing csv header removes data from numpy array written below

    - by user338095
    I'm trying to export data to a csv file. It should contain a header (from dataset) and restacked arrays with my data (from datastack). One line in datastack has the same length as dataset. The code below works but it removes parts of the first line from datastack. Any ideas why that could be?

        s = ','.join(itertools.chain(dataset)) + '\n'
        newfile = 'export.csv'
        f = open(newfile,'w')
        f.write(s)
        numpy.savetxt(newfile, (numpy.transpose(datastack)), delimiter=', ')
        f.close()
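
    A likely explanation is that the file ends up being opened twice: `f` still holds the header in its write buffer while numpy.savetxt, given the file name, opens and writes the same file on its own; when `f` is finally closed, the buffered header lands at the start of the file over the data already there. Writing header and data through one open handle avoids that. The sketch below shows the idea with the standard csv module only (an assumption made so it runs without NumPy; numpy.savetxt also accepts an open file handle in place of a file name). The sample `dataset` and `datastack` values are invented.

```python
import csv
import os
import tempfile

dataset = ['time', 'x', 'y']      # hypothetical header names
datastack = [[0.0, 1.0, 2.0],     # hypothetical data, one row per variable
             [10.0, 11.0, 12.0],
             [20.0, 21.0, 22.0]]

path = os.path.join(tempfile.mkdtemp(), 'export.csv')

# One handle for both header and data, closed once by the with-block.
with open(path, 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(dataset)
    writer.writerows(zip(*datastack))  # zip(*rows) transposes the stack

with open(path, newline='') as f:
    rows = list(csv.reader(f))
print(rows)
```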

    Read the article

  • Emptying the datastore in GAE

    - by colwilson
    I know what you're thinking, 'O not that again!', but here we are since Google have not yet provided a simpler method. I have been using a queue based solution which worked fine:

        import datetime
        from models import *

        DELETABLE_MODELS = [Alpha, Beta, AlphaBeta]

        def initiate_purge():
            for e in config.DELETABLE_MODELS:
                deferred.defer(delete_entities, e, 'purging', _queue = 'purging')

        class NotEmptyException(Exception): pass

        def delete_entities(e, queue):
            try:
                q = e.all(keys_only=True)
                db.delete(q.fetch(200))
                ct = q.count(1)
                if ct > 0:
                    raise NotEmptyException('there are still entities to be deleted')
                else:
                    logging.info('processing %s completed' % queue)
            except Exception, err:
                deferred.defer(delete_entities, e, then, queue, _queue = queue)
                logging.info('processing %s deferred: %s' % (queue, err))

    All this does is queue a request to delete some data (once for each class) and then if the queued process either fails or knows there is still some stuff to delete, it re-queues itself. This beats the heck out of hitting the refresh on a browser for 10 minutes.

    However, I'm having trouble deleting AlphaBeta entities; there are always a few left at the end. I think it is because it contains Reference Properties:

        class AlphaBeta(db.Model):
            alpha = db.ReferenceProperty(Alpha, required=True, collection_name='betas')
            beta = db.ReferenceProperty(Beta, required=True, collection_name='alphas')

    I have tried deleting the indexes relating to these entity types, but that did not make any difference. Any advice would be appreciated please.

    Read the article

  • Import Error: No module named testrunner

    - by JiL
    I followed this to add zc.recipe.testrunner to my buildout. I can run buildout successfully but when I run bin/test, I get:

        ImportError: No module named testrunner

    I have zope.testrunner-4.0.4-py2.4.egg in /usr/local/lib/python2.4/site-packages

    I also pinned:

        zope.testrunner = 4.0.4
        zc.recipe.testruner = 1.4.0
        zc.recipe.egg = 1.3.2

    When I ran buildout, I used -vvv and I got:

        ...
        Installing 'zc.recipe.testrunner'.
        We have the distribution that satisfies 'zc.recipe.testrunner==1.4.0'.
        Egg from site-packages: z3c.recipe.scripts 1.0.1
        Egg from site-packages: zope.testrunner 4.0.4
        Egg from site-packages: zope.interface 3.8.0
        Egg from site-packages: zope.exceptions 3.7.1
        ...
        We have the distribution that satisfies 'zope.testrunner==4.0.4'.
        Egg from site-packages: zope.testrunner 4.0.4
        Adding required 'zope.interface' required by zope.testrunner 4.0.4.
        We have a develop egg: zope.interface 0.0
        Adding required 'zope.exceptions' required by zope.testrunner 4.0.4.
        We have a develop egg: zope.exceptions 0.0
        ...

    Why is it I get an ImportError? Is zope.testrunner not installed correctly?

    Read the article

  • Django dictionary in templates: Grab key from another object's attribute

    - by Jordan Messina
    I have a dictionary called number_devices that I'm passing to a template. The dictionary keys are the ids of a list of objects I'm also passing to the template (called implementations). I'm iterating over the list of objects and then trying to use the object.id to get a value out of the dict like so:

        {% for implementation in implementations %}
            {{ number_devices.implementation.id }}
        {% endfor %}

    Unfortunately number_devices.implementation is evaluated first, then the .id of that result is evaluated, which returns and displays nothing. I can't use parentheses like:

        {{ number_devices.(implementation.id) }}

    because I get a parse error. How do I get around this annoyance in Django templates? Thanks for any help!

    Read the article

  • How can I merge two lists and sort them working in 'linear' time?

    - by Sergio Tapia
    I have this, and it works:

        # E. Given two lists sorted in increasing order, create and return a merged
        # list of all the elements in sorted order. You may modify the passed in lists.
        # Ideally, the solution should work in "linear" time, making a single
        # pass of both lists.
        def linear_merge(list1, list2):
            finalList = []
            for item in list1:
                finalList.append(item)
            for item in list2:
                finalList.append(item)
            finalList.sort()
            return finalList
            # +++your code here+++
            return

    But, I'd really like to learn this stuff well. :) What does 'linear' time mean?
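
    "Linear" time means the running time grows in proportion to the total number of elements, len(list1) + len(list2). Calling sort() on the combined list generally costs more than that (on the order of n log n comparisons), even though it produces the right answer. Because both inputs are already sorted, a single pass that repeatedly takes the smaller front element is enough. This is a sketch of that standard two-pointer merge, not necessarily the solution the exercise's authors had in mind:

```python
def linear_merge(list1, list2):
    """Merge two lists that are already sorted in increasing order."""
    merged = []
    i = j = 0
    # Walk both lists once, always taking the smaller front element.
    while i < len(list1) and j < len(list2):
        if list1[i] <= list2[j]:
            merged.append(list1[i])
            i += 1
        else:
            merged.append(list2[j])
            j += 1
    # At most one of these still has elements left; they are already sorted.
    merged.extend(list1[i:])
    merged.extend(list2[j:])
    return merged


print(linear_merge(['aa', 'xx', 'zz'], ['bb', 'cc']))
```

    Every element is appended exactly once and each comparison advances one of the two indexes, which is where the linear bound comes from.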

    Read the article

  • Simple numpy question

    - by dassouki
    I can't get this snippet to work:

        #base code
        A = array([ [ 1, 2, 10 ],
                    [ 1, 3, 20 ],
                    [ 1, 4, 30 ],
                    [ 2, 1, 15 ],
                    [ 2, 3, 25 ],
                    [ 2, 4, 35 ],
                    [ 3, 1, 17 ],
                    [ 3, 2, 27 ],
                    [ 3, 4, 37 ],
                    [ 4, 1, 13 ],
                    [ 4, 2, 23 ],
                    [ 4, 3, 33 ] ])

        # Number of zones
        zones = unique1d(A[:,0])

        for origin in zones:
            for destination in zones:
                if origin != destination:
                    A_ik = A[(A[:,0] == origin & A[:,1] == destination), 2]
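
    One probable culprit is operator precedence: in Python `&` binds more tightly than `==`, so `a == origin & b == destination` is read as the chained comparison `a == (origin & b) == destination`. The effect can be seen with plain integers, without NumPy (the variable names below are made up for the demonstration):

```python
origin, destination = 1, 2
row = (1, 2, 10)  # one row of A: origin, destination, value

# Parsed as: row[0] == (origin & row[1]) == destination
unparenthesized = row[0] == origin & row[1] == destination

# Each comparison is evaluated first, then the results are combined.
parenthesized = (row[0] == origin) & (row[1] == destination)

print(unparenthesized, parenthesized)

# With NumPy arrays the same parenthesization would be:
#   A[(A[:, 0] == origin) & (A[:, 1] == destination), 2]
```

    With arrays instead of integers, the unparenthesized form typically fails with an error about the truth value of an array being ambiguous, because a chained comparison implicitly uses `and`.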

    Read the article

  • Fixed strptime exception with thread lock, but slows down the program

    - by eWizardII
    I have the following code, which when is running inside of a thread (the full code is here - https://github.com/eWizardII/homobabel/blob/master/lovebird.py) for null in range(0,1): while True: try: with open('C:/Twitter/tweets/user_0_' + str(self.id) + '.json', mode='w') as f: f.write('[') threadLock.acquire() for i, seed in enumerate(Cursor(api.user_timeline,screen_name=self.ip).items(200)): if i>0: f.write(", ") f.write("%s" % (json.dumps(dict(sc=seed.author.statuses_count)))) j = j + 1 threadLock.release() f.write("]") except tweepy.TweepError, e: with open('C:/Twitter/tweets/user_0_' + str(self.id) + '.json', mode='a') as f: f.write("]") print "ERROR on " + str(self.ip) + " Reason: ", e with open('C:/Twitter/errors_0.txt', mode='a') as a_file: new_ii = "ERROR on " + str(self.ip) + " Reason: " + str(e) + "\n" a_file.write(new_ii) break Now without the thread lock I generate the following error: Exception in thread Thread-117: Traceback (most recent call last): File "C:\Python27\lib\threading.py", line 530, in __bootstrap_inner self.run() File "C:/Twitter/homobabel/lovebird.py", line 62, in run for i, seed in enumerate(Cursor(api.user_timeline,screen_name=self.ip).items(200)): File "build\bdist.win-amd64\egg\tweepy\cursor.py", line 110, in next self.current_page = self.page_iterator.next() File "build\bdist.win-amd64\egg\tweepy\cursor.py", line 85, in next items = self.method(page=self.current_page, *self.args, **self.kargs) File "build\bdist.win-amd64\egg\tweepy\binder.py", line 196, in _call return method.execute() File "build\bdist.win-amd64\egg\tweepy\binder.py", line 182, in execute result = self.api.parser.parse(self, resp.read()) File "build\bdist.win-amd64\egg\tweepy\parsers.py", line 75, in parse result = model.parse_list(method.api, json) File "build\bdist.win-amd64\egg\tweepy\models.py", line 38, in parse_list results.append(cls.parse(api, obj)) File "build\bdist.win-amd64\egg\tweepy\models.py", line 49, in parse user = User.parse(api, v) File 
"build\bdist.win-amd64\egg\tweepy\models.py", line 86, in parse setattr(user, k, parse_datetime(v)) File "build\bdist.win-amd64\egg\tweepy\utils.py", line 17, in parse_datetime date = datetime(*(time.strptime(string, '%a %b %d %H:%M:%S +0000 %Y')[0:6])) File "C:\Python27\lib\_strptime.py", line 454, in _strptime_time return _strptime(data_string, format)[0] File "C:\Python27\lib\_strptime.py", line 300, in _strptime _TimeRE_cache = TimeRE() File "C:\Python27\lib\_strptime.py", line 188, in __init__ self.locale_time = LocaleTime() File "C:\Python27\lib\_strptime.py", line 77, in __init__ raise ValueError("locale changed during initialization") ValueError: locale changed during initialization The problem is with thread lock on, each thread runs itself serially basically, and it takes way to long for each loop to run for there to be any advantage to having a thread anymore. So if there isn't a way to get rid of the thread lock, is there a way to have it run the for loop inside of the try statement faster?

    Read the article

  • JQuery cookie access has stopped working for GAE app

    - by Greg
    I have a google app engine app that has been running for some time, and some javascript code that checks for a login cookie has suddenly stopped working. As far as I can tell, NO code has changed. The relevant code uses the jquery cookies plugin (jquery.cookies.2.2.0.min.js)...

        // control the default screen depending
        // if someone is logged in
        if( $.cookies.get('dev_appserver_login') != null || $.cookies.get('ACSID') != null ) {
            alert("valid cookie!")
            $("#inventory-container").show();
        } else {
            alert("INvalid cookie!")
            $("#welcome-container").show();
        }

    The reason for the two checks is that in the GAE SDK, the cookies are named differently. The production system uses 'ACSID'. This if statement works in the SDK and now fails 100% of the time in production. I have verified that the cookie is, in fact, present when I inspect the page. Thoughts?

    Read the article

  • pyPDF - Retrieve page numbers from document

    - by SquidneyPoitier
    At the moment I'm looking into doing some PDF merging with pyPdf, but sometimes the inputs are not in the right order, so I'm looking into scraping each page for its page number to determine the order it should go in (e.g. if someone split up a book into 20 10-page PDFs and I want to put them back together). I have two questions:

    1.) I know that sometimes the page number is stored in the document data somewhere, as I've seen PDFs that render on Adobe as something like [1243] (10 of 150), but I've read documents of this sort into pyPdf and I can't find any information indicating the page number. Where is this stored?

    2.) If avenue #1 isn't available, I think I could iterate through the objects on a given page to try to find a page number; likely it would be its own object that has a single number in it. However, I can't seem to find any clear way to determine the contents of objects. If I run:

        pdf.getPage(0).getContents()

    this usually either returns:

        {'/Filter': '/FlateDecode'}

    or it returns a list of IndirectObject(num, num) objects. I don't really know what to do with either of these and there's no real documentation on it as far as I can tell. Is anyone familiar with this kind of thing who could point me in the right direction?

    Read the article
