Search Results

Search found 13539 results on 542 pages for 'python gtkmozembed'.

Page 358/542

  • Best way to get back to using the power of lxml after having to use a regex to find something in an

    - by PyNEwbie
    I am trying to rip some text out of a large number of HTML documents (hundreds of thousands of them). The documents are really forms, but they are prepared by a very large group of different organizations, so there is significant variation in how they are built. For example, the documents are divided into chapters, and I might want to extract the contents of Chapter 5 from every document so I can analyze that chapter. Initially I thought this would be easy, but it turns out that the authors might use a set of non-nested tables throughout the document to hold the content, so that Chapter n is displayed using td tags inside a table. Or they might use other elements such as p tags, h tags, div tags or any other block-level element.

    After repeatedly trying to use lxml to identify the beginning and end of each chapter, I have determined that it is a lot cleaner to use a regular expression, because in every case, no matter what the enclosing HTML element is, the chapter label is always in the form >Chapter #. It is a little more complicated in that there might be white space or non-breaking spaces represented in different ways (&nbsp; or &#160; or just spaces). Nonetheless, it was trivial to write a regular expression to identify the beginning of each section. (The beginning of one section is the end of the previous section.) But now I want to use lxml to get the text out. My thought is that I have no choice but to walk along my string to find the close tag for the element that encloses the text I used to find the relevant section.

    Here is one example where the element holding the chapter name is a div:

        <div style="DISPLAY: block; MARGIN-LEFT: 0pt; TEXT-INDENT: 0pt; MARGIN-RIGHT: 0pt" align="left"><font style="DISPLAY: inline; FONT-WEIGHT: bold; FONT-SIZE: 10pt; FONT-FAMILY: Times New Roman">Chapter 1.&#160;&#160;&#160;Our Beginnings.</font></div>

    So I imagine starting at the location where I found the match for Chapter 1 and setting up a regular expression to find the next </div, </td, </p, </h1 and so on. Once I have identified the type of element holding my chapter heading, I can use the same logic to find all of the text within that element, that is, set up a regular expression to mark everything from >Chapter 1.&#160;&#160;&#160;Our Beginnings.<. Having identified where Chapter 1 begins, I can do the same for Chapter 2 (which is where Chapter 1 ends). Then I snip the document from the opening of the element that marks the start of Chapter 1 to just before the opening of the element that marks the start of Chapter 2, and feed that string to lxml to use its power to get the content.

    I am going to all of this trouble because I have read over and over that you should never use a regular expression to extract content from HTML documents, and I have not found a way to be as accurate with lxml at identifying the starting and ending locations of the text I want to extract. For example, I can never be certain that the subtitle of Chapter 1 is "Our Beginnings"; it could be "Our Red Canary". I spent two solid days trying to get lxml to find the beginning and ending elements reliably, and I could only be accurate less than 60% of the time, but a very short regular expression has given me better than 95% success.

    I have a tendency to make things more complicated than necessary, so I am wondering whether anyone has seen or solved a similar problem and has an approach (not the details, mind you) that they would like to offer.
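A minimal sketch of the slicing step described above, assuming every heading matches `>Chapter <number>` with ordinary spaces, `&nbsp;` or `&#160;` in between, and that the enclosing element is one of div, td, p or h1 to h6 (an assumed list). `chapter_slices` is a hypothetical helper: for each heading it backs up to the nearest preceding block-level opening tag and cuts the document there, so each fragment runs up to the start of the next chapter's element.

```python
import re

# Heading text as described in the question: ">Chapter <n>", separated by
# ordinary or non-breaking spaces. The set of space entities is an assumption.
SPACE = r"(?:\s|&nbsp;|&#160;)"
CHAPTER_RE = re.compile(r">" + SPACE + r"*Chapter" + SPACE + r"+(\d+)", re.IGNORECASE)

# Block-level opening tags that may hold a heading (assumed list; extend as needed).
OPEN_TAG_RE = re.compile(r"<(?:div|td|p|h[1-6])\b", re.IGNORECASE)


def chapter_slices(html):
    """Map chapter number -> HTML fragment.

    Each fragment starts at the opening tag of the block element that encloses
    the chapter heading and stops just before the next chapter's element.
    """
    starts = []
    for m in CHAPTER_RE.finditer(html):
        # Last block-level opening tag that appears before the heading text.
        opens = list(OPEN_TAG_RE.finditer(html, 0, m.start()))
        start = opens[-1].start() if opens else m.start()
        starts.append((int(m.group(1)), start))

    slices = {}
    for i, (number, start) in enumerate(starts):
        end = starts[i + 1][1] if i + 1 < len(starts) else len(html)
        slices[number] = html[start:end]
    return slices
```

Each fragment can then be handed to lxml, for example with `lxml.html.fromstring(fragment).text_content()`; lxml's HTML parser is lenient, so the unbalanced tags a slice like this usually contains should not stop it. If a table of contents repeats the headings, later matches overwrite earlier ones in the returned dict, so filtering those out is left to the caller.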

  • downloading archives response corrupts files

    - by panchicore
        wrapper = FileWrapper(file("C:/pics.zip"))
        content_type = mimetypes.guess_type(result.files)[0]
        response = HttpResponse(wrapper, content_type=content_type)
        response['Content-Length'] = os.path.getsize("C:/pics.zip")
        response['Content-Disposition'] = "attachment; filename=pics.zip"
        return response

    pics.zip is a valid file with 3 pictures inside. The server responds with the download, but when I try to open the zip, WinRAR says "This archive is either in unknown format or damaged!". If I change the file path and file name to a valid image, C:/pic.jpg, it is downloaded damaged too. What am I missing in this download view?
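A likely cause (an assumption, since only part of the view is shown) is that `file("C:/pics.zip")` opens the archive in text mode, and on Windows text mode translates line-ending bytes inside binary data. The framework-neutral sketch below opens the file with `"rb"` and streams it in chunks; `binary_download` is a hypothetical helper, and it guesses the content type from the served file's own name rather than from `result.files`.

```python
import mimetypes
import os


def binary_download(path, chunk_size=8192):
    """Return (headers, chunks) for sending the file at `path` as an attachment.

    The file is opened in binary mode ("rb") so its bytes are sent unchanged.
    """
    content_type = mimetypes.guess_type(path)[0] or "application/octet-stream"
    headers = {
        "Content-Type": content_type,
        "Content-Length": str(os.path.getsize(path)),
        "Content-Disposition": 'attachment; filename="%s"' % os.path.basename(path),
    }

    def chunks():
        with open(path, "rb") as f:  # "rb", not the default text mode
            while True:
                block = f.read(chunk_size)
                if not block:
                    break
                yield block

    return headers, chunks()
```

In the original Django view, the smallest change is probably to wrap `open("C:/pics.zip", "rb")` in `FileWrapper` instead of `file("C:/pics.zip")`; alternatively, the chunk iterator above can be passed to `HttpResponse` together with the returned headers.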

  • Regex for Matching First Alphanumeric Character skipping (The |An? )

    - by TheLizardKing
    I have a list of artists, albums and tracks that I want to sort using the first letter of their respective names. The issue arises when I want to ignore "The ", "A ", "An " and various other non-alphanumeric characters (talking to you, "Weird Al" Yankovic and [dialog]). Django has a nice start, '^(An?|The) +', but I want to ignore those and a few others of my choice. I am doing this in Django, using a MySQL db with utf8_bin collation.
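One way to do this in Python rather than in MySQL is a sort key that strips leading punctuation and a leading article before lower-casing. `sort_key` and `first_letter` are hypothetical helpers, and the article list is an assumption to extend with your own choices.

```python
import re

# Leading articles to skip; extend the alternation with your own choices.
ARTICLES = re.compile(r"^(?:the|an?)\s+", re.IGNORECASE)
# A leading run of characters that are neither letters nor digits.
LEADING_JUNK = re.compile(r"^[\W_]+")


def sort_key(name):
    """Lower-cased name with leading punctuation and one leading article removed."""
    key = LEADING_JUNK.sub("", name)
    key = ARTICLES.sub("", key)
    key = LEADING_JUNK.sub("", key)  # e.g. 'The "Band"' -> 'Band"'
    return key.lower()


def first_letter(name):
    """Index letter for grouping, e.g. 'B' for 'The Beatles'."""
    return sort_key(name)[:1].upper()
```

Because the article pattern requires whitespace after "A" or "An", names such as "Anthrax" or "ABBA" keep their first letter. To sort in the database instead, one option is to store `sort_key(name)` in an extra indexed field when saving (a hypothetical field) and order by it, since utf8_bin compares raw bytes and is therefore case-sensitive.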

  • Editing ForeignKey from "child" table

    - by profuel
    I'm programming in Python with Django. I have these models:

        class Product(mymodels.Base):
            title = models.CharField()
            price = models.ForeignKey(Price)
            promoPrice = models.ForeignKey(Price, related_name="promo_price")

        class Price(mymodels.Base):
            value = models.DecimalField(max_digits=10, decimal_places=3)
            taxValue = models.DecimalField("Tax Value", max_digits=10, decimal_places=3)
            valueWithTax = models.DecimalField("Value with Tax", max_digits=10, decimal_places=3)

    I want to see INPUTs for both prices when editing a product, but I cannot find any way to do that. inlines = [...] works only from Price to Product, which is stupid in this case. Thanks in advance.

  • Sending data from one Protocol to another Protocol in Twisted?

    - by veb
    Hi! One of my protocols is connected to a server, and I'd like to send that protocol's output to the other protocol. I need to access the 'msg' method in ClassA from ClassB, but I keep getting: exceptions.AttributeError: 'NoneType' object has no attribute 'write'. Actual code: http://pastebin.com/MQPhduSY Any ideas, please? :-)

  • A web framework where AJAX was not an afterthought

    - by Pirate for Profit
    AJAX is a pain in the ass because it essentially means you have to write two sets of similar-ish code: one for browsers with JavaScript enabled and one for those without. Not only that, but you have to connect JavaScript events to hook into your models and display the results. And if all that weren't bad enough, you need to send an address change with the request, otherwise the user won't be able to "click back" correctly (if you're confused, look at what happens to the address bar when you click links in GMail). We're searching for something that was designed from the start with all these concerns in mind. Performance and security are also obvious major concerns. We love config-based systems as well, where you don't have to write a lot of code; you just drop it into an easily read config format. It's like asking for the holy grail, right?

  • Dynamic Spacer in ReportLab

    - by ptikobj
    I'm automatically generating a PDF file with Platypus that has dynamic content, so the length of the text content (which sits directly at the bottom of the PDF file) may vary. When the content is too long, a page break can occur. This is because I use a "static" spacer: s = Spacer(width=0, height=23.5*cm). Since I always want exactly one page, I somehow need to set the height of the Spacer dynamically, so that it takes the rest of the space on the page as its height. How do I get the remaining height on my page?

  • Geocoding non-addresses: Geopy

    - by Phil Donovan
    I'm using geopy to geocode alcohol outlets in NZ. The problem is that some places do not have street addresses but are places in Google Maps. For example, entering "Furneaux Lodge, Endeavour Inlet, Queen Charlotte Sound, Marlborough 7250" into Google Maps in the browser finds the place. However, using the same text in geopy I get a GQueryError saying this geographic location does not exist. Here is the code for geocoding:

        def GeoCode(address):
            g = geocoders.Google(domain="maps.google.co.nz")
            geoloc = g.geocode(address, exactly_one=False)
            place, (lat, lng) = geoloc[0]
            GeoOut = []
            GeoOut.extend([place, lat, lng])
            return GeoOut

        GeoCode("Furneaux Lodge, Endeavour Inlet, Queen Charlotte Sound, Marlboroguh 7250")

    Meanwhile, I notice that "Eiffel Tower" works fine. Is there a way to solve this, and can someone explain the difference between the Eiffel Tower and Furneaux Lodge within Google "locations"?

  • Simple XML over http web service

    - by Mark
    I have a simple HTML service developed in Django: you enter your name, it posts it, and it returns a value (male/female). I need to offer this as a web service, and I have no idea where to start. I want to accept an XML request and provide an XML response; that's it. Can anyone give me any pointers? Googling it is difficult when you don't know what you're searching for.
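A minimal, framework-neutral sketch of the XML-in, XML-out part using only the standard library. The element names (`request`, `name`, `response`, `gender`, `error`) are assumptions to agree on with your clients, and `classify` is a hypothetical stand-in for the existing name-to-male/female logic.

```python
import xml.etree.ElementTree as ET


def _xml(element):
    """Serialize an Element to UTF-8 bytes."""
    return ET.tostring(element, encoding="utf-8")


def handle_xml_request(body, classify):
    """Turn an XML request body into an XML response body.

    Expected request (element names are assumptions):
        <request><name>Maria</name></request>
    Response:
        <response><name>Maria</name><gender>female</gender></response>
    """
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        error = ET.Element("error")
        error.text = "malformed XML"
        return _xml(error)

    name = root.findtext("name")
    if not name:
        error = ET.Element("error")
        error.text = "missing <name> element"
        return _xml(error)

    response = ET.Element("response")
    ET.SubElement(response, "name").text = name
    ET.SubElement(response, "gender").text = classify(name)
    return _xml(response)
```

In a Django view this could be wired up roughly as `return HttpResponse(handle_xml_request(request.body, my_classifier), content_type="application/xml")`, where `my_classifier` is your existing function (a hypothetical name) and older Django versions expose the body as `request.raw_post_data`. Since clients will POST from outside your site, the view will probably also need Django's `csrf_exempt` decorator.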

  • Help converting code using httplib2 to use urllib2

    - by ThinkCode
    What am I trying to do? Visit a site, retrieve a cookie, then visit the next page by sending the cookie back. It all works, but httplib2 is giving me one too many problems with a SOCKS proxy on one site.

        http = httplib2.Http()
        main_url = 'http://mywebsite.com/get.aspx?id=' + id + '&rows=25'
        response, content = http.request(main_url, 'GET', headers=headers)
        main_cookie = response['set-cookie']
        referer = 'http://google.com'
        headers = {'Content-type': 'application/x-www-form-urlencoded', 'Cookie': main_cookie, 'User-Agent': USER_AGENT, 'Referer': referer}

    How do I do exactly the same thing with urllib2 (retrieving the cookie and passing it to the next page on the same site)? Thank you.
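A sketch of the cookie round trip, written for Python 3's `urllib.request` and `http.cookiejar`; in Python 2 the same classes are `urllib2.build_opener`, `urllib2.HTTPCookieProcessor` and `cookielib.CookieJar`. Instead of copying `Set-Cookie` by hand, an opener with a cookie processor stores cookies from the first response and sends them on later requests. `make_session`, `fetch_two_pages` and the `USER_AGENT` value are hypothetical placeholders.

```python
import http.cookiejar
import urllib.parse
import urllib.request

USER_AGENT = "Mozilla/5.0 (compatible; example)"  # placeholder value


def make_session():
    """Return (opener, jar): the opener stores Set-Cookie headers in `jar`
    and sends matching cookies back automatically on later requests."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    opener.addheaders = [("User-Agent", USER_AGENT)]
    return opener, jar


def fetch_two_pages(first_url, second_url, form=None):
    """GET first_url to pick up its cookie, then request second_url with it."""
    opener, jar = make_session()
    with opener.open(first_url) as resp:  # cookie lands in `jar`
        resp.read()
    # Passing data makes this a POST; urllib then adds the
    # application/x-www-form-urlencoded Content-Type itself.
    data = urllib.parse.urlencode(form).encode() if form else None
    req = urllib.request.Request(second_url, data=data, headers={"Referer": "http://google.com"})
    with opener.open(req) as resp:  # cookie is attached by HTTPCookieProcessor
        return resp.read()
```

As far as I know, neither urllib2 nor urllib.request supports SOCKS proxies natively, so the proxy on that one site may still need separate handling.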

  • pyplot: really slow creating heatmaps

    - by cvondrick
    I have a loop whose body executes about 200 times. In each iteration it does a sophisticated calculation, and then, for debugging, I want to produce a heatmap of an NxM matrix. But generating this heatmap is unbearably slow and significantly slows down an already slow algorithm. My code is along these lines:

        import numpy
        import matplotlib.pyplot as plt

        for i in range(200):
            matrix = complex_calculation()
            plt.set_cmap("gray")
            plt.imshow(matrix)
            plt.savefig("frame{0}.png".format(i))

    The matrix, from numpy, is not huge: 300 x 600 doubles. If I do not save the figure and instead update an on-screen plot, it's even slower. Surely I must be abusing pyplot. (Matlab can do this, no problem.) How do I speed this up?

  • SQLAlchemy: select over multiple tables

    - by ahojnnes
    Hi, I wanted to optimize my database query:

        link_list = select(
            columns=[link_table.c.rating, link_table.c.url, link_table.c.donations_in],
            whereclause=and_(
                not_(link_table.c.id.in_(
                    select(
                        columns=[request_table.c.recipient],
                        whereclause=request_table.c.donator==donator.id
                    ).as_scalar()
                )),
                link_table.c.id!=donator.id,
            ),
            limit=20,
        ).execute().fetchall()

    and tried to merge those two selects into one query:

        link_list = select(
            columns=[link_table.c.rating, link_table.c.url, link_table.c.donations_in],
            whereclause=and_(
                link_table.c.active==True,
                link_table.c.id!=donator.id,
                request_table.c.donator==donator.id,
                link_table.c.id!=request_table.c.recipient,
            ),
            limit=20,
            order_by=[link_table.c.rating.desc()]
        ).execute().fetchall()

    The database schema looks like this:

        link_table = Table('links', metadata,
            Column('id', Integer, primary_key=True, autoincrement=True),
            Column('url', Unicode(250), index=True, unique=True),
            Column('registration_date', DateTime),
            Column('donations_in', Integer),
            Column('active', Boolean),
        )

        request_table = Table('requests', metadata,
            Column('id', Integer, primary_key=True, autoincrement=True),
            Column('recipient', Integer, ForeignKey('links.id')),
            Column('donator', Integer, ForeignKey('links.id')),
            Column('date', DateTime),
        )

    There are several links (donators) in request_table pointing to one link in link_table. I want the links from link_table that have not yet been "requested". But this does not work. Is what I'm trying to do actually possible? If so, how would you do it? Thank you very much in advance!
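The merged query lists `request_table` without a join condition, so it is a cross join: `link_table.c.id != request_table.c.recipient` keeps a link whenever any one of the donator's requests points elsewhere, duplicates rows per request, and returns nothing when the donator has no requests at all. "Not yet requested" needs an anti-join. The sketch below shows that shape as plain SQL on an in-memory SQLite copy of the schema; the `rating` column is selected in the question but missing from its schema, so it is added here as an assumption, and `unrequested_links` is a hypothetical helper.

```python
import sqlite3

SCHEMA = """
CREATE TABLE links (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    url VARCHAR(250) UNIQUE,
    registration_date DATETIME,
    donations_in INTEGER,
    active BOOLEAN,
    rating INTEGER  -- used by the question's queries; assumed column
);
CREATE TABLE requests (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    recipient INTEGER REFERENCES links(id),
    donator INTEGER REFERENCES links(id),
    date DATETIME
);
"""

# Anti-join: keep a link only if NO request from this donator points at it.
UNREQUESTED = """
SELECT l.rating, l.url, l.donations_in
FROM links AS l
WHERE l.active = 1
  AND l.id != :donator
  AND NOT EXISTS (
      SELECT 1 FROM requests AS r
      WHERE r.donator = :donator AND r.recipient = l.id
  )
ORDER BY l.rating DESC
LIMIT 20
"""


def unrequested_links(conn, donator_id):
    return conn.execute(UNREQUESTED, {"donator": donator_id}).fetchall()
```

In SQLAlchemy terms, the same condition could likely be added to the `and_(...)` as `~exists().where(and_(request_table.c.donator == donator.id, request_table.c.recipient == link_table.c.id))`. The question's first query, with `not_(... in_(...))`, already expresses an anti-join, so it may be correct as it stands, and many databases plan both forms similarly.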

  • wxPython: Change a button's text in a wx.FileDialog

    - by Sascha
    Hello, I have a wx.FileDialog (with the wx.FD_OPEN flag) and I would like to know whether (and how) I can change the button in the bottom right of the FileDialog from "Open" to "Create" or "Delete", etc. I have a button with the text "Delete Portfolio"; when pressed, it opens a FileDialog and lets the user select a portfolio file (.db) to delete. So instead of the FileDialog's bottom-right confirm button displaying "Open", I would like to change its text to "Confirm" or "Delete" or whatever. Is this possible? It's a rather superficial thing to do, but if the button says "Open" when the user wants to select a file to delete, it can be a little confusing, even if the dialog's title says "please select a file to delete".

  • How to set up Aptana Studio 3 Themes in PyDev

    - by willy1234x1
    I've installed the Aptana Studio 3 preview and noticed it has support for themes (such as a Bespin style or Ruby Envy). I'd love to use the Bespin one in PyDev, but so far I've had no luck getting it to work. Does anyone have a clue how to get it to work? Video showing the themes in action.

  • How do I call setattr() on the current module?

    - by Matt Joiner
    What do I pass as the first parameter, "object", to the function setattr(object, name, value) to set variables on the current module? For example:

        setattr(object, "SOME_CONSTANT", 42)

    giving the same effect as:

        SOME_CONSTANT = 42

    within the module containing these lines (with the correct object). I'm generating several values at the module level dynamically, and as I can't define __getattr__ at the module level, this is my fallback.
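The object to pass is the module itself, which can be looked up in `sys.modules` under the module's own `__name__`; writing into `globals()` has the same effect. The generated names below (`RED`, `GREEN`, `BLUE`) are hypothetical examples.

```python
import sys

# The module object whose code is running right now. It is already registered
# in sys.modules while its top-level code executes, so this works at import time.
this_module = sys.modules[__name__]

# Hypothetical dynamically generated values.
for index, name in enumerate(["RED", "GREEN", "BLUE"]):
    setattr(this_module, name, index)  # same effect as RED = 0, GREEN = 1, BLUE = 2

# Equivalent alternative: globals() returns the module's namespace dict.
globals()["SOME_CONSTANT"] = 42
```

On Python 3.7 and later, PEP 562 also allows a module-level `__getattr__(name)` function, which removes the limitation mentioned in the question.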

  • How to turn a list of tuples into a string?

    - by matt
    I have a list of tuples that I'm trying to incorporate into a SQL query, but I can't figure out how to join them together without adding slashes. My list looks like this:

        list = [('val', 'val'), ('val', 'val'), ('val', 'val')]

    If I turn each tuple into a string and try to join them with a comma, I get something like ' (\'val\', \'val\'), ... '. What's the right way to do this, so I can get the list (without brackets) as a string?
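Two options, sketched with placeholder values (`rows` instead of `list`, to avoid shadowing the builtin). Joining `str()` of each tuple gives the bracket-free text directly; for a SQL query, though, it is usually safer to let the database driver quote the values through placeholders. `?` is the `sqlite3` placeholder style (MySQLdb uses `%s`), and the `pairs` table and `insert_pairs` helper are hypothetical.

```python
import sqlite3

rows = [("a", "b"), ("c", "d"), ("e", "f")]  # hypothetical values

# Plain text: str() of each tuple, joined with ", " (no surrounding brackets).
as_text = ", ".join(str(pair) for pair in rows)


def insert_pairs(conn, pairs):
    """Insert all pairs in one statement, letting the driver quote the values."""
    placeholders = ", ".join(["(?, ?)"] * len(pairs))  # "(?, ?), (?, ?), ..."
    params = [value for pair in pairs for value in pair]  # flattened in the same order
    conn.execute("INSERT INTO pairs (x, y) VALUES " + placeholders, params)
```

`str()` produces Python's literal syntax, which only happens to resemble SQL here; a value containing a quote would come out in a form SQL does not expect, which is why the placeholder version is the safer choice for anything user-supplied.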

  • pylab.savefig() and pylab.show() image difference

    - by Jack1990
    I'm making a script to automatically create plots from .xvg files, but there's a problem when I try to use pylab's savefig() method. Using pylab.show() and saving from there, everything's fine.

    [Images: plot saved from the pylab.show() window; plot written by pylab.savefig()]

        def producePlot(timestep, energy_values,type_line = 'r', jump = 1,finish = 100):
            fc = sp.interp1d(timestep[::jump], energy_values[::jump],kind='cubic')
            xnew = numpy.linspace(0, finish, finish*2)
            pylab.plot(xnew, fc(xnew),type_line)
            pylab.xlabel('Time in ps ')
            pylab.ylabel('kJ/mol')
            pylab.xlim(xmin=0, xmax=finish)

        def produceSimplePlot(timestep, energy_values,type_line = 'r', jump = 1,finish = 100):
            pylab.plot(timestep, energy_values,type_line)
            pylab.xlabel('Time in ps ')
            pylab.ylabel('kJ/mol')
            pylab.xlim(xmin=0, xmax=finish)

        def linearRegression(timestep, energy_values, type_line = 'g'): #, jump = 1,finish = 100):
            from scipy import stats
            import numpy
            #print 'fuck'
            timestep = numpy.asarray(timestep)
            slope, intercept, r_value, p_value, std_err = stats.linregress(timestep,energy_values)
            line = slope*timestep+intercept
            pylab.plot(timestep, line, type_line)

        def plottingTime(Title,file_name, timestep, energy_values ,loc, jump , finish):
            pylab.title(Title)
            producePlot(timestep,energy_values, 'b',jump, finish)
            linearRegression(timestep,energy_values)
            import numpy
            Average = numpy.average(energy_values)
            #print Average
            pylab.legend(("Average = %.2f" %(Average),'Linear Reg'),loc)
            #pylab.show()
            pylab.savefig('%s.jpg' %file_name[:-4], bbox_inches= None, pad_inches=0)

        #if __name__ == '__main__':
            #plottingTime(Title,timestep1, energy_values, jump =10, finish = 4800)

        def specialCase(Title,file_name, timestep, energy_values,loc, jump, finish):
            #print 'Working here ...?'
            pylab.title(Title)
            producePlot(timestep,energy_values, 'b',jump, finish)
            import numpy
            from pylab import *
            Average = numpy.average(energy_values)
            #print Average
            pylab.legend(("Average = %.2g" %(Average), Title),loc)
            locs,labels = yticks()
            yticks(locs, map(lambda x: "%.3g" % x, locs))
            #pylab.show()
            pylab.savefig('%s.jpg' %file_name[:-4] , bbox_inches= None, pad_inches=0)

    Thanks in advance, John

  • Error -3 while decompressing data: incorrect header check

    - by Rahul99
    I have a .zip file that contains CSV data. I upload the .zip file through <input type = "file" name = "select_file"/>, and I want to decompress it and read the CSV data.

        file_data = self.request.get('select_file')
        file_str = zlib.decompress(file_data)
        #file_data_list = file_str.split('\n')
        #file_Reader = csv.reader(file_data_list,quoting=csv.QUOTE_NONE )

    I am expecting CSV data in file_str, but I get this error:

        Error -3 while decompressing data: incorrect header check

    What should I use instead?
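`zlib.decompress()` expects a bare zlib stream, while a .zip file is an archive container with its own headers and central directory, which is why the header check fails. The `zipfile` module reads the archive from memory instead. This Python 3 sketch assumes `self.request.get('select_file')` returns the raw bytes of the upload and that the CSV members are UTF-8; `read_csv_from_zip` is a hypothetical helper.

```python
import csv
import io
import zipfile


def read_csv_from_zip(file_data):
    """Yield (member_name, rows) for every .csv member of an in-memory zip archive.

    `file_data` is assumed to be the raw bytes of the uploaded .zip file.
    """
    with zipfile.ZipFile(io.BytesIO(file_data)) as archive:
        for name in archive.namelist():
            if name.lower().endswith(".csv"):
                text = archive.read(name).decode("utf-8")  # encoding is an assumption
                yield name, list(csv.reader(io.StringIO(text)))
```

Usage: `for name, rows in read_csv_from_zip(self.request.get('select_file')): ...`, where each `rows` is a list of lists of strings.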

  • Sort a list of tuples without case sensitivity

    - by dound
    How can I efficiently and easily sort a list of tuples without being sensitive to case? For example this: [('a', 'c'), ('A', 'b'), ('a', 'a'), ('a', 5)] Should look like this once sorted: [('a', 5), ('a', 'a'), ('A', 'b'), ('a', 'c')] The regular lexicographic sort will put 'A' before 'a' and yield this: [('A', 'b'), ('a', 5), ('a', 'a'), ('a', 'c')]
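A key function handles this: it lower-cases string elements and leaves other elements alone. Because the example mixes an integer (5) with strings in the same position, and Python 3 refuses to compare `5` with `'a'`, the sketch tags each element so non-strings (assumed to be numbers) sort before strings, reproducing the expected output. `case_insensitive_key` is a hypothetical name.

```python
def _element_key(value):
    # Strings compare case-insensitively; other values (assumed to be numbers)
    # sort first, so Python 3 never has to compare an int with a str.
    if isinstance(value, str):
        return (1, value.lower())
    return (0, value)


def case_insensitive_key(item):
    return tuple(_element_key(v) for v in item)


data = [('a', 'c'), ('A', 'b'), ('a', 'a'), ('a', 5)]
sorted_data = sorted(data, key=case_insensitive_key)
# [('a', 5), ('a', 'a'), ('A', 'b'), ('a', 'c')]
```

Since `sorted()` is stable, items whose keys are equal (such as `('A',)` and `('a',)`) keep their original relative order.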

  • GUI IDE with PyDev Eclipse

    - by gizgok
    I have 2 weeks to finish my final-year project. I need a GUI IDE or a GUI framework compatible with PyDev and Eclipse. I cannot spend time learning something new because the functionality is yet to be completed. I'm looking for a very simple GUI for a simulation game.

  • Remove padding in wxPython's wxWizard

    - by mridang
    Hi guys, I'm using wxPython to create a wizard using the wxWizard control. I'm trying to draw a colored rectangle, but when I run the app there seems to be about 10px of padding on each side of the rectangle. This goes for all other controls too; I have to offset them a bit so that they appear exactly where I want them. Is there any way I could remove this padding? Here's the source of my base wizard page:

        class SimplePage(wx.wizard.PyWizardPage):
            """ Simple wizard page with unlimited rows of text. """

            def __init__(self, parent, title):
                wx.wizard.PyWizardPage.__init__(self, parent)
                self.next = self.prev = None
                #self.sizer = wx.BoxSizer(wx.VERTICAL)
                title = wx.StaticText(self, -1, title)
                title.SetFont(wx.Font(18, wx.SWISS, wx.NORMAL, wx.BOLD))
                #self.sizer.AddWindow(title, 0, wx.ALIGN_LEFT|wx.ALL, padding)
                #self.sizer.AddWindow(wx.StaticLine(self, -1), 0, wx.EXPAND|wx.ALL, padding)
                # self.SetSizer(self.sizer)
                self.Bind(wx.EVT_PAINT, self.OnPaint)

            def OnPaint(self, evt):
                """set up the device context (DC) for painting"""
                self.dc = wx.PaintDC(self)
                self.dc.BeginDrawing()
                self.dc.SetPen(wx.Pen("grey",style=wx.TRANSPARENT))
                self.dc.SetBrush(wx.Brush("grey", wx.SOLID))
                # set x, y, w, h for rectangle
                self.dc.DrawRectangle(0,0,500, 500)
                self.dc.EndDrawing()
                del self.dc

            def SetNext(self, next):
                self.next = next

            def SetPrev(self, prev):
                self.prev = prev

            def GetNext(self):
                return self.next

            def GetPrev(self):
                return self.prev

            def Activated(self, evt):
                """ Executed when page is being activated. """
                return

            def Blocked(self, evt):
                """ Executed when page is about to be switched. Switching can be blocked by returning True. """
                return False

            def Cancel(self, evt):
                """ Executed when wizard is about to be canceled. Canceling can be blocked by returning False. """
                return True

    Thanks guys.
