Search Results

Search found 13683 results on 548 pages for 'python sphinx'.

Page 363/548 | < Previous Page | 359 360 361 362 363 364 365 366 367 368 369 370  | Next Page >

  • Best way to get back to using the power of lxml after having to use a regex to find something in an

    - by PyNEwbie
    I am trying to rip some text out of a large number of html documents (numbers in the hundreds of thousands). The documents are really forms but they are prepared by a very large group of different organizations so there is significant variation in how they create the document. For example, the documents are divided into chapters. I might want to extract the contents of Chapter 5 from every document so I can analyze the content of the chapter. Initially I thought this would be easy but it turns out that the authors might use a set of non-nested tables throughout the document to hold the content so that Chapter n could be displayed using td tags inside a table. Or they might use other elements such as p tags H tags, div tags or any other block level element. After trying repeatedly to use lxml to help me identify the beginning and end of each chapter I have determined that it is a lot cleaner to use a regular expression because in every case, no matter what the enclosing html element is the chapter label is always in the form of >Chapter # It is a little more complicated in that there might be some white space or non-breaking space represented in different ways (  or   or just spaces). Nonetheless it was trivial to write a regular expression to identify the beginning of each section. (The beginning of one section is the end of the previous section.) But now I want to use lxml to get the text out. My thought is that I have really no choice but to walk along my string to find the close tag for the element that encloses the text I am using to find the relevant section. That is here is one example where the element holding the Chapter name is a div <div style="DISPLAY: block; MARGIN-LEFT: 0pt; TEXT-INDENT: 0pt; MARGIN-RIGHT: 0pt" align="left"><font style="DISPLAY: inline; FONT-WEIGHT: bold; FONT-SIZE: 10pt; FONT-FAMILY: Times New Roman">Chapter 1.&#160;&#160;&#160;Our Beginnings.</font></div> So I am imagining that I would begin at the location where I found the match for chapter 1 and set up a regular expressions to find the next </div|</td|</p|</h1 . . . So at this point I have identified the type of element holding my chapter heading I can use the same logic to find all of the text that is within that element that is set up a regular expression to help me mark from >Chapter 1.&#160;&#160;&#160;Our Beginnings.< So I have identified where my Chapter 1 begins I can do the same for chapter 2 (which is where Chapter 1 ends) Now I am imagining that I am going to snip the document beginning at the opening of the element that I identified as the element the indicates where chapter 1 begins and ending just before the opening of the element that I identified as the element that indicates where Chapter 2 begins. The string that I have identified will then be fed to lxml to use its power to get the content. I am going to all of this trouble because I have read over and over - never use a regular expression to extract content from html documents and I have not hit on a way to be as accurate with lxml to identify the starting and ending locations for the text I want to extract. For example, I can never be certain that the subtitle of Chapter 1 is Our Beginnings it could be Our Red Canary. Let me say that I spent two solid days trying with lxml to be confident that I had the beginning and ending elements and I could only be accurate <60% of the time but a very short regular expression has given me better than 95% success. I have a tendency to make things more complicated than necessary so I am wondering if anyone has seen or solved a similar problems and if they had an approach (not the details mind you) that they would like to offer.

    Read the article

  • Help converting code using httlib2 to use urllib2

    - by ThinkCode
    What am I trying to do? Visit a site, retrieve cookie, visit the next page by sending in the cookie info. It all works but httplib2 is giving me one too many problems with socks proxy on one site. http = httplib2.Http() main_url = 'http://mywebsite.com/get.aspx?id='+ id +'&rows=25' response, content = http.request(main_url, 'GET', headers=headers) main_cookie = response['set-cookie'] referer = 'http://google.com' headers = {'Content-type': 'application/x-www-form-urlencoded', 'Cookie': main_cookie, 'User-Agent' : USER_AGENT, 'Referer' : referer} How to do the same exact thing using urllib2 (cookie retrieving, passing to the next page on the same site)? Thank you.

    Read the article

  • Matplotlib: plotting discrete values

    - by Arkapravo
    I am trying to plot the following ! from numpy import * from pylab import * import random for x in range(1,500): y = random.randint(1,25000) print(x,y) plot(x,y) show() However, I keep getting a blank graph (?). Just to make sure that the program logic is correct I added the code print(x,y), just the confirm that (x,y) pairs are being generated. (x,y) pairs are being generated, but there is no plot, I keep getting a blank graph. Any help ?

    Read the article

  • Trying to set up nested while loops using a boolean switch

    - by thorn100
    First time posting. I'm trying to set up a while loop that will ask the user for the employee name, hours worked and hourly wage until the user enters 'DONE'. Eventually I'll modify the code to calculate the weekly pay and write it to a list, but one thing at a time. The problem is once the main while loop executes once, it just stops. Doesn't error out but just stops. I have to kill the program to get it to stop. I want it to ask the three questions again and again until the user is finished. Thoughts? Please note that this is just an exercise and not meant for any real world application. def getName(): """Asks for the employee's full name""" firstName=raw_input("\nWhat is your first name? ") lastName=raw_input("\nWhat is your last name? ") fullName=firstName.title() + " " + lastName.title() return fullName def getHours(): """Asks for the number of hours the employee worked""" hoursWorked=0 while int(hoursWorked)<1 or int(hoursWorked) > 60: hoursWorked=raw_input("\nHow many hours did the employee work: ") if int(hoursWorked)<1 or int(hoursWorked) > 60: print "Please enter an integer between 1 and 60." else: return hoursWorked def getWage(): """Asks for the employee's hourly wage""" wage=0 while float(wage)<6.00 or float(wage)>20.00: wage=raw_input("\nWhat is the employee's hourly wage: ") if float(wage)<6.00 or float(wage)>20.00: print ("Please enter an hourly wage between $6.00 and $20.00") else: return wage ##sentry variables employeeName="" employeeHours=0 employeeWage=0 booleanDone=False #Enter employee info print "Please enter payroll information for an employee or enter 'DONE' to quit." while booleanDone==False: while employeeName=="": employeeName=getName() if employeeName.lower()=="done": booleanDone=True break print "The employee's name is", employeeName while employeeHours==0: employeeHours=getHours() if employeeHours.lower()=="done": booleanDone=True break print employeeName, "worked", employeeHours, "this week." while employeeWage==0: employeeWage=getWage() if employeeWage.lower()=="done": booleanDone=True break print employeeName + "'s hourly wage is $" + employeeWage

    Read the article

  • Disable autocomplete on textfield in Django?

    - by tau-neutrino
    Does anyone know how you can turn off autocompletion on a textfield in Django? For example, a form that I generate from my model has an input field for a credit card number. It is bad practice to leave autocompletion on. When making the form by hand, I'd add a autocomplete="off" statement, but how do you do it in Django and still retain the form validation?

    Read the article

  • redefine __and__ operator

    - by wiso
    Why I can't redefine the __and__ operator? class Cut(object): def __init__(self, cut): self.cut = cut def __and__(self, other): return Cut("(" + self.cut + ") && (" + other.cut + ")") a = Cut("a>0") b = cut("b>0") c = a and b print c.cut() I want (a>0) && (b>0), but I got b, that the usual behaviour of and

    Read the article

  • 'NoneType' object has no attribute 'data'

    - by Bill Jordan
    Hello guys, I am sending a SOAP request to my server and getting the response back. sample of the response string is shown below: <?xml version = '1.0' ?> <env:Envelope xmlns:env=http:////www.w3.org/2003/05/soap-envelop . .. .. <env:Body> <epas:get-all-config-resp xmlns:epas="urn:organization:epas:soap"> ^M ... ... <epas:property name="Tom">12</epas:property> > > <epas:property name="Alice">34</epas:property> > > <epas:property name="John">56</epas:property> > > <epas:property name="Danial">78</epas:property> > > <epas:property name="George">90</epas:property> > > <epas:property name="Luise">11</epas:property> ... ^M </env:Body? </env:Envelop> What I noticed in the response is that there is an extra character shown in the body which is "^M". Not sure if this could be the issue. Note the ^M shown! when I tried parsing the string returned from the server to get the names and values using the code sample: elements = minidom.parseString(xmldoc).getElementsByTagName("property") myDict = {} for element in elements: myDict[element.getAttribute('name')] = element.firstChild.data But, I am getting this error: 'NoneType' object has no attribute 'data'. May be its something to do with the "^M" shown on the xml response back! Any ideas/comments would be appreciated, Cheers

    Read the article

  • GUI IDE with PyDev Eclipse

    - by gizgok
    I have 2 weeks to finish my final year project.I need a GUI IDE or a GUI framework compatible with PyDev and Eclipse. I cannot spend time learning something cause the functionality is yet to be completed.I'm looking for very simple GUI for a simulation game.

    Read the article

  • pylab.savefig() and pylab.show() image difference

    - by Jack1990
    I'm making an script to automatically create plots from .xvg files, but there's a problem when I'm trying to use pylab's savefig() method. Using pylab.show() and saving from there, everything's fine. Using pylab.show() Using pylab.savefig() def producePlot(timestep, energy_values,type_line = 'r', jump = 1,finish = 100): fc = sp.interp1d(timestep[::jump], energy_values[::jump],kind='cubic') xnew = numpy.linspace(0, finish, finish*2) pylab.plot(xnew, fc(xnew),type_line) pylab.xlabel('Time in ps ') pylab.ylabel('kJ/mol') pylab.xlim(xmin=0, xmax=finish) def produceSimplePlot(timestep, energy_values,type_line = 'r', jump = 1,finish = 100): pylab.plot(timestep, energy_values,type_line) pylab.xlabel('Time in ps ') pylab.ylabel('kJ/mol') pylab.xlim(xmin=0, xmax=finish) def linearRegression(timestep, energy_values, type_line = 'g'): #, jump = 1,finish = 100): from scipy import stats import numpy #print 'fuck' timestep = numpy.asarray(timestep) slope, intercept, r_value, p_value, std_err = stats.linregress(timestep,energy_values) line = slope*timestep+intercept pylab.plot(timestep, line, type_line) def plottingTime(Title,file_name, timestep, energy_values ,loc, jump , finish): pylab.title(Title) producePlot(timestep,energy_values, 'b',jump, finish) linearRegression(timestep,energy_values) import numpy Average = numpy.average(energy_values) #print Average pylab.legend(("Average = %.2f" %(Average),'Linear Reg'),loc) #pylab.show() pylab.savefig('%s.jpg' %file_name[:-4], bbox_inches= None, pad_inches=0) #if __name__ == '__main__': #plottingTime(Title,timestep1, energy_values, jump =10, finish = 4800) def specialCase(Title,file_name, timestep, energy_values,loc, jump, finish): #print 'Working here ...?' pylab.title(Title) producePlot(timestep,energy_values, 'b',jump, finish) import numpy from pylab import * Average = numpy.average(energy_values) #print Average pylab.legend(("Average = %.2g" %(Average), Title),loc) locs,labels = yticks() yticks(locs, map(lambda x: "%.3g" % x, locs)) #pylab.show() pylab.savefig('%s.jpg' %file_name[:-4] , bbox_inches= None, pad_inches=0) Thanks in advance, John

    Read the article

  • How to preprocess a Django model field value before return?

    - by Satoru.Logic
    Hi, all. I have a Note model class like this: class Note(models.Model): author = models.ForeignKey(User, related_name='notes') content = NoteContentField(max_length=256) NoteContentField is a custom sub-class of CharField that override the to_python method in purpose of doing some twitter-text-conversion processing. class NoteContentField(models.CharField): __metaclass__ = models.SubfieldBase def to_python(self, value): value = super(NoteContentField, self).to_python(value) from ..utils import linkify return mark_safe(linkify(value)) However, this doesn't work. When I save a Note object like this: note = Note(author=request.use, content=form.cleaned_data['content']) The conversed value is saved into the database, which is not what I wanna see. Would you please tell me what's wrong with this? Thanks in advance.

    Read the article

  • Run shell script using fabric and piping script text to shell's stdin

    - by Peter Lyons
    Is there a way to execute a multi-line shell script by piping it to the remote shell's standard input in fabric? Or must I always write it to the remote filesystem, then run it, then delete it? I like sending to stdin as it avoids the temporary file. If there's no fabric API (and it seems like there is not based on my research), presumably I can just use the ssh module directly. Basically I wish fabric.api.run was not limited to a 1-line command that gets passed to the shell as a command line argument, but instead would take a full multi-line script and write it to the remote shell's standard input.

    Read the article

  • SQLAlchemy: select over multiple tables

    - by ahojnnes
    Hi, I wanted to optimize my database query: link_list = select( columns=[link_table.c.rating, link_table.c.url, link_table.c.donations_in], whereclause=and_( not_(link_table.c.id.in_( select( columns=[request_table.c.recipient], whereclause=request_table.c.donator==donator.id ).as_scalar() )), link_table.c.id!=donator.id, ), limit=20, ).execute().fetchall() and tried to merge those two selects in one query: link_list = select( columns=[link_table.c.rating, link_table.c.url, link_table.c.donations_in], whereclause=and_( link_table.c.active==True, link_table.c.id!=donator.id, request_table.c.donator==donator.id, link_table.c.id!=request_table.c.recipient, ), limit=20, order_by=[link_table.c.rating.desc()] ).execute().fetchall() the database-schema looks like: link_table = Table('links', metadata, Column('id', Integer, primary_key=True, autoincrement=True), Column('url', Unicode(250), index=True, unique=True), Column('registration_date', DateTime), Column('donations_in', Integer), Column('active', Boolean), ) request_table = Table('requests', metadata, Column('id', Integer, primary_key=True, autoincrement=True), Column('recipient', Integer, ForeignKey('links.id')), Column('donator', Integer, ForeignKey('links.id')), Column('date', DateTime), ) There are several links (donator) in request_table pointing to one link in the link_table. I want to have links from link_table, which are not yet "requested". But this does not work. Is it actually possible, what I'm trying to do? If so, how would you do that? Thank you very much in advance!

    Read the article

  • how to format date when i load data from google-app-engine..

    - by zjm1126
    i use remote_api to load data from google-app-engine. appcfg.py download_data --config_file=helloworld/GreetingLoad.py --filename=a.csv --kind=Greeting helloworld the setting is: class AlbumExporter(bulkloader.Exporter): def __init__(self): bulkloader.Exporter.__init__(self, 'Greeting', [('author', str, None), ('content', str, None), ('date', str, None), ]) exporters = [AlbumExporter] and i download a.csv is : the date is not readable , and the date in appspot.com admin is : so how to get the full date ?? thanks i change this : class AlbumExporter(bulkloader.Exporter): def __init__(self): bulkloader.Exporter.__init__(self, 'Greeting', [('author', str, None), ('content', str, None), ('date', lambda x: datetime.datetime.strptime(x, '%m/%d/%Y').date(), None), ]) exporters = [AlbumExporter] but the error is :

    Read the article

  • Problem with Replacing special characters in a string

    - by Hossein
    Hi, I am trying to feed some text to a special pupose parser. The problem with this parser is that it is sensitive to ()[] characters and in my sentence in the text have quite a lot of these characters. The manual for the parser suggests that all the ()[] get replaced with \( \) \[ \]. So using str.replace i am using to attach \ to all of those charcaters. I use the code below: a = 'abcdef(1234)' a.replace('(','\(') however i get this as my output: 'abcdef\\(1234)' What is wrong with my code? can anyone provide me a solution to solve this for these characters?

    Read the article

  • How to draw the "trail" in a maze solving application

    - by snow-spur
    Hello i have designed a maze and i want to draw a path between the cells as the 'person' moves from one cell to the next. So each time i move the cell a line is drawn Also i am using the graphics module The graphics module is an object oriented library Im importing from graphics import* from maze import* my circle which is my cell center = Point(15, 15) c = Circle(center, 12) c.setFill('blue') c.setOutline('yellow') c.draw(win) p1 = Point(c.getCenter().getX(), c.getCenter().getY()) this is my loop if mazez.blockedCount(cloc)> 2: mazez.addDecoration(cloc, "grey") mazez[cloc].deadend = True c.move(-25, 0) p2 = Point(getX(), getY()) line = graphics.Line(p1, p2) cloc.col = cloc.col - 1 Now it says getX not defined every time i press a key is this because of p2???

    Read the article

  • uWSGI with Cherokee: first steps

    - by Steve
    Has anyone tried using uWSGI with Cherokee? Can you share your experiences and what documents you relied upon the most? I am trying to get started from the documentation on both (uWSGI and Cherokee) websites. Nothing works yet. I am using Ubuntu 10.04.

    Read the article

  • Formwizards for editing in Django

    - by Espen Christensen
    Hi, I am in the process of making a webapp, and this webapp needs to have a form wizard. The wizard consists of 3 ModelForms, and it works flawlessly. But I need the second form to be a "edit form". That is, i need it to be a form that is passed an instance. How can you do this with a form wizard? How do you pass in an instance of a model? I see that the FormWizard class has a get_form method, but isnt there a documented way to use the formwizard for editing/reviewing of data?

    Read the article

  • Combinatorial optimisation of a distance metric

    - by Jose
    I have a set of trajectories, made up of points along the trajectory, and with the coordinates associated with each point. I store these in a 3d array ( trajectory, point, param). I want to find the set of r trajectories that have the maximum accumulated distance between the possible pairwise combinations of these trajectories. My first attempt, which I think is working looks like this: max_dist = 0 for h in itertools.combinations ( xrange(num_traj), r): for (m,l) in itertools.combinations (h, 2): accum = 0. for ( i, j ) in itertools.izip ( range(k), range(k) ): A = [ (my_mat[m, i, z] - my_mat[l, j, z])**2 \ for z in xrange(k) ] A = numpy.array( numpy.sqrt (A) ).sum() accum += A if max_dist < accum: selected_trajectories = h This takes forever, as num_traj can be around 500-1000, and r can be around 5-20. k is arbitrary, but can typically be up to 50. Trying to be super-clever, I have put everything into two nested list comprehensions, making heavy use of itertools: chunk = [[ numpy.sqrt((my_mat[m, i, :] - my_mat[l, j, :])**2).sum() \ for ((m,l),i,j) in \ itertools.product ( itertools.combinations(h,2), range(k), range(k)) ]\ for h in itertools.combinations(range(num_traj), r) ] Apart from being quite unreadable (!!!), it is also taking a long time. Can anyone suggest any ways to improve on this?

    Read the article

  • Error -3 while decompressing data: incorrect header check

    - by Rahul99
    I have .zip file which contain csv data. I am reading .zip file using <input type = "file" name = "select_file"/> I want to decompress that .zip file and read csv data. file_data = self.request.get('select_file') file_str = zlib.decompress(file_data) #file_data_list = file_str.split('\n') #file_Reader = csv.reader(file_data_list,quoting=csv.QUOTE_NONE ) I am expecting csv data in file_str but I am getting error. error :: Error -3 while decompressing data: incorrect header check What I have to use?

    Read the article

< Previous Page | 359 360 361 362 363 364 365 366 367 368 369 370  | Next Page >