Search Results

Search found 13539 results on 542 pages for 'python gtkmozembed'.

Page 357/542 | < Previous Page | 353 354 355 356 357 358 359 360 361 362 363 364  | Next Page >

  • Best way to get back to using the power of lxml after having to use a regex to find something in an

    - by PyNEwbie
    I am trying to rip some text out of a large number of html documents (numbers in the hundreds of thousands). The documents are really forms but they are prepared by a very large group of different organizations so there is significant variation in how they create the document. For example, the documents are divided into chapters. I might want to extract the contents of Chapter 5 from every document so I can analyze the content of the chapter. Initially I thought this would be easy but it turns out that the authors might use a set of non-nested tables throughout the document to hold the content so that Chapter n could be displayed using td tags inside a table. Or they might use other elements such as p tags H tags, div tags or any other block level element. After trying repeatedly to use lxml to help me identify the beginning and end of each chapter I have determined that it is a lot cleaner to use a regular expression because in every case, no matter what the enclosing html element is the chapter label is always in the form of >Chapter # It is a little more complicated in that there might be some white space or non-breaking space represented in different ways (  or   or just spaces). Nonetheless it was trivial to write a regular expression to identify the beginning of each section. (The beginning of one section is the end of the previous section.) But now I want to use lxml to get the text out. My thought is that I have really no choice but to walk along my string to find the close tag for the element that encloses the text I am using to find the relevant section. That is here is one example where the element holding the Chapter name is a div <div style="DISPLAY: block; MARGIN-LEFT: 0pt; TEXT-INDENT: 0pt; MARGIN-RIGHT: 0pt" align="left"><font style="DISPLAY: inline; FONT-WEIGHT: bold; FONT-SIZE: 10pt; FONT-FAMILY: Times New Roman">Chapter 1.&#160;&#160;&#160;Our Beginnings.</font></div> So I am imagining that I would begin at the location where I found the match for chapter 1 and set up a regular expressions to find the next </div|</td|</p|</h1 . . . So at this point I have identified the type of element holding my chapter heading I can use the same logic to find all of the text that is within that element that is set up a regular expression to help me mark from >Chapter 1.&#160;&#160;&#160;Our Beginnings.< So I have identified where my Chapter 1 begins I can do the same for chapter 2 (which is where Chapter 1 ends) Now I am imagining that I am going to snip the document beginning at the opening of the element that I identified as the element the indicates where chapter 1 begins and ending just before the opening of the element that I identified as the element that indicates where Chapter 2 begins. The string that I have identified will then be fed to lxml to use its power to get the content. I am going to all of this trouble because I have read over and over - never use a regular expression to extract content from html documents and I have not hit on a way to be as accurate with lxml to identify the starting and ending locations for the text I want to extract. For example, I can never be certain that the subtitle of Chapter 1 is Our Beginnings it could be Our Red Canary. Let me say that I spent two solid days trying with lxml to be confident that I had the beginning and ending elements and I could only be accurate <60% of the time but a very short regular expression has given me better than 95% success. I have a tendency to make things more complicated than necessary so I am wondering if anyone has seen or solved a similar problems and if they had an approach (not the details mind you) that they would like to offer.

    Read the article

  • module "random" not found when building .exe from IronPython 2.6 script

    - by Graham
    I am using SharpDevelop to build an executable from my IronPython script. The only hitch is that my script has the line import random which works fine when I run the script through ipy.exe, but when I attempt to build and run an exe from the script in SharpDevelop, I always get the message: IronPython.Runtime.Exceptions.ImportException: No module named random Why isn't SharpDevelop 'seeing' random? How can I make it see it?

    Read the article

  • how to detect an escape sequence in a string

    - by mix
    Given a string named line whose raw version has this value: \rRAWSTRING how can I detect if it has the escape character \r? What I've tried is: if repr(line).startswith('\r'): blah... but it doesn't catch it. I also tried find, such as: if repr(line).find('\r') != -1: blah doesn't work either. What am I missing? thx! EDIT: thanks for all the replies and the corrections re terminolgy and sorry for the confusion. OK, if i do this print repr(line) then what it prints is: '\rSET ENABLE ACK\n' (including the single quotes). i have tried all the suggestions, including: line.startswith(r'\r') line.startswith('\\r') each of which returns False. also tried: line.find(r'\r') line.find('\\r') each of which returns -1

    Read the article

  • How to get progress bar to time Class exectution

    - by chrissygormley
    Hello, I am trying to use progress bar to show the progress of a script. I want it increase progress after every function in a class is executed. The code I have tried is below: import progressbar from time import sleep class hello(): def no(self): print 'hello!' def yes(self): print 'No!!!!!!' def pro(): bar = progressbar.ProgressBar(widgets=[progressbar.Bar('=', '[', ']'), ' ', progressbar.Percentage()]) for i in Yep(): bar.update(Yep.i()) sleep(0.1) bar.finish() if __name__ == "__main__": Yep = hello() pro() Does anyone know how to get this working. Thanks

    Read the article

  • min() and max() give error: TypeError: 'float' object is not iterable

    - by PythonUser3.3
    markList=[] Lmark=0 Hmark=0 while True: mark=float(input("Enter your marks here(Click -1 to exit)")) if mark == -1: break markList.append(mark) markList.sort() mid = len(markList)//2 if len(markList)%2==0: median=(markList[mid]+ markList[mid-1])/2 print("Median:", median) else: print("Median:" , markList[mid]) Lmark==(min(mark)) print("The lowest mark is", Lmark) Hmark==(max(mark)) print("The highest mark is", Hmark) My program is a basic grade calculator using lists. My program asks the user to input their grades into a list in which it then calculates your average and finds your lowest and highest mark. I have found the average but I can't seem to figure out how to find the lowest and highest grade. Can you please show me pr tell me what to do?

    Read the article

  • Facebook Connect via Javascript doesn't close and doesn't pass session id

    - by ensnare
    I'm trying to authenticate users via Facebook Connect using a custom Javascript button: <form> <input type="button" value="Connect with Facebook" onclick="window.open('http://www.facebook.com/login.php?api_key=XXXXX&extern=1&fbconnect=1&req_perms=publish_stream,email&return_session=0&v=1.0&next=http%3A%2F%2Fwww.example.com%2Fxd_receiver.htm&fb_connect=1&cancel_url=http%3A%2F%2Fwww.example.com%2Fregister%2Fcancel', '_blank', 'top=442,width=480,height=460,resizable=yes', true)" onlogin='window.location="/register/step2"' /> </form> I am able to authenticate users. However after authentication, the popup window just stays open and the main window is not directed anywhere. In fact, it is the popup window that goes to "/register/step2" How can I get the login window to close as expected, and to pass the facebook session id to /register/step2? Thanks!

    Read the article

  • Why do I get rows of zeros in my 2D fft?

    - by Nicholas Pringle
    I am trying to replicate the results from a paper. "Two-dimensional Fourier Transform (2D-FT) in space and time along sections of constant latitude (east-west) and longitude (north-south) were used to characterize the spectrum of the simulated flux variability south of 40degS." - Lenton et al(2006) The figures published show "the log of the variance of the 2D-FT". I have tried to create an array consisting of the seasonal cycle of similar data as well as the noise. I have defined the noise as the original array minus the signal array. Here is the code that I used to plot the 2D-FT of the signal array averaged in latitude: import numpy as np from numpy import ma from matplotlib import pyplot as plt from Scientific.IO.NetCDF import NetCDFFile ### input directory indir = '/home/nicholas/data/' ### get the flux data which is in ### [time(5day ave for 10 years),latitude,longitude] nc = NetCDFFile(indir + 'CFLX_2000_2009.nc','r') cflux_southern_ocean = nc.variables['Cflx'][:,10:50,:] cflux_southern_ocean = ma.masked_values(cflux_southern_ocean,1e+20) # mask land nc.close() cflux = cflux_southern_ocean*1e08 # change units of data from mmol/m^2/s ### create an array that consists of the seasonal signal fro each pixel year_stack = np.split(cflux, 10, axis=0) year_stack = np.array(year_stack) signal_array = np.tile(np.mean(year_stack, axis=0), (10, 1, 1)) signal_array = ma.masked_where(signal_array > 1e20, signal_array) # need to mask ### average the array over latitude(or longitude) signal_time_lon = ma.mean(signal_array, axis=1) ### do a 2D Fourier Transform of the time/space image ft = np.fft.fft2(signal_time_lon) mgft = np.abs(ft) ps = mgft**2 log_ps = np.log(mgft) log_mgft= np.log(mgft) Every second row of the ft consists completely of zeros. Why is this? Would it be acceptable to add a randomly small number to the signal to avoid this. signal_time_lon = signal_time_lon + np.random.randint(0,9,size=(730, 182))*1e-05 EDIT: Adding images and clarify meaning The output of rfft2 still appears to be a complex array. Using fftshift shifts the edges of the image to the centre; I still have a power spectrum regardless. I expect that the reason that I get rows of zeros is that I have re-created the timeseries for each pixel. The ft[0, 0] pixel contains the mean of the signal. So the ft[1, 0] corresponds to a sinusoid with one cycle over the entire signal in the rows of the starting image. Here are is the starting image using following code: plt.pcolormesh(signal_time_lon); plt.colorbar(); plt.axis('tight') Here is result using following code: ft = np.fft.rfft2(signal_time_lon) mgft = np.abs(ft) ps = mgft**2 log_ps = np.log1p(mgft) plt.pcolormesh(log_ps); plt.colorbar(); plt.axis('tight') It may not be clear in the image but it is only every second row that contains completely zeros. Every tenth pixel (log_ps[10, 0]) is a high value. The other pixels (log_ps[2, 0], log_ps[4, 0] etc) have very low values.

    Read the article

  • How to draw the "trail" in a maze solving application

    - by snow-spur
    Hello i have designed a maze and i want to draw a path between the cells as the 'person' moves from one cell to the next. So each time i move the cell a line is drawn Also i am using the graphics module The graphics module is an object oriented library Im importing from graphics import* from maze import* my circle which is my cell center = Point(15, 15) c = Circle(center, 12) c.setFill('blue') c.setOutline('yellow') c.draw(win) p1 = Point(c.getCenter().getX(), c.getCenter().getY()) this is my loop if mazez.blockedCount(cloc)> 2: mazez.addDecoration(cloc, "grey") mazez[cloc].deadend = True c.move(-25, 0) p2 = Point(getX(), getY()) line = graphics.Line(p1, p2) cloc.col = cloc.col - 1 Now it says getX not defined every time i press a key is this because of p2???

    Read the article

  • App Engine HTTP 500s

    - by pocoa
    This request caused a new process to be started for your application, and thus caused your application code to be loaded for the first time. This request may thus take longer and use more CPU than a typical request for your application. I've handled all the situations, also DeadlineExceededError too. But sometimes I see these error messages in error logs. That request took about 10k ms, so it's not exceeded the limit too. But there is no other specific message about this error. All I know is that it returned HTTP 500. Is there anyone know the reason of these error messages? Thank you.

    Read the article

  • Event Handling in Chaco

    - by awegawef
    When hovering over a data point in Chaco, I would like a small text box to appear, with the text I desire. Also, when I click on a data point (or close enough), I would like my program to take a certain action. I have seen relevant parts of the Chaco documentation, but implementing them has proved difficult. Any help would be appreciated. Thanks.

    Read the article

  • In Django, how to create tables from an SQL file when syncdb is run

    - by Sidney
    Hi, How do I make syncdb execute SQL queries (for table creation) defined by me, rather then generating tables automatically. I'm looking for this solution as some particular models in my app represent SQL-table-views for a legacy-database table. So, I've created their SQL-views in my django-DB like this: CREATE VIEW legacy_series AS SELECT * FROM legacy.series; I have a reverse engineered model that represents the above view/legacytable. But whenever I run syncdb, I have to create all the views first by running sql scripts, otherwise syncdb simply creates tables for them (if a view is not found). How do I make syncdb run the above mentioned SQL?

    Read the article

  • Intersection of two querysets in django

    - by unagimiyagi
    Hello, I can't do an AND on two querysets. As in, q1 & q2. I get the empty set and I do not know why. I have tested this with the simplest cases. I am using django 1.1.1 I have basically objects like this: item1 name="Joe" color = "blue" item2 name="Jim" color = "blue" color = "white" item3 name="John" color = "red" color = "white" Is there something weird about having a many-to-many relationship or what am I missing? queryset1 = Item.objects.filter(color="blue") this gives (item1, item2) queryset2 = Item.objects.filter(color="white") this gives (item2, item3) queryset1 & queryset2 gives me the empty set [] The OR operator works fine (I'm using "|" ) Why is this so?

    Read the article

  • Copy **kwargs to self?

    - by Mark
    Given class ValidationRule: def __init__(self, **kwargs): # code here Is there a way that I can define __init__ such that if I were to initialize the class with something like ValidationRule(other='email') then self.other would be "added" to class without having to explicitly name every possible kwarg?

    Read the article

  • Shaders with pygtkglext

    - by qba
    Do someone know how to get glsl shaders work in gtk-opengl window? With glut all glCreateProgram etc. functions works, but when I tried to put the same gl code into pygtkglext window, its complaining about NullReference: OpenGL.error.NullFunctionError: Attempt to call an undefined function glCreateProgram, check for bool(glCreateProgram) before calling So then I from OpenGL.GL.ARB.shader_objects import *, but the result is similar: OpenGL.error.NullFunctionError: Attempt to call an undefined function glCreateProgramObjectARB, check for bool(glCreateProgramObjectARB) before calling Any idea will be useful.

    Read the article

  • The dictionary need to add every word in SpellingMistakes and the line number but it only adds the l

    - by Will Boomsight
    modules import sys import string Importing and reading the files form the Command Prompt Document = open(sys.argv[1],"r") Document = open('Wc.txt', 'r') Document = Document.read().lower() Dictionary = open(sys.argv[2],"r") Dictionary = open('Dict.txt', 'r') Dictionary = Dictionary.read() def Format(Infile): for ch in string.punctuation: Infile = Infile.replace(ch, "") for no in string.digits: Infile = Infile.replace(no, " ") Infile = Infile.lower() return(Infile) def Corrections(Infile, DictWords): Misspelled = set([]) Infile = Infile.split() DictWords = DictWords.splitlines() for word in Infile: if word not in DictWords: Misspelled.add(word) Misspelled = sorted(Misspelled) return (Misspelled) def Linecheck(Infile,ErrorWords): Infile = Infile.split() lineno = 0 Noset = list() for line in Infile: lineno += 1 line = line.split() for word in line: if word == ErrorWords: Noset.append(lineno) sorted(Noset) return(Noset) def addkey(error,linenum): Nodict = {} for line in linenum: Nodict.setdefault(error,[]).append(linenum) return Nodict FormatDoc = Format(Document) SpellingMistakes = Corrections(FormatDoc,Dictionary) alp = str(SpellingMistakes) for word in SpellingMistakes: nSet = str(Linecheck(FormatDoc,word)) nSet = nSet.split() linelist = addkey(word, nSet) print(linelist) # # for word in Nodict.keys(): # Nodict[word].append(line) Prints each incorrect word on a new line

    Read the article

  • django newbie question : cant start a new project

    - by Moayyad Yaghi
    hello . I'm totally new to django . and I'm using its documentation to get help on how to use it but seems like something is missing. i installed django using setup.py install command and i added the ( django/bin ) to system path variable but. i still cant start a new project i use the following syntax to start a project : django-admin.py startproject myNewProject but it says Type 'django-admin.py help' for usage. 1 do i miss anything ? thank u

    Read the article

  • Chipmunk warning still present with --release

    - by Kaliber64
    I'm using Python27 on Windows 7 64-bit. I downloaded the source for Chipmunk 6.2.x and compiled Pymunk with --release and -c ming32. Almost zero problems. Lots of path not found cause I'm bad. All prints seem to have disappeared but I get spammed with EPA iteration warnings. I've seen a couple discussions but no solutions. Possible chipmunk betas to fix the float errors causing the double truths causing the warning. I picked the latest stable version I think. My program is seriously bogged down with all the prints. class NullDevice(): def write(self, s): pass sys.stdout=NullDevice() Does not disable the C prints .< Any help?

    Read the article

  • How to clone a mercurial repository over an ssh connection initiated by fabric when http authorizati

    - by Monika Sulik
    I'm attempting to use fabric for the first time and I really like it so far, but at a certain point in my deployment script I want to clone a mercurial repository. When I get to that point I get an error: err: abort: http authorization required My repository requires http authorization and fabric doesn't prompt me for the user and password. I can get around this by changing my repository address from: https://hostname/repository to: https://user:password@hostname/repository But for various reasons I would prefer not to go this route. Are there any other ways in which I could bypass this problem?

    Read the article

< Previous Page | 353 354 355 356 357 358 359 360 361 362 363 364  | Next Page >