Search Results

Search found 32 results on 2 pages for 'etree'.

Page 1/2 | 1 2  | Next Page >

  • how to pass an xml file to lxml to parse?

    - by BeeBand
    I'm trying to parse an xml file using lxml. xml.etree allowed me to simply pass the file name as a parameter to the parse function, so I attempted to do the same with lxml. My code: from lxml import etree from lxml import objectify file = "C:\Projects\python\cb.xml" tree = etree.parse(file) but I get the error: Traceback (most recent call last): File "cb.py", line 5, in <module> tree = etree.parse(file) File "lxml.etree.pyx", line 2698, in lxml.etree.parse (src/lxml/lxml.etree.c:4 9590) File "parser.pxi", line 1491, in lxml.etree._parseDocument (src/lxml/lxml.etre e.c:71205) File "parser.pxi", line 1520, in lxml.etree._parseDocumentFromURL (src/lxml/lx ml.etree.c:71488) File "parser.pxi", line 1420, in lxml.etree._parseDocFromFile (src/lxml/lxml.e tree.c:70583) File "parser.pxi", line 975, in lxml.etree._BaseParser._parseDocFromFile (src/ lxml/lxml.etree.c:67736) File "parser.pxi", line 539, in lxml.etree._ParserContext._handleParseResultDo c (src/lxml/lxml.etree.c:63820) File "parser.pxi", line 625, in lxml.etree._handleParseResult (src/lxml/lxml.e tree.c:64741) File "parser.pxi", line 565, in lxml.etree._raiseParseError (src/lxml/lxml.etr ee.c:64084) lxml.etree.XMLSyntaxError: AttValue: " or ' expected, line 2, column 26 What am I doing wrong?

    Read the article

  • etree.findall: 'OR'-lookup?

    - by piquadrat
    I want to find all stylesheet definitions in a XHTML file with lxml.etree.findall. This could be as simple as elems = tree.findall('link[@rel="stylesheet"]') + tree.findall('style') But the problem with CSS style definitions is that the order matters, e.g. <link rel="stylesheet" type="text/css" href="/media/css/first.css" /> <style>body:{font-size: 10px;}</style> <link rel="stylesheet" type="text/css" href="/media/css/second.css" /> if the contents of the style tag is applied after the rules in the two link tags, the result may be completely different from the one where the rules are applied in order of definition. So, how would I do a lookup that inlcudes both link[@rel="stylesheet"] and style?

    Read the article

  • Python question, how to pass an xml file to lxml to parse?

    - by BeeBand
    I'm relatively new to python, my code is: from lxml import etree from lxml import objectify file = "C:\Projects\python\cb.xml" tree = etree.parse(file) but I get the error: Traceback (most recent call last): File "cb.py", line 5, in <module> tree = etree.parse(file) File "lxml.etree.pyx", line 2698, in lxml.etree.parse (src/lxml/lxml.etree.c:4 9590) File "parser.pxi", line 1491, in lxml.etree._parseDocument (src/lxml/lxml.etre e.c:71205) File "parser.pxi", line 1520, in lxml.etree._parseDocumentFromURL (src/lxml/lx ml.etree.c:71488) File "parser.pxi", line 1420, in lxml.etree._parseDocFromFile (src/lxml/lxml.e tree.c:70583) File "parser.pxi", line 975, in lxml.etree._BaseParser._parseDocFromFile (src/ lxml/lxml.etree.c:67736) File "parser.pxi", line 539, in lxml.etree._ParserContext._handleParseResultDo c (src/lxml/lxml.etree.c:63820) File "parser.pxi", line 625, in lxml.etree._handleParseResult (src/lxml/lxml.e tree.c:64741) File "parser.pxi", line 565, in lxml.etree._raiseParseError (src/lxml/lxml.etr ee.c:64084) lxml.etree.XMLSyntaxError: AttValue: " or ' expected, line 2, column 26 What am I doing wrong?

    Read the article

  • Parsing XML using xml.etree.cElementTree

    - by Andre
    I have the following XML in a string named 'xml': <?xml version="1.0" encoding="ISO-8859-1"?> <Book> <Page> <Text>Blah</Text> </Page> </Book> I'm trying to get the value Blah out of it but I'm having trouble with xml.etree.cElementTree. I've tried the find() and findtext() methods but nothing. Eventually I did this: import xml.etree.cElementTree as ET ... root = ET.fromstring(xml) element = root.getchildren()[0].getchildren()[0] Element now equals the element, which is what I want (for this solution anyway), but how do I get the inner text from it? element.text does not work. Any ideas? PS: I am using Python 2.5 atm. As an extra question: what is a better way to parse xml strings in python?

    Read the article

  • Merging elements inside a xml.etree.ElementTree

    - by theAlse
    I have a huge test data like the one provided below (and yes I have no control over this data). Each line is actually 6 parts and I need to generate an XML based on this data. Nav;Basic;Dest;Smoke;No;Yes; Nav;Dest;Recent;Regg;No;Yes; Nav;Dest;Favourites;Regg;No;Yes; ... Nav;Dest using on board;By POI;Smoke;No;Yes; Nav;Dest using on board;Other;Regg;No;Yes; The first 3 elements on each line denotes "test suites"-XML element and the last 3 element should create a "test case"-XML element. I have successfully converted it into a XML using the following code: # testsuite (root) testsuite = ET.Element('testsuite') testsuite.set("name", "Tests") def _create_testcase_tag(elem): global testsuite level1, level2, level3, elem4, elem5, elem6 = elem # -- testsuite (level1) testsuite_level1 = ET.SubElement(testsuite, "testsuite") testsuite_level1.set("name", level1) # -- testsuite (level2) testsuite_level2 = ET.SubElement(testsuite_level1, "testsuite") testsuite_level2.set("name", level2) # -- testsuite (level3) testsuite_level2 = ET.SubElement(testsuite_level2, "testsuite") testsuite_level2.set("name", level3) # -- testcase testcase = ET.SubElement(testsuite_level2, "testcase") testcase.set("name", "TBD") summary = ET.SubElement(testcase, "summary") summary.text = "Test Type= %s, Automated= %s, Available=%s" %(elem4, elem5, elem6) with open(input_file) as in_file: for line_number, a_line in enumerate(in_file): try: parameters = a_line.split(';') if len(parameters) >= 6: level1 = parameters[0].strip() level2 = parameters[1].strip() level3 = parameters[2].strip() elem4 = parameters[3].strip() elem5 = parameters[4].strip() elem6 = parameters[5].strip() lines_as_list.append((level1, level2, level3, elem4, elem5, elem6)) except ValueError: pass lines_as_list.sort() for elem in lines_as_list: _create_testcase_tag(elem) output_xml = ET.ElementTree(testsuite) ET.ElementTree.write(output_xml, output_file, xml_declaration=True, encoding="UTF-8") The above code generates an XML like this: <testsuite name="Tests"> <testsuite name="Nav"> <testsuite name="Basic navigation"> <testsuite name="Set destination"> <testcase name="TBD"> <summary>Test Type= Smoke test Automated= No, Available=Yes</summary> </testcase> </testsuite> </testsuite> </testsuite> <testsuite name="Nav"> <testsuite name="Set destination"> <testsuite name="Recent"> <testcase name="TBD"> <summary> Test Type= Reggression test Automated= No, Available=Yes </summary> </testcase> </testsuite> </testsuite> </testsuite> </testsuite> ... This is all correct, but as you can see I have created a whole tree for each line and that is not what I need. I need to combine e.g. all testsuite with the same name into one testsuite and also perform that recursively. So the XML looks like this instead: <testsuite name="Tests"> <testsuite name="Nav"> <testsuite name="Basic navigation"> <testsuite name="Set destination"> <testcase name="TBD"> <summary>Test Type= Smoke test Automated= No, Available=Yes</summary> </testcase> </testsuite> <testsuite name="Recent"> <testcase name="TBD"> <summary> Test Type= Reggression test Automated= No, Available=Yes </summary> </testcase> </testsuite> </testsuite> </testsuite> </testsuite> I hope you can understand what I mean, but level1, level2 and level3 should be unique with testcases inside. How should I do this? Please do not suggest the use of any external libraries! I can not install new libraries in customer site. xml.etree.ElementTree is all I have. Thanks

    Read the article

  • Encoding in python with lxml - complex solution

    - by Vojtech R.
    Hi, I need to download and parse webpage with lxml and build UTF-8 xml output. I thing schema in pseudocode is more illustrative: from lxml import etree webfile = urllib2.urlopen(url) root = etree.parse(webfile.read(), parser=etree.HTMLParser(recover=True)) txt = my_process_text(etree.tostring(root.xpath('/html/body'), encoding=utf8)) output = etree.Element("out") output.text = txt outputfile.write(etree.tostring(output, encoding=utf8)) So webfile can be in any encoding (lxml should handle this). Outputfile have to be in utf-8. I'm not sure where to use encoding/coding. Is this schema ok? (I cant find good tutorial about lxml and encoding, but I can find many problems with this...) I need robust approved solution so I ask you seniors. Many thanks

    Read the article

  • How do I require that an element has either one set of attributes or another in an XSD schema?

    - by Eli Courtwright
    I'm working with an XML document where a tag must either have one set of attributes or another. For example, it needs to either look like <tag foo="hello" bar="kitty" /> or <tag spam="goodbye" eggs="world" /> e.g. <root> <tag foo="hello" bar="kitty" /> <tag spam="goodbye" eggs="world" /> </root> So I have an XSD schema where I use the xs:choice element to choose between two different attribute groups: <xsi:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema" attributeFormDefault="unqualified" elementFormDefault="qualified"> <xs:element name="root"> <xs:complexType> <xs:sequence> <xs:element maxOccurs="unbounded" name="tag"> <xs:choice> <xs:complexType> <xs:attribute name="foo" type="xs:string" use="required" /> <xs:attribute name="bar" type="xs:string" use="required" /> </xs:complexType> <xs:complexType> <xs:attribute name="spam" type="xs:string" use="required" /> <xs:attribute name="eggs" type="xs:string" use="required" /> </xs:complexType> </xs:choice> </xs:element> </xs:sequence> </xs:complexType> </xs:element> </xsi:schema> However, when using lxml to attempt to load this schema, I get the following error: >>> from lxml import etree >>> etree.XMLSchema( etree.parse("schema_choice.xsd") ) Traceback (most recent call last): File "<stdin>", line 1, in <module> File "xmlschema.pxi", line 85, in lxml.etree.XMLSchema.__init__ (src/lxml/lxml.etree.c:118685) lxml.etree.XMLSchemaParseError: Element '{http://www.w3.org/2001/XMLSchema}element': The content is not valid. Expected is (annotation?, ((simpleType | complexType)?, (unique | key | keyref)*))., line 7 Since the error is with the placement of my xs:choice element, I've tried putting it in different places, but no matter what I try, I can't seem to use it to define a tag to have either one set of attributes (foo and bar) or another (spam and eggs). Is this even possible? And if so, then what is the correct syntax?

    Read the article

  • Write xml file using lxml library in Python

    - by systempuntoout
    I'm using lxml to create an XML file from scratch; having a code like this: from lxml import etree root = etree.Element("root") root.set("interesting", "somewhat") child1 = etree.SubElement(root, "test") How do i write root Element object to an xml file using write() method of ElementTree class?

    Read the article

  • Write xml file with lxml

    - by systempuntoout
    Having a code like this: from lxml import etree root = etree.Element("root") root.set("interesting", "somewhat") child1 = etree.SubElement(root, "test") How do i write root Element object to an xml file using write() method of ElementTree class?

    Read the article

  • Default or fink python and lxml under 10.6.8

    - by songdogtech
    Ah, confusion. I'm trying to install a python library called lxml as needed by a python script. I've been through numerous SU quesitons and answers. I haven't been able to make much progress. I run easy_install lxml and get: Processing lxml-3.0.1-py2.6-macosx-10.6-universal.egg lxml 3.0.1 is already the active version in easy-install.pth Using /Library/Python/2.6/site-packages/lxml-3.0.1-py2.6-macosx-10.6-universal.egg Processing dependencies for lxml Finished processing dependencies for lxml but when I run my script, I get: File "scraper.py", line 3, in import lxml.html File "/Library/Python/2.6/site-packages/lxml-3.0.1-py2.6-macosx-10.6-universal.egg/lxml/html/init.py", line 42, in from lxml import etree ImportError: dlopen(/Library/Python/2.6/site-packages/lxml-3.0.1-py2.6-macosx-10.6-universal.egg/lxml/etree.so, 2): Symbol not found: _htmlParseChunk Referenced from: /Library/Python/2.6/site-packages/lxml-3.0.1-py2.6-macosx-10.6-universal.egg/lxml/etree.so Expected in: flat namespace in /Library/Python/2.6/site-packages/lxml-3.0.1-py2.6-macosx-10.6-universal.egg/lxml/etree.so I think that maybe I'm not using the correct python install? I've installed python with fink, but should I use OS X's python? This is in my .profile: test -r /sw/bin/init.sh && . /sw/bin/init.sh which points to the fink install. echo $PATH gives me: /sw/bin:/sw/sbin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/X11/bin:/usr/X11R6/bin Should I change that to point to snow leopard's python? (Which is 2.6.1) In Library/, there is: which are the lxml libaries I need, it appears, as well as requests. And whereis python gives me /usr/bin/python What do I do? How do I get python to use these libraries. And which python?

    Read the article

  • Why am I getting a " instance has no attribute '__getitem__' " error?

    - by Kevin Yusko
    Here's the code: class BinaryTree: def __init__(self,rootObj): self.key = rootObj self.left = None self.right = None root = [self.key, self.left, self.right] def getRootVal(root): return root[0] def setRootVal(newVal): root[0] = newVal def getLeftChild(root): return root[1] def getRightChild(root): return root[2] def insertLeft(self,newNode): if self.left == None: self.left = BinaryTree(newNode) else: t = BinaryTree(newNode) t.left = self.left self.left = t def insertRight(self,newNode): if self.right == None: self.right = BinaryTree(newNode) else: t = BinaryTree(newNode) t.right = self.right self.right = t def buildParseTree(fpexp): fplist = fpexp.split() pStack = Stack() eTree = BinaryTree('') pStack.push(eTree) currentTree = eTree for i in fplist: if i == '(': currentTree.insertLeft('') pStack.push(currentTree) currentTree = currentTree.getLeftChild() elif i not in '+-*/)': currentTree.setRootVal(eval(i)) parent = pStack.pop() currentTree = parent elif i in '+-*/': currentTree.setRootVal(i) currentTree.insertRight('') pStack.push(currentTree) currentTree = currentTree.getRightChild() elif i == ')': currentTree = pStack.pop() else: print "error: I don't recognize " + i return eTree def postorder(tree): if tree != None: postorder(tree.getLeftChild()) postorder(tree.getRightChild()) print tree.getRootVal() def preorder(self): print self.key if self.left: self.left.preorder() if self.right: self.right.preorder() def inorder(tree): if tree != None: inorder(tree.getLeftChild()) print tree.getRootVal() inorder(tree.getRightChild()) class Stack: def __init__(self): self.items = [] def isEmpty(self): return self.items == [] def push(self, item): self.items.append(item) def pop(self): return self.items.pop() def peek(self): return self.items[len(self.items)-1] def size(self): return len(self.items) def main(): parseData = raw_input( "Please enter the problem you wished parsed.(NOTE: problem must have parenthesis to seperate each binary grouping and must be spaced out.) " ) tree = buildParseTree(parseData) print( "The post order is: ", + postorder(tree)) print( "The post order is: ", + postorder(tree)) print( "The post order is: ", + preorder(tree)) print( "The post order is: ", + inorder(tree)) main() And here is the error: Please enter the problem you wished parsed.(NOTE: problem must have parenthesis to seperate each binary grouping and must be spaced out.) ( 1 + 2 ) Traceback (most recent call last): File "C:\Users\Kevin\Desktop\Python Stuff\Assignment 11\parseTree.py", line 108, in main() File "C:\Users\Kevin\Desktop\Python Stuff\Assignment 11\parseTree.py", line 102, in main tree = buildParseTree(parseData) File "C:\Users\Kevin\Desktop\Python Stuff\Assignment 11\parseTree.py", line 46, in buildParseTree currentTree = currentTree.getLeftChild() File "C:\Users\Kevin\Desktop\Python Stuff\Assignment 11\parseTree.py", line 15, in getLeftChild return root[1] AttributeError: BinaryTree instance has no attribute '__getitem__'

    Read the article

  • Python ImportError when executing 'import.py', but not when executing 'python import.py'

    - by Martin Del Vecchio
    I am running Cygwin Python version 2.5.2. I have a three-line source file, called import.py: #!/usr/bin/python import xml.etree.ElementTree as ET print "Success!" When I execute "python import.py", it works: C:\Temp>python import.py Success! When I run the python interpreter and type the commands, it works: C:\Temp>python Python 2.5.2 (r252:60911, Dec 2 2008, 09:26:14) [GCC 3.4.4 (cygming special, gdc 0.12, using dmd 0.125)] on cygwin Type "help", "copyright", "credits" or "license" for more information. >>> #!/usr/bin/python ... import xml.etree.ElementTree as ET >>> print "Success!" Success! >>> But when I execute "import.py', it does not work: C:\Temp>which python /usr/bin/python C:\Temp>import.py Traceback (most recent call last): File "C:\Temp\import.py", line 2, in ? import xml.etree.ElementTree as ET ImportError: No module named etree.ElementTree When I remove the first line (#!/usr/bin/python), I get the same error. I need that line in there, though, for when this script runs on Linux. And it works fine on Linux. Any ideas? Thanks.

    Read the article

  • Confused as to use a class or a function: Writing XML files using lxml and Python

    - by PulpFiction
    Hi. I need to write XML files using lxml and Python. However, I can't figure out whether to use a class to do this or a function. The point being, this is the first time I am developing a proper software and deciding where and why to use a class still seems mysterious. I will illustrate my point. For example, consider the following function based code I wrote for adding a subelement to a etree root. from lxml import etree root = etree.Element('document') def createSubElement(text, tagText = ""): etree.SubElement(root, text) # How do I do this: element.text = tagText createSubElement('firstChild') createSubElement('SecondChild') As expected, the output of this is: <document> <firstChild/> <SecondChild/> </document> However as you can notice the comment, I have no idea how to do set the text variable using this approach. Is using a class the only way to solve this? And if yes, can you give me some pointers on how to achieve this?

    Read the article

  • python lxml problem

    - by David ???
    I'm trying to print/save a certain element's HTML from a web-page. I've retrieved the requested element's XPath from firebug. All I wish is to save this element to a file. I don't seem to succeed in doing so. (tried the XPath with and without a /text() at the end) I would appreciate any help, or past experience. 10x, David import urllib2,StringIO from lxml import etree url='http://www.tutiempo.net/en/Climate/Londres_Heathrow_Airport/12-2009/37720.htm' seite = urllib2.urlopen(url) html = seite.read() seite.close() parser = etree.HTMLParser() tree = etree.parse(StringIO.StringIO(html), parser) xpath = "/html/body/table/tbody/tr/td[2]/div/table/tbody/tr[6]/td/table/tbody/tr/td[3]/table/tbody/tr[3]/td/table/tbody/tr/td/table/tbody/tr/td/table/tbody/text()" elem = tree.xpath(xpath) print elem[0].strip().encode("utf-8")

    Read the article

  • Close a tag with no text in lxml

    - by PulpFiction
    I am trying to output a XML file using Python and lxml However, I notice one thing that if a tag has no text, it does not close itself. An example of this would be: root = etree.Element('document') rootTree = etree.ElementTree(root) firstChild = etree.SubElement(root, 'test') The output of this is: <document> <test/> </document I want the output to be: <document> <test> </test> </document> So basically I want to close a tag which has no text, but is used to the attribute value. How do I do that? And also, what is such a tag called? I would have Googled it, but I don't know how to search for it.

    Read the article

  • Regular expression works normally, but fails when placed in an XML schema

    - by Eli Courtwright
    I have a simple doc.xml file which contains a single root element with a Timestamp attribute: <?xml version="1.0" encoding="utf-8"?> <root Timestamp="04-21-2010 16:00:19.000" /> I'd like to validate this document against a my simple schema.xsd to make sure that the Timestamp is in the correct format: <?xml version="1.0" encoding="utf-8"?> <xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema"> <xs:element name="root"> <xs:complexType> <xs:attribute name="Timestamp" use="required" type="timeStampType"/> </xs:complexType> </xs:element> <xs:simpleType name="timeStampType"> <xs:restriction base="xs:string"> <xs:pattern value="(0[0-9]{1})|(1[0-2]{1})-(3[0-1]{1}|[0-2]{1}[0-9]{1})-[2-9]{1}[0-9]{3} ([0-1]{1}[0-9]{1}|2[0-3]{1}):[0-5]{1}[0-9]{1}:[0-5]{1}[0-9]{1}.[0-9]{3}" /> </xs:restriction> </xs:simpleType> </xs:schema> So I use the lxml Python module and try to perform a simple schema validation and report any errors: from lxml import etree schema = etree.XMLSchema( etree.parse("schema.xsd") ) doc = etree.parse("doc.xml") if not schema.validate(doc): for e in schema.error_log: print e.message My XML document fails validation with the following error messages: Element 'root', attribute 'Timestamp': [facet 'pattern'] The value '04-21-2010 16:00:19.000' is not accepted by the pattern '(0[0-9]{1})|(1[0-2]{1})-(3[0-1]{1}|[0-2]{1}[0-9]{1})-[2-9]{1}[0-9]{3} ([0-1]{1}[0-9]{1}|2[0-3]{1}):[0-5]{1}[0-9]{1}:[0-5]{1}[0-9]{1}.[0-9]{3}'. Element 'root', attribute 'Timestamp': '04-21-2010 16:00:19.000' is not a valid value of the atomic type 'timeStampType'. So it looks like my regular expression must be faulty. But when I try to validate the regular expression at the command line, it passes: >>> import re >>> pat = '(0[0-9]{1})|(1[0-2]{1})-(3[0-1]{1}|[0-2]{1}[0-9]{1})-[2-9]{1}[0-9]{3} ([0-1]{1}[0-9]{1}|2[0-3]{1}):[0-5]{1}[0-9]{1}:[0-5]{1}[0-9]{1}.[0-9]{3}' >>> assert re.match(pat, '04-21-2010 16:00:19.000') >>> I'm aware that XSD regular expressions don't have every feature, but the documentation I've found indicates that every feature that I'm using should work. So what am I mis-understanding, and why does my document fail?

    Read the article

  • lxml unicode entity parse problems

    - by Jon Hadley
    I'm using lxml as follows to parse an exported XML file from another system: xmldoc = open(filename) etree.parse(xmldoc) But im getting: lxml.etree.XMLSyntaxError: Entity 'eacute' not defined, line 4495, column 46 Obviously it's having problems with unicode entity names - but how would i get round this? Via open() or parse()?

    Read the article

  • parse xml with elementtree, custom sorting

    - by microspace
    I want to parse xml file in utf-8 and sort it by some field. Soring is made by custom alphabet (s1 from sourcecode). History of question is here: sorting of list containing utf-8 charachters. I've found how to sort xml here. Sorting work correctly, the problem is with elementtree, I must admit that it doesn't work on python3 Here is source code: #!/usr/bin/env python # -*- coding: utf-8 -*- #import xml.etree.ElementTree as ET # Python 2.5 import elementtree.ElementTree as ET s1='aáàAâÂbBcCçÇdDeéEfFgGgGhHiIîÎíiiIjJkKlLmMnNóoOöÖpPqQrRsSsStTuUûúÛüÜvVwWxXyYzZ' s2='11111122334455666aabbccddeeeeeeffgghhiijjkklllllmmnnooppqqrrsssssttuuvvwwxxyy' trans = str.maketrans(s1, s2) def unikey(seq): return seq[0].translate(trans) tree = ET.parse("tosort.xml") container = tree.find("entries") data = [] for elem in container: keyd = elem.findtext("k") data.append((keyd, elem)) print (data) data.sort(key=unikey) print (data) container[:] = [item[-1] for item in data] tree.write("sorted.xml", encoding="utf-8") Here are instructions to import elementtree module. When I import module this way :import xml.etree.ElementTree as ET, I get a message: Traceback (most recent call last): File "pcs.py", line 19, in <module> container[:] = [item[-1] for item in data] File "/usr/lib/python3.1/xml/etree/ElementTree.py", line 210, in __setitem__ assert iselement(element) AssertionError When I use this method to import: import elementtree.ElementTree as ET, I get this message: Traceback (most recent call last): File "pcs.py", line 4, in <module> import elementtree.ElementTree as ET File "/usr/local/lib/python3.1/dist-packages/elementtree/ElementTree.py", line 794, in <module> _escape = re.compile(eval(r'u"[&<>\"\u0080-\uffff]+"')) File "<string>", line 1 u"[&<>\"\u0080-\uffff]+" ^ SyntaxError: invalid syntax I use Python 3.1.3 (r313:86834, Nov 28 2010, 11:28:10). In python2.6 elementtree work without a problem. Content of tosort.xml: <xdxf> <entries> <ar><k>zaaaa</k>definition1</ar> <ar><k>saaaa</k>definition2</ar> ... ... </entries> </xdxf>

    Read the article

  • Creating Python C module from Fortran sources on Ubuntu 10.04 LTS

    - by Botondus
    In a project I work on we use a Python C module compiled from Fortran with f2py. I've had no issues building it on Windows 7 32bit (using mingw32) and on the servers it's built on 32bit Linux. But I've recently installed Ubuntu 10.04 LTS 64bit on my laptop that I use for development, and when I build it I get a lot of warnings (even though I've apparently installed all gcc/fortran libraries/compilers), but it does finish the build. However when I try to use the built module in the application, most of it seems to run well but then it crashes with an error: * glibc detected /home/botondus/Envs/gasit/bin/python: free(): invalid next size (fast): 0x0000000006a44760 ** Warnings on running *f2py -c -m module_name ./fortran/source.f90* customize UnixCCompiler customize UnixCCompiler using build_ext customize GnuFCompiler Could not locate executable g77 Found executable /usr/bin/f77 gnu: no Fortran 90 compiler found gnu: no Fortran 90 compiler found customize IntelFCompiler Could not locate executable ifort Could not locate executable ifc customize LaheyFCompiler Could not locate executable lf95 customize PGroupFCompiler Could not locate executable pgf90 Could not locate executable pgf77 customize AbsoftFCompiler Could not locate executable f90 absoft: no Fortran 90 compiler found absoft: no Fortran 90 compiler found absoft: no Fortran 90 compiler found absoft: no Fortran 90 compiler found absoft: no Fortran 90 compiler found absoft: no Fortran 90 compiler found customize NAGFCompiler Found executable /usr/bin/f95 customize VastFCompiler customize GnuFCompiler gnu: no Fortran 90 compiler found gnu: no Fortran 90 compiler found customize CompaqFCompiler Could not locate executable fort customize IntelItaniumFCompiler Could not locate executable efort Could not locate executable efc customize IntelEM64TFCompiler customize Gnu95FCompiler Found executable /usr/bin/gfortran customize Gnu95FCompiler customize Gnu95FCompiler using build_ext I have tried building a 32bit version by installing the gfortran multilib packages and running f2py with -m32 option (but with no success): f2py -c -m module_name ./fortran/source.f90 --f77flags="-m32" --f90flags="-m32" Any suggestions on what I could try to either build 32bit version or correctly build the 64bit version? Edit: It looks like it crashes right at the end of a subroutine. The 'write' executes fine... which is strange. write(6,*)'Eh=',Eh end subroutine calcolo_involucro The full backtrace is very long and I'm not sure if it's any help, but here it is: *** glibc detected *** /home/botondus/Envs/gasit/bin/python: free(): invalid next size (fast): 0x0000000007884690 *** ======= Backtrace: ========= /lib/libc.so.6(+0x775b6)[0x7fe24f8f05b6] /lib/libc.so.6(cfree+0x73)[0x7fe24f8f6e53] /usr/local/lib/python2.6/dist-packages/numpy/core/multiarray.so(+0x4183c)[0x7fe24a18183c] /home/botondus/Envs/gasit/bin/python[0x46a50d] /usr/local/lib/python2.6/dist-packages/numpy/core/multiarray.so(+0x4fbd8)[0x7fe24a18fbd8] /usr/local/lib/python2.6/dist-packages/numpy/core/multiarray.so(+0x5aded)[0x7fe24a19aded] /home/botondus/Envs/gasit/bin/python(PyEval_EvalFrameEx+0x516e)[0x4a7c5e] /home/botondus/Envs/gasit/bin/python(PyEval_EvalFrameEx+0x5a60)[0x4a8550] /home/botondus/Envs/gasit/bin/python(PyEval_EvalCodeEx+0x911)[0x4a9671] /home/botondus/Envs/gasit/bin/python[0x537620] /home/botondus/Envs/gasit/bin/python(PyObject_Call+0x47)[0x41f0c7] /home/botondus/Envs/gasit/bin/python[0x427dff] /home/botondus/Envs/gasit/bin/python(PyObject_Call+0x47)[0x41f0c7] /home/botondus/Envs/gasit/bin/python[0x477bff] /home/botondus/Envs/gasit/bin/python[0x46f47f] /home/botondus/Envs/gasit/bin/python(PyObject_Call+0x47)[0x41f0c7] /home/botondus/Envs/gasit/bin/python(PyEval_EvalFrameEx+0x4888)[0x4a7378] /home/botondus/Envs/gasit/bin/python(PyEval_EvalCodeEx+0x911)[0x4a9671] /home/botondus/Envs/gasit/bin/python(PyEval_EvalFrameEx+0x4d19)[0x4a7809] /home/botondus/Envs/gasit/bin/python(PyEval_EvalCodeEx+0x911)[0x4a9671] /home/botondus/Envs/gasit/bin/python(PyEval_EvalFrameEx+0x4d19)[0x4a7809] /home/botondus/Envs/gasit/bin/python(PyEval_EvalCodeEx+0x911)[0x4a9671] /home/botondus/Envs/gasit/bin/python[0x537620] /home/botondus/Envs/gasit/bin/python(PyObject_Call+0x47)[0x41f0c7] /home/botondus/Envs/gasit/bin/python(PyEval_CallObjectWithKeywords+0x43)[0x4a1b03] /usr/local/lib/python2.6/dist-packages/numpy/core/multiarray.so(+0x2ee94)[0x7fe24a16ee94] /home/botondus/Envs/gasit/bin/python(_PyObject_Str+0x61)[0x454a81] /home/botondus/Envs/gasit/bin/python(PyObject_Str+0xa)[0x454b3a] /home/botondus/Envs/gasit/bin/python[0x461ad3] /home/botondus/Envs/gasit/bin/python[0x46f3b3] /home/botondus/Envs/gasit/bin/python(PyObject_Call+0x47)[0x41f0c7] /home/botondus/Envs/gasit/bin/python(PyEval_EvalFrameEx+0x4888)[0x4a7378] /home/botondus/Envs/gasit/bin/python(PyEval_EvalCodeEx+0x911)[0x4a9671] /home/botondus/Envs/gasit/bin/python(PyEval_EvalFrameEx+0x4d19)[0x4a7809] /home/botondus/Envs/gasit/bin/python(PyEval_EvalFrameEx+0x5a60)[0x4a8550] ======= Memory map: ======== 00400000-0061c000 r-xp 00000000 08:05 399145 /home/botondus/Envs/gasit/bin/python 0081b000-0081c000 r--p 0021b000 08:05 399145 /home/botondus/Envs/gasit/bin/python 0081c000-0087e000 rw-p 0021c000 08:05 399145 /home/botondus/Envs/gasit/bin/python 0087e000-0088d000 rw-p 00000000 00:00 0 01877000-07a83000 rw-p 00000000 00:00 0 [heap] 7fe240000000-7fe240021000 rw-p 00000000 00:00 0 7fe240021000-7fe244000000 ---p 00000000 00:00 0 7fe247631000-7fe2476b1000 r-xp 00000000 08:03 140646 /usr/lib/libfreetype.so.6.3.22 7fe2476b1000-7fe2478b1000 ---p 00080000 08:03 140646 /usr/lib/libfreetype.so.6.3.22 7fe2478b1000-7fe2478b6000 r--p 00080000 08:03 140646 /usr/lib/libfreetype.so.6.3.22 7fe2478b6000-7fe2478b7000 rw-p 00085000 08:03 140646 /usr/lib/libfreetype.so.6.3.22 7fe2478b7000-7fe2478bb000 r-xp 00000000 08:03 263882 /usr/lib/python2.6/dist-packages/PIL/_imagingft.so 7fe2478bb000-7fe247aba000 ---p 00004000 08:03 263882 /usr/lib/python2.6/dist-packages/PIL/_imagingft.so 7fe247aba000-7fe247abb000 r--p 00003000 08:03 263882 /usr/lib/python2.6/dist-packages/PIL/_imagingft.so 7fe247abb000-7fe247abc000 rw-p 00004000 08:03 263882 /usr/lib/python2.6/dist-packages/PIL/_imagingft.so 7fe247abc000-7fe247abf000 r-xp 00000000 08:03 266773 /usr/lib/python2.6/lib-dynload/_bytesio.so 7fe247abf000-7fe247cbf000 ---p 00003000 08:03 266773 /usr/lib/python2.6/lib-dynload/_bytesio.so 7fe247cbf000-7fe247cc0000 r--p 00003000 08:03 266773 /usr/lib/python2.6/lib-dynload/_bytesio.so 7fe247cc0000-7fe247cc1000 rw-p 00004000 08:03 266773 /usr/lib/python2.6/lib-dynload/_bytesio.so 7fe247cc1000-7fe247cc5000 r-xp 00000000 08:03 266786 /usr/lib/python2.6/lib-dynload/_fileio.so 7fe247cc5000-7fe247ec4000 ---p 00004000 08:03 266786 /usr/lib/python2.6/lib-dynload/_fileio.so 7fe247ec4000-7fe247ec5000 r--p 00003000 08:03 266786 /usr/lib/python2.6/lib-dynload/_fileio.so 7fe247ec5000-7fe247ec6000 rw-p 00004000 08:03 266786 /usr/lib/python2.6/lib-dynload/_fileio.so 7fe247ec6000-7fe24800c000 r-xp 00000000 08:03 141358 /usr/lib/libxml2.so.2.7.6 7fe24800c000-7fe24820b000 ---p 00146000 08:03 141358 /usr/lib/libxml2.so.2.7.6 7fe24820b000-7fe248213000 r--p 00145000 08:03 141358 /usr/lib/libxml2.so.2.7.6 7fe248213000-7fe248215000 rw-p 0014d000 08:03 141358 /usr/lib/libxml2.so.2.7.6 7fe248215000-7fe248216000 rw-p 00000000 00:00 0 7fe248216000-7fe248229000 r-xp 00000000 08:03 140632 /usr/lib/libexslt.so.0.8.15 7fe248229000-7fe248428000 ---p 00013000 08:03 140632 /usr/lib/libexslt.so.0.8.15 7fe248428000-7fe248429000 r--p 00012000 08:03 140632 /usr/lib/libexslt.so.0.8.15 7fe248429000-7fe24842a000 rw-p 00013000 08:03 140632 /usr/lib/libexslt.so.0.8.15 7fe24842a000-7fe248464000 r-xp 00000000 08:03 141360 /usr/lib/libxslt.so.1.1.26 7fe248464000-7fe248663000 ---p 0003a000 08:03 141360 /usr/lib/libxslt.so.1.1.26 7fe248663000-7fe248664000 r--p 00039000 08:03 141360 /usr/lib/libxslt.so.1.1.26 7fe248664000-7fe248665000 rw-p 0003a000 08:03 141360 /usr/lib/libxslt.so.1.1.26 7fe248665000-7fe24876e000 r-xp 00000000 08:03 534240 /usr/local/lib/python2.6/dist-packages/lxml/etree.so 7fe24876e000-7fe24896d000 ---p 00109000 08:03 534240 /usr/local/lib/python2.6/dist-packages/lxml/etree.so 7fe24896d000-7fe24896e000 r--p 00108000 08:03 534240 /usr/local/lib/python2.6/dist-packages/lxml/etree.so 7fe24896e000-7fe248999000 rw-p 00109000 08:03 534240 /usr/local/lib/python2.6/dist-packages/lxml/etree.so 7fe248999000-7fe2489a7000 rw-p 00000000 00:00 0 7fe2489a7000-7fe2489bd000 r-xp 00000000 08:03 132934 /lib/libgcc_s.so.1

    Read the article

  • Parse html and find data in the html

    - by Dan.StackOverflow
    Hi all. I am trying to use html5lib to parse an html page in to something I can query with xpath. html5lib has close to zero documentation and I've spent too much time trying to figure this problem out. Ultimate goal is to pull out the second row of a table: <html> <table> <tr><td>Header</td></tr> <tr><td>Want This</td></tr> </table> </html> so lets try it: >>> doc = html5lib.parse('<html><table><tr><td>Header</td></tr><tr><td>Want This</td> </tr></table></html>', treebuilder='lxml') >>> doc <lxml.etree._ElementTree object at 0x1a1c290> that looks good, lets see what else we have: >>> root = doc.getroot() >>> print(lxml.etree.tostring(root)) <html:html xmlns:html="http://www.w3.org/1999/xhtml"><html:head/><html:body><html:table><html:tbody><html:tr><html:td>Header</html:td></html:tr><html:tr><html:td>Want This</html:td></html:tr></html:tbody></html:table></html:body></html:html> LOL WUT? seriously. I was planning on using some xpath to get at the data I want, but that doesn't seem to work. So what can I do? I am willing to try different libraries and approaches.

    Read the article

  • Yahoo BOSS Python Library, ExpatError

    - by Wraith
    I tried to install the Yahoo BOSS mashup framework, but am having trouble running the examples provided. Examples 1, 2, 5, and 6 work, but 3 & 4 give Expat errors. Here is the output from ex3.py: gpython examples/ex3.py examples/ex3.py:33: Warning: 'as' will become a reserved keyword in Python 2.6 Traceback (most recent call last): File "examples/ex3.py", line 27, in <module> digg = db.select(name="dg", udf=titlef, url="http://digg.com/rss_search?search=google+android&area=dig&type=both&section=news") File "/usr/lib/python2.5/site-packages/yos/yql/db.py", line 214, in select tb = create(name, data=data, url=url, keep_standards_prefix=keep_standards_prefix) File "/usr/lib/python2.5/site-packages/yos/yql/db.py", line 201, in create return WebTable(name, d=rest.load(url), keep_standards_prefix=keep_standards_prefix) File "/usr/lib/python2.5/site-packages/yos/crawl/rest.py", line 38, in load return xml2dict.fromstring(dl) File "/usr/lib/python2.5/site-packages/yos/crawl/xml2dict.py", line 41, in fromstring t = ET.fromstring(s) File "/usr/lib/python2.5/xml/etree/ElementTree.py", line 963, in XML parser.feed(text) File "/usr/lib/python2.5/xml/etree/ElementTree.py", line 1245, in feed self._parser.Parse(data, 0) xml.parsers.expat.ExpatError: syntax error: line 1, column 0 It looks like both examples are failing when trying to query Digg.com. Here is the query that is constructed in ex3.py's code: diggf = lambda r: {"title": r["title"]["value"], "diggs": int(r["diggCount"]["value"])} digg = db.select(name="dg", udf=diggf, url="http://digg.com/rss_search?search=google+android&area=dig&type=both&section=news") Any help is appreciated. Thanks!

    Read the article

  • Python web scraping involving HTML tags with attributes

    - by rohanbk
    I'm trying to make a web scraper that will parse a web-page of publications and extract the authors. The skeletal structure of the web-page is the following: <html> <body> <div id="container"> <div id="contents"> <table> <tbody> <tr> <td class="author">####I want whatever is located here ###</td> </tr> </tbody> </table> </div> </div> </body> </html> I've been trying to use BeautifulSoup and lxml thus far to accomplish this task, but I'm not sure how to handle the two div tags and td tag because they have attributes. In addition to this, I'm not sure whether I should rely more on BeautifulSoup or lxml or a combination of both. What should I do? At the moment, my code looks like what is below: import re import urllib2,sys import lxml from lxml import etree from lxml.html.soupparser import fromstring from lxml.etree import tostring from lxml.cssselect import CSSSelector from BeautifulSoup import BeautifulSoup, NavigableString address='http://www.example.com/' html = urllib2.urlopen(address).read() soup = BeautifulSoup(html) html=soup.prettify() html=html.replace('&nbsp', '&#160') html=html.replace('&iacute','&#237') root=fromstring(html) I realize that a lot of the import statements may be redundant, but I just copied whatever I currently had in more source file. EDIT: I suppose that I didn't make this quite clear, but I have multiple tags in page that I want to scrape.

    Read the article

  • Can't parse XML effectively using Python

    - by Harshit Sharma
    import urllib import xml.etree.ElementTree as ET def getWeather(city): #create google weather api url url = "http://www.google.com/ig/api?weather=" + urllib.quote(city) try: # open google weather api url f = urllib.urlopen(url) except: # if there was an error opening the url, return return "Error opening url" # read contents to a string s = f.read() tree=ET.parse(s) current= tree.find("current_condition/condition") condition_data = current.get("data") weather = condition_data if weather == "<?xml version=": return "Invalid city" #return the weather condition #return weather def main(): while True: city = raw_input("Give me a city: ") weather = getWeather(city) print(weather) if __name__ == "__main__": main() gives error , I actually wanted to find values from google weather xml site tags

    Read the article

1 2  | Next Page >