libxml2dom and parsing

Posted by Ockonal on Stack Overflow
Published on 2010-05-15T09:59:24Z

Hello, I have some HTML content in a Python variable. Is it possible to use the DOM on it? As I understand it, libxml2dom is the tool for this. Now to my question: in my HTML there is a div with id = 'some_needed_block'.

In the Python script:

import libxml2dom

pageData = someHandler.read()
pageDOM = libxml2dom.parseString(pageData, html=1)
print pageDOM
-> <libxml2dom.Document object at 0x2d160d0>

block = pageDOM.getElementById('some_needed_block')
print block
-> <libxml2dom.Node object at 0xf5d1d0>

def collect_text(node):
    s = ""
    for child_node in node.childNodes:
        if child_node.nodeType == child_node.TEXT_NODE:
            s += child_node.nodeValue
        else:
            s += collect_text(child_node)
    return s

collect_text(block)

-> for child_node in node.childNodes:
-> AttributeError: 'NoneType' object has no attribute 'childNodes'
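
The traceback shows that collect_text was called with None at some point. The following is a minimal sketch of a guarded version, not a diagnosis of why that happens. It assumes the standard library's xml.dom.minidom as a stand-in for libxml2dom so that it runs without extra packages. The find_by_id helper and the sample markup are hypothetical and are only there because minidom's own getElementById needs DTD-declared ID attributes.

```python
from xml.dom import minidom


def find_by_id(node, wanted_id):
    """Depth-first search for the first element whose id attribute matches.

    Hypothetical helper: it stands in for getElementById in this sketch.
    """
    if node.nodeType == node.ELEMENT_NODE and node.getAttribute("id") == wanted_id:
        return node
    for child in node.childNodes:
        found = find_by_id(child, wanted_id)
        if found is not None:
            return found
    return None


def collect_text(node):
    """Concatenate all text beneath node; a missing node yields ''."""
    if node is None:
        # Guard: a failed lookup returns None, so return empty text
        # instead of raising AttributeError on node.childNodes.
        return ""
    parts = []
    for child in node.childNodes:
        if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE):
            parts.append(child.nodeValue)
        else:
            parts.append(collect_text(child))
    return "".join(parts)


# Sample markup (an assumption; the original page content is not shown).
page_data = (
    '<html><body><div id="some_needed_block">'
    "Hello, <b>DOM</b> world</div></body></html>"
)
page_dom = minidom.parseString(page_data)

block = find_by_id(page_dom, "some_needed_block")
text = collect_text(block)
missing = collect_text(find_by_id(page_dom, "no_such_block"))

print(repr(text))
print(repr(missing))
```

With the guard in place, a lookup that finds nothing produces an empty string rather than an exception, which makes it easier to see whether the element lookup or the tree walk is the step that yields None.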

© Stack Overflow or respective owner
