libxml2dom and parsing
Posted
by Ockonal
on Stack Overflow
Published on 2010-05-15T09:59:24Z
Hello, I have some HTML content in a Python variable. Is it possible to work with it through the DOM? As I understand it, libxml2dom is the tool for this. Now to the question: my HTML contains a div with id = 'some_needed_block'.
In the Python script:
pageData = someHandler.read()
pageDOM = libxml2dom.parseString(pageData, html=1)
print pageDOM
-> <libxml2dom.Document object at 0x2d160d0>
block = pageDOM.getElementById('some_needed_block')
print block
-> <libxml2dom.Node object at 0xf5d1d0>
def collect_text(node):
    s = ""
    for child_node in node.childNodes:
        if child_node.nodeType == child_node.TEXT_NODE:
            s += child_node.nodeValue
        else:
            s += collect_text(child_node)
    return s
collect_text(block)
-> for child_node in node.childNodes:
-> AttributeError: 'NoneType' object has no attribute 'childNodes'
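For comparison, here is a minimal sketch of the same recursive text collection with a guard against a None node. It uses the standard library's xml.dom.minidom rather than libxml2dom, so it only illustrates the traversal and does not explain why libxml2dom produced the error above. The sample markup and the `is None` guard are assumptions made for illustration. The block is taken from documentElement because minidom's getElementById only finds attributes that a DTD declares as IDs.

```python
from xml.dom import minidom


def collect_text(node):
    """Concatenate the text of all descendant text nodes of `node`."""
    # Guard (an assumption, not part of the original code): treat a
    # missing node as empty text instead of raising AttributeError.
    if node is None:
        return ""
    parts = []
    for child_node in node.childNodes:
        if child_node.nodeType == child_node.TEXT_NODE:
            parts.append(child_node.nodeValue)
        else:
            # Recurse into elements and any other non-text nodes.
            parts.append(collect_text(child_node))
    return "".join(parts)


# Hypothetical sample markup standing in for the real page.
page_data = '<div id="some_needed_block">Hello, <b>DOM</b> world</div>'
page_dom = minidom.parseString(page_data)
block = page_dom.documentElement

text = collect_text(block)
print(text)
```

Whether libxml2dom can hand back None where a node is expected is not established here; the guard simply makes the function safe to call if that happens.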