HTML parser for GAE
Posted by Richard on Stack Overflow
Published on 2010-01-29T11:29:20Z
Generally I use lxml for my HTML parsing needs, but that isn't available on Google App Engine. The obvious alternative is BeautifulSoup, but I find it chokes too easily on malformed HTML. Currently I am testing libxml2dom and have been getting better results.
Which pure Python HTML parser have you found performs best? Robust handling of bad HTML matters more to me than speed.
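To make "handles bad HTML" concrete, here is a minimal sketch of a smoke test that could be run against any candidate parser. It uses Python 3's standard-library html.parser only as a pure Python baseline; that module is not one of the parsers named above. The LinkCollector class, the extract_links helper, and the BAD_HTML sample are all hypothetical names made up for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags as the parser encounters them."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(markup):
    """Feed markup to the event-based parser and return every href found."""
    parser = LinkCollector()
    parser.feed(markup)
    parser.close()  # flush any text still buffered at end of input
    return parser.links


# Hypothetical malformed sample: unclosed <p>, <b>, and <a> tags,
# an unquoted attribute value, a stray </div>, and no closing </body> or </html>.
BAD_HTML = (
    "<html><body><p>One<b>two"
    "<a href=first.html>first</div>"
    "<a href='/second'>second"
)

if __name__ == "__main__":
    print(extract_links(BAD_HTML))
```

Because this baseline is event-based and never builds a tree, it does not repair the document structure, so it only checks that a parser gets through the input and still reports the tags it saw. Comparing tree-building parsers would need the same sample fed to each one and the resulting trees inspected.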
© Stack Overflow or respective owner