HTML parser for GAE

Posted by Richard on Stack Overflow
Published on 2010-01-29T11:29:20Z Indexed on 2010/04/26 21:23 UTC

Generally I use lxml for my HTML parsing needs, but that isn't available on Google App Engine. The obvious alternative is BeautifulSoup, but I find it chokes too easily on malformed HTML. Currently I am testing libxml2dom and have been getting better results.

Which pure Python HTML parser have you found performs best? My priority is robust handling of bad HTML rather than speed.
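
To make "handles bad HTML" concrete when comparing candidates, a small test case along these lines may help. This is only a sketch under stated assumptions: it uses the standard library's `html.parser` module (the Python 3 name, which is not one of the parsers mentioned above), and `LinkCollector`, `extract_links`, and `BAD_HTML` are hypothetical names made up for the illustration. The same malformed input could be fed to any other pure Python parser to see which links it recovers.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags without requiring well-formed markup."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(html):
    """Return every href found in the given (possibly malformed) HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical malformed sample: unclosed <p>, <a>, and <div>, an unquoted
# attribute value, a stray </b>, and no closing </html>.
BAD_HTML = (
    '<html><body><p>Unclosed paragraph<a href="/one">first'
    '<div><a href=/two>second</b></body>'
)

if __name__ == "__main__":
    print(extract_links(BAD_HTML))
```

An event-based parser like this never builds a tree, so it sidesteps nesting errors entirely; whether that trade-off is acceptable depends on how much document structure the application needs.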
