Parsing HTML with Python 2.7 - HTMLParser, SGMLParser, or Beautiful Soup?

Posted by Eric Wilson on Stack Overflow, 2011-06-27T14:11:55Z

I want to do some screen-scraping with Python 2.7, and I have no context for the differences between HTMLParser, SGMLParser, and Beautiful Soup.

Are these all trying to solve the same problem, or do they exist for different reasons? Which is simplest, which is most robust, and which (if any) is the default choice?

Also, please let me know if I have overlooked a significant option.

Edit: I should mention that I'm not especially experienced with HTML parsing. I'm most interested in whichever option will get me up and running fastest, since my goal is to parse HTML from one particular site.
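For context on the standard-library option: HTMLParser (renamed `html.parser` in Python 3) is event-driven. You subclass it and override callbacks such as `handle_starttag`, which it calls as it scans the markup. The sketch below assumes the scraping goal is collecting link targets; `LinkCollector` and `extract_links` are hypothetical names used only for illustration. SGMLParser uses a similar callback style but lives in the `sgmllib` module, which was removed in Python 3, whereas Beautiful Soup is a third-party library that builds a navigable tree instead.

```python
# Minimal sketch: collect href values from <a> tags with the standard-library parser.
try:
    from HTMLParser import HTMLParser  # Python 2.7 module name
except ImportError:
    from html.parser import HTMLParser  # Python 3 module name


class LinkCollector(HTMLParser):
    """Records the href attribute of every <a> start tag it sees (hypothetical helper)."""

    def __init__(self):
        # Explicit base-class call: in Python 2.7 HTMLParser is an old-style class,
        # so super() is not available there.
        HTMLParser.__init__(self)
        self.links = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names; attrs is a list of (name, value) pairs.
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)


def extract_links(html):
    """Feed a string of HTML through LinkCollector and return the hrefs found."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    sample = '<p>See <a href="/docs">docs</a> and <A HREF="/faq">FAQ</A>.</p>'
    print(extract_links(sample))
```

Because callbacks only see one tag at a time, any state (such as "am I inside the table I care about?") has to be tracked by hand on the subclass; that bookkeeping is the main thing a tree-building library like Beautiful Soup saves you.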

© Stack Overflow or respective owner
