Converting HTML special characters into their value using Python

Posted by tipu on Stack Overflow
Published on 2010-05-19T14:03:10Z Indexed on 2010/05/19 14:10 UTC

I have a file that's littered with these:

http://www.utexas.edu/learn/html/spchar.html

That link just displays all sorts of HTML entities, such as

&#8211;  &ndash;
&#8212;  &mdash;
&#161;  &iexcl;

and so on. Is it possible in Python to natively convert these entities back into the characters they stand for, so that any occurrence of &ndash; appears as – instead? My current approach is to build a dict mapping the main HTML entities to their UTF-8 values and do a search and replace, but I was wondering whether any library can take care of this for me.
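
For reference, here is a minimal sketch of the library route, assuming a current Python 3 (3.4 or later), where the standard library provides `html.unescape`. The helper name `decode_entities` and the sample string are made up for illustration and are not from the original post.

```python
import html


def decode_entities(text):
    """Replace named and numeric HTML character references with the
    characters they stand for (hypothetical helper wrapping html.unescape)."""
    return html.unescape(text)


# Illustrative input mixing named, decimal, and hexadecimal references.
sample = "pages 10&ndash;20 &mdash; &iexcl;hola! &#8211; &#x2014;"
decoded = decode_entities(sample)
print(decoded)
```

This handles named references and both numeric forms in one call, so a hand-built dict is not needed. To process a whole file, read it as text with an explicit encoding, pass the contents through the helper, and write the result back out.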

© Stack Overflow or respective owner
