Converting HTML special characters into their values using Python
- by tipu
I have a file that's littered with these:
http://www.utexas.edu/learn/html/spchar.html
That link just displays all sorts of HTML entities, such as
&#8211; &ndash;
&#8212; &mdash;
&#161; &iexcl;
and so on. Is it possible in Python to natively convert these entities back into their values, so that any occurrence of &#8211; will appear as – instead? My current approach is to build a dict mapping the key HTML entities to their Unicode characters and do a search and replace, but I was wondering whether there are any libraries that can take care of this for me.
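For reference, here is a minimal sketch of the kind of conversion I'm after, assuming Python 3.4 or later, where the standard library's `html.unescape` decodes both named and numeric character references. The helper name `decode_entities` and the sample string are made up for illustration and are not from my actual file.

```python
import html


def decode_entities(text):
    """Replace HTML character references (named or numeric) with their characters."""
    # html.unescape handles forms such as &ndash;, &#8211; and &#x2013;
    return html.unescape(text)


# Hypothetical sample line containing the three entities listed above.
sample = "1999&ndash;2004 &#8212; &iexcl;hola!"
decoded = decode_entities(sample)
print(decoded)
```

The result is an ordinary `str`, so the encoding (UTF-8 or otherwise) only comes into play when the text is written back to a file, for example by opening it with `encoding="utf-8"`.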