How do I unescape HTML entities in a string in Python 3.1?

Posted by Sho Minamimoto on Stack Overflow
Published on 2010-03-02T02:50:19Z Indexed on 2010/04/12 20:33 UTC

I have looked all around and only found solutions for Python 2.6 and earlier, NOTHING on how to do this in Python 3.x. (I only have access to a Win7 box.)

I HAVE to be able to do this in 3.1, preferably without external libraries. Currently, I have httplib2 installed and access to command-prompt curl (that's how I'm getting the source code for pages). Unfortunately, as far as I know, curl does not decode HTML entities; I couldn't find an option for it in the documentation.

YES, I've tried to get Beautiful Soup to work in 3.x, MANY TIMES, without success. If you could provide EXPLICIT instructions on how to get it to work with Python 3 in an MS Windows environment, I would be very grateful.

So, to be clear, I need to turn a string like this: "Suzy &amp; John" into a string like this: "Suzy & John".
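
For illustration, here is a minimal sketch of that conversion using only the standard library, assuming the `html.entities` module and its `name2codepoint` table are available (they are part of Python 3). The helper name `unescape_entities` and the regular expression are made up for this example and do not come from any library; treat it as one possible approach, not the definitive one.

```python
import re
from html.entities import name2codepoint

# Matches decimal (&#38;), hexadecimal (&#x26;) and named (&amp;) references.
_ENTITY_RE = re.compile(r"&(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")


def unescape_entities(text):
    """Hypothetical helper: replace HTML entity references with characters."""
    def replace(match):
        ref = match.group(1)
        try:
            if ref[0] == "#":
                if ref[1] in "xX":
                    # Hexadecimal numeric reference, e.g. &#x26;
                    return chr(int(ref[2:], 16))
                # Decimal numeric reference, e.g. &#38;
                return chr(int(ref[1:]))
            # Named reference, e.g. &amp;
            return chr(name2codepoint[ref])
        except (KeyError, ValueError, OverflowError):
            # Unknown name or out-of-range number: leave the text untouched.
            return match.group(0)

    return _ENTITY_RE.sub(replace, text)


print(unescape_entities("Suzy &amp; John"))
```

The substitution makes a single pass, so an input like "&amp;lt;" becomes "&lt;" rather than being unescaped twice. On Python 3.4 and later, the standard library's `html.unescape()` covers the same job and more edge cases, but it is not available in 3.1.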

© Stack Overflow or respective owner
