Dealing with wacky encodings in Python
Posted by Tyson on Stack Overflow
Published on 2010-06-07T05:42:59Z
Indexed on 2010/06/07 6:22 UTC
I have a Python script that pulls in data from many sources (databases, files, etc.). Supposedly, all the strings are unicode, but what I end up getting is any variation on the following theme (as returned by repr()):
u'D\\xc3\\xa9cor'
u'D\xc3\xa9cor'
'D\\xc3\\xa9cor'
'D\xc3\xa9cor'
Is there a reliable way to take any of the four strings above and return the proper unicode string?
u'D\xe9cor' # --> Décor
The only way I can think of right now uses eval(), replace(), and a deep, burning shame that will never wash away.
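For reference, here is a minimal sketch of one way to do this without eval(). It is written for Python 3, which is an assumption on my part: there the u'...' values above correspond to str and the plain '...' values to bytes. The name fix_text is hypothetical, and the approach is a heuristic rather than a guaranteed fix. It would also rewrite a string that legitimately contains a literal backslash-x sequence or the characters "Ã©".

```python
import re

# Matches a literal backslash, an "x", and two hex digits (e.g. the 4 chars \xc3).
_HEX_ESCAPE = re.compile(rb"\\x([0-9a-fA-F]{2})")


def fix_text(value):
    """Best-effort repair of the four variants shown above (hypothetical helper)."""
    if isinstance(value, str):
        try:
            # Latin-1 maps code points 0-255 one-to-one onto byte values,
            # so this recovers the raw bytes hidden inside a mis-decoded string.
            raw = value.encode("latin-1")
        except UnicodeEncodeError:
            # Contains characters above U+00FF: assume it is already proper text.
            return value
    else:
        raw = bytes(value)

    # Turn literal "\xNN" escape text into the single byte it names.
    raw = _HEX_ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), raw)

    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8: assume the bytes were Latin-1 text to begin with.
        return raw.decode("latin-1")


variants = [
    "D\\xc3\\xa9cor",   # text containing literal escape sequences
    "D\xc3\xa9cor",     # text whose code points are really UTF-8 bytes
    b"D\\xc3\\xa9cor",  # bytes containing literal escape sequences
    b"D\xc3\xa9cor",    # plain UTF-8 bytes
]
results = [fix_text(v) for v in variants]
```

Under these assumptions all four variants come back as the same string, and a string that is already correct passes through unchanged, because its Latin-1 bytes fail the UTF-8 decode and fall back to Latin-1.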
© Stack Overflow or respective owner