Dealing with wacky encodings in Python
- by Tyson
I have a Python script that pulls in data from many sources (databases, files, etc.). Supposedly, all the strings are unicode, but what I end up getting is any variation on the following theme (as returned by repr()):
u'D\\xc3\\xa9cor'
u'D\xc3\xa9cor'
'D\\xc3\\xa9cor'
'D\xc3\xa9cor'
Is there a reliable way to take any four of the above strings and return the proper unicode string?
u'D\xe9cor' # --> Décor
The only way I can think of right now uses eval(), replace(), and a deep, burning shame that will never wash away.