Why does Python sometimes upgrade a string to unicode and sometimes not?

Posted by samtregar on Stack Overflow See other posts from Stack Overflow or by samtregar
Published on 2010-05-19T17:10:06Z Indexed on 2010/05/19 17:30 UTC
Read the original article Hit count: 285

Filed under:

python

|

unicode

I'm confused. Consider this code working the way I expect:

>>> foo = u'Émilie and Juañ are turncoats.'
>>> bar = "foo is %s" % foo
>>> bar
u'foo is \xc3\x89milie and Jua\xc3\xb1 are turncoats.'

And this code not at all working the way I expect:

>>> try:
...     raise Exception(foo)
... except Exception as e:
...     foo2 = e
... 
>>> bar = "foo2 is %s" % foo2
------------------------------------------------------------
Traceback (most recent call last):
  File "<ipython console>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)

Can someone explain what's going on here? Why does it matter whether the unicode data is in a plain unicode string or stored in an Exception object? And why does this fix it:

>>> bar = u"foo2 is %s" % foo2
>>> bar
u'foo2 is \xc3\x89milie and Jua\xc3\xb1 are turncoats.'

I am quite confused! Thanks for the help!

UPDATE: My coding buddy Randall has added to my confusion in an attempt to help me! Send in the reinforcements to explain how this is supposed to make sense:

>>> class A:
...     def __str__(self): return "string"
...     def __unicode__(self): return "unicode"
... 
>>> "%s %s" % (u'niño', A())
u'ni\xc3\xb1o unicode'
>>> "%s %s" % (A(), u'niño')
u'string ni\xc3\xb1o'

Note that the order of the arguments here determines which method is called!

Developer IT