Python and Unicode: How everything should be Unicode
Posted by A A on Stack Overflow
Published on 2010-12-27T18:15:29Z
Indexed on 2010/12/27 18:53 UTC
Forgive me if this is a long question:
I have been programming in Python for around six months. I am self-taught: I started with the Python tutorial, then SO, and then just used Google for the rest.
Here is the sad part: no one told me all strings should be Unicode. No, I am not lying or making this up, but where does the tutorial mention it? Most examples I see also just use byte strings instead of Unicode strings.
I was just browsing and came across this question on SO, which says how every string in Python should be a Unicode string. This pretty much made me cry!
I read that every string in Python 3.0 is Unicode by default, so my questions are for 2.x:
1. Should I do a print u'Some text' or just print 'Text'?
2. Everything should be Unicode. Does this mean that if I have a tuple, say t = ('First', 'Second'), it should be t = (u'First', u'Second')? I read that I can do a from __future__ import unicode_literals and then every string will be a Unicode string, but should I do this inside a container also?
3. When reading/writing to a file, I should use the codecs module. Right? Or should I just use the standard way of reading/writing and encode or decode where required?
4. If I get the string from, say, raw_input(), should I convert that to Unicode also?
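To make the tuple question above concrete, here is a minimal sketch of what the unicode_literals import appears to do. It assumes Python 2.6 or later (it also runs unchanged on Python 3); the name text_type is only a hypothetical alias introduced for the check and is not part of any library.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals

import sys

# Hypothetical alias: the text type is `unicode` on Python 2 and `str` on Python 3.
text_type = str if sys.version_info[0] >= 3 else unicode  # noqa: F821

# With unicode_literals in effect, unprefixed literals in this module are
# text strings, including the ones written inside a container.
t = ('First', 'Second')
all_text = all(isinstance(item, text_type) for item in t)
print(all_text)

# Byte strings now need an explicit b prefix.
raw = b'First'
print(isinstance(raw, bytes))

# Decoding the bytes gives a text string equal to the literal in the tuple.
print(raw.decode('ascii') == t[0])
```

The import only changes literals in the module where it appears, so strings that come from elsewhere (other modules, files, user input) are not affected by it.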
What is the common approach to handling all of the above issues in 2.x? The from __future__ import unicode_literals statement?
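One approach that is often suggested is to decode bytes as soon as they enter the program, work with Unicode strings internally, and encode again only when writing out. The sketch below illustrates that idea under some assumptions: the input is UTF-8, the helper to_text is a hypothetical name made up for this example, and io.open (available from Python 2.6) stands in for the codecs module mentioned above. It is an illustration, not the definitive way to do it.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals

import io
import os
import shutil
import tempfile


def to_text(value, encoding='utf-8'):
    """Hypothetical helper: decode byte strings, pass text through unchanged."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Bytes arriving from outside; a stand-in for what raw_input() returns on 2.x.
# Assumption: the bytes are UTF-8 (a real terminal may use another encoding).
incoming = b'caf\xc3\xa9'
name = to_text(incoming)

tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'example.txt')

# io.open with an encoding takes text on write and returns text on read,
# so the encode/decode step happens at the file boundary.
with io.open(path, 'w', encoding='utf-8') as f:
    f.write(name + '\n')

with io.open(path, 'r', encoding='utf-8') as f:
    round_tripped = f.read().strip()

shutil.rmtree(tmp_dir)  # remove the temporary directory again

print(round_tripped == name)
print(len(name))  # counts characters, not the 5 bytes of the UTF-8 form
```

Because the file object handles the encoding, the rest of the code never has to call encode or decode by hand.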
Sorry for being such a noob, but this changes what I have been doing for a long time, so clearly I am confused.
© Stack Overflow or respective owner