Python: problem reading and printing a UTF-8 text file

Posted by cpps on Stack Overflow
Published on 2010-05-24T11:59:29Z; indexed on 2010/05/24 12:11 UTC.


I have a problem reading and printing a UTF-8 text file in Python.

I have a file, test.txt, encoded in UTF-8 without a BOM. It contains two characters:

?声

The first character is Chinese and the second, 声 (U+58F0), is Japanese. I run the following code in Ulipad (a Python editor) to read the file and print the two characters:

```python
import codecs

infile = "C:\\test.txt"

# Open the file for reading, decoding its bytes as UTF-8
f = codecs.open(infile, "r", "utf-8")
s = f.read()

# Print the decoded text
print(s)
```
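For comparison, here is a minimal sketch of the same read using Python 3's built-in `open()` with an explicit `encoding`. It does the job of `codecs.open()` here and closes the file automatically. The helper name `read_utf8` and the temporary file standing in for `C:\test.txt` are hypothetical. Printing `ascii(s)` instead of `s` keeps the output pure ASCII, so it can be shown on any console.

```python
import os
import tempfile


def read_utf8(path):
    """Read a whole UTF-8 text file (no BOM expected) and return it as str."""
    # The built-in open() decodes the bytes itself when given an encoding;
    # the with-block closes the file even if an exception is raised.
    with open(path, "r", encoding="utf-8") as f:
        return f.read()


# Hypothetical stand-in for C:\test.txt, holding the character U+58F0
path = os.path.join(tempfile.mkdtemp(), "test.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("\u58f0")

s = read_utf8(path)
print(ascii(s))  # '\u58f0'
```

The error reported below is a `UnicodeEncodeError`, not a `UnicodeDecodeError`. That suggests the read itself succeeds and the failure happens when `print()` encodes the string for output.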

I got this error:

```
UnicodeEncodeError: 'cp950' codec can't encode character '\u58f0' in position 1: illegal multibyte sequence
```

I found that the error is caused by the second character, 声 (U+58F0).
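You can confirm which character the console codec rejects without involving the console at all: encode each character on its own and record the ones that fail. Below is a minimal sketch. `find_unencodable` is a hypothetical helper, and encoding characters one at a time assumes a stateless codec such as cp950 or UTF-8. The letter "A" stands in for the first character, which is not reproduced here.

```python
def find_unencodable(text, encoding):
    """Return (index, code point) pairs for characters `encoding` cannot represent."""
    bad = []
    for i, ch in enumerate(text):
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            bad.append((i, "U+%04X" % ord(ch)))
    return bad


# "A" is a placeholder for the first character of test.txt
print(find_unencodable("A\u58f0", "cp950"))  # [(1, 'U+58F0')]
print(find_unencodable("A\u58f0", "utf-8"))  # []
```

Index 1 corresponds to "in position 1" in the error message.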

However, when I run the same code in IDLE, Python's default GUI, it prints both characters without any error. How can I fix this?
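One plausible explanation, not confirmed in the post, is that IDLE's shell receives the `str` directly. A program run outside IDLE instead writes to a stream that encodes text with `sys.stdout.encoding` (`cp950` in this setup). Under that assumption, a minimal workaround is to escape whatever the target encoding cannot represent before printing. The helper `console_safe` is hypothetical. It relies on the standard `backslashreplace` error handler.

```python
import sys


def console_safe(text, encoding=None):
    """Return text with characters that `encoding` cannot represent replaced
    by backslash escapes such as \\u58f0, so printing it cannot raise
    UnicodeEncodeError on a stream using that encoding."""
    # Default to the encoding the stdout stream itself uses (cp950 in the
    # question's setup); fall back to ASCII if it is unknown.
    encoding = encoding or getattr(sys.stdout, "encoding", None) or "ascii"
    return text.encode(encoding, "backslashreplace").decode(encoding)


print(console_safe("A\u58f0", "cp950"))  # A\u58f0
print(console_safe("A\u58f0"))           # result depends on this console's encoding
```

Escaping loses the glyph. To show the character itself, you need an output channel that can represent it, such as a file opened with `encoding="utf-8"`. Python 3.6 and later (PEP 528) write to the interactive Windows console through the Unicode console API, which avoids this particular error there.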

My environment is Python 3.1 on Windows XP (Traditional Chinese).

© Stack Overflow or respective owner
