I have a problem with python about reading and print utf8 text file.
I have a test.txt in utf8 encoding without BOM. This file has two characters in it:
??
The first character "?" is Chinese and the second "?" is Japanese. Now, When I use Ulipad (a python editor) to run the following code to read the txt file, and print these two characters.
import codecs
infile = "C:\\test.txt"
f = codecs.open(infile, "r", "utf-8")
s = f.read()
print(s)
I got this error,
"UnicodeEncodeError: 'cp950' codec can't encode character '\u58f0' in position 1:
illegal multibyte sequence"
I found it caused from the second character "?" .
But when I use the same code to test in python default GUI IDLE, it works to print the two characters with no error. So, how can I fix the problem.
My running environment is python 3.1 , windows xp traditional Chinese.