Python UTF-16 encoding hex representation
Posted by Romeno on Stack Overflow
Published on 2012-06-25T21:12:08Z
I have a string in Python 2.7.2, say u"\u0638". When I write it to a file:
f = open("J:\\111.txt", "wb")  # binary mode, so the encoded bytes are written unchanged
f.write(u"\u0638".encode('utf-16'))
f.close()
In hex it looks like: FF FE 38 06. When I print such a string to stdout, I see: '\xff\xfe8\x06'.
The question: where is \x38 in the string output to stdout? In other words, why is the string output to stdout not '\xff\xfe\x38\x06'?
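A minimal sketch of what seems to be going on, not a definitive answer: byte 0x38 is the ASCII code for the character '8', and the repr of a byte string shows printable ASCII bytes as characters instead of \x escapes. To keep the byte order fixed on any platform, the snippet assumes 'utf-16-le' plus an explicit BOM in place of plain 'utf-16'; the helper name to_hex is hypothetical.

```python
import binascii
import codecs

def to_hex(data):
    """Return the bytes of `data` as space-separated uppercase hex pairs."""
    hexed = binascii.hexlify(data).decode('ascii').upper()
    return ' '.join(hexed[i:i + 2] for i in range(0, len(hexed), 2))

# Explicit little-endian BOM followed by the encoded character
# (assumption: this mirrors what 'utf-16' produced in the file above).
encoded = codecs.BOM_UTF16_LE + u"\u0638".encode('utf-16-le')

print(to_hex(encoded))   # FF FE 38 06
print(repr(encoded))     # the 0x38 byte is displayed as the character 8
print(b'\x38' == b'8')   # True: both literals are the same single byte
```

So the byte is still there; it is only displayed as 8 because that is how a printable byte is rendered.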
If I write the string to file twice:
f = open("J:\\111.txt", "wb")  # binary mode, so the encoded bytes are written unchanged
f.write(u"\u0638".encode('utf-16'))
f.write(u"\u0638".encode('utf-16'))
f.close()
The hex representation in the file contains the byte order mark (BOM) \xff\xfe twice: FF FE 38 06 FF FE 38 06
What is the technique to avoid writing a BOM in UTF-16 encoded strings?
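Two possible approaches, sketched under assumptions rather than offered as the one right way: name the byte order explicitly ('utf-16-le' or 'utf-16-be'), in which case the codec emits no BOM, or let a text-mode file object from io.open do the encoding, so the BOM is written once at the start of the stream. The temporary file below is a stand-in for the real path.

```python
import io
import os
import tempfile

text = u"\u0638"

# Option 1: explicit byte order, so the codec adds no BOM.
no_bom = text.encode('utf-16-le')   # bytes 38 06

# Option 2: the file object encodes on write and emits the BOM only once,
# instead of once per encode() call.
fd, path = tempfile.mkstemp(suffix='.txt')  # stand-in for the real file name
os.close(fd)
with io.open(path, 'w', encoding='utf-16') as f:
    f.write(text)
    f.write(text)

# Read the raw bytes back to check what was actually written.
with io.open(path, 'rb') as f:
    data = f.read()
os.remove(path)

print(len(no_bom))   # 2 bytes: no BOM
print(len(data))     # 6 bytes: one 2-byte BOM plus two 2-byte characters
```

Option 1 fits when the reader already knows the byte order; option 2 keeps a single BOM for readers that rely on it to detect the encoding.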
© Stack Overflow or respective owner