Are UTF-16 characters (as used by, for example, the wide WinAPI functions) always 2 bytes long?
Posted by Cray on Stack Overflow
Published on 2011-01-10T23:03:29Z
Could someone clarify how UTF-16 works? I am a little confused, considering these points:
- There is a static type in C++, WCHAR, which is 2 bytes long (always 2 bytes long, obviously).
- Most of MSDN and some other documentation seems to assume that characters are always 2 bytes long. This could just be my imagination, as I can't come up with any particular examples, but it seems that way.
- There are no "extra wide" functions or character types widely used in C++ or Windows, so I would assume that UTF-16 is all that is ever needed.
- As far as I know, Unicode has far more than 65,536 characters, so they obviously cannot all fit in 2 bytes.
- UTF-16 seems to be a bigger version of UTF-8, and UTF-8 characters can be of different lengths.
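
To make the points above concrete, here is a minimal sketch of how UTF-16 fits code points above U+FFFF into 16-bit units: it uses one unit for code points below U+10000 and a pair of units (a surrogate pair) for everything else, so an encoded character is either 2 or 4 bytes, never 3. The helper name `encode_utf16` is hypothetical, and the sketch assumes the input is a valid code point (at most U+10FFFF and not itself a surrogate).

```cpp
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-16 code units.
// Assumes cp is valid (<= 0x10FFFF and not in the surrogate range 0xD800..0xDFFF).
std::u16string encode_utf16(char32_t cp) {
    std::u16string out;
    if (cp < 0x10000) {
        // Basic Multilingual Plane: a single 16-bit code unit (2 bytes).
        out.push_back(static_cast<char16_t>(cp));
    } else {
        // Supplementary planes: split the remaining 20 bits into a surrogate pair (4 bytes).
        cp -= 0x10000;
        out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));   // high surrogate
        out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF))); // low surrogate
    }
    return out;
}
```

For example, `encode_utf16(U'A')` yields one code unit, while `encode_utf16(0x1F600)` yields the two units `0xD83D 0xDE00`.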
So if a UTF-16 character is not always 2 bytes long, how long else could it be? 3 bytes? Or only multiples of 2? And then, for example, if a WinAPI function wants to know the size of a wide string in characters, and the string contains 2 characters that are each 4 bytes long, how is the size of that string in characters calculated?
Is it 2 chars long or 4 chars long? (It is 8 bytes long, and each WCHAR is 2 bytes.)
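
The 8-byte case in the question can be sketched as follows. The example uses the portable `char16_t` and `std::u16string` instead of WCHAR, and `count_code_points` is a hypothetical helper that assumes well-formed UTF-16. A string holding two characters from outside the Basic Multilingual Plane has 4 code units (8 bytes) but only 2 code points. As far as I know, length functions such as `wcslen` and `lstrlenW` on Windows count 16-bit units, so they would report 4 for the equivalent wide string.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: count Unicode code points in a well-formed UTF-16 string.
std::size_t count_code_points(const std::u16string& s) {
    std::size_t n = 0;
    for (char16_t u : s) {
        // A low (trailing) surrogate, 0xDC00..0xDFFF, is the second half of a pair,
        // so it does not start a new code point and is skipped.
        if (u < 0xDC00 || u > 0xDFFF) {
            ++n;
        }
    }
    return n;
}

// Two characters outside the BMP (U+1F600 and U+1D11E), each stored as a surrogate pair.
const std::u16string sample = u"\U0001F600\U0001D11E";
```

Here `sample.size()` counts code units, `sample.size() * sizeof(char16_t)` gives the byte length, and `count_code_points(sample)` gives the number of actual characters.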
© Stack Overflow or respective owner