How do I read UTF-8 characters via a pointer?
Posted by Jen on Stack Overflow
Published on 2010-06-01T08:41:28Z
Indexed on 2010/06/01 8:43 UTC
Suppose I have UTF-8 content stored in memory. How do I read the characters using a pointer? I presume I need to check the high bit (the 8th bit) of each byte, which indicates that the byte is part of a multi-byte sequence, but how exactly do I turn that sequence into a valid Unicode code point? Also, is wchar_t the proper type to store a single Unicode character?
This is what I have in mind:
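wchar_t readNextChar (char** p)
{
    // Read the byte at the current position, then advance the caller's pointer.
    unsigned char ch = (unsigned char) *(*p)++;
    if (ch & 128)
    {
        // This is a multi-byte character, what do I do now?
        // unsigned char chNext = (unsigned char) *(*p)++;
        // ... but how do I assemble the Unicode character?
        ...
    }
    ...
}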
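For what it's worth, here is a rough sketch of how the multi-byte branch could be filled in. It is an illustration under some assumptions, not a definitive decoder. The name read_next_char and the choice to return U+FFFD on malformed input are mine. It returns uint32_t rather than wchar_t because wchar_t is only 16 bits wide on Windows and so cannot hold every code point there. The idea is that the lead byte tells you how many continuation bytes follow (110xxxxx means one, 1110xxxx means two, 11110xxx means three), and each continuation byte (10xxxxxx) contributes its low six bits.

```c
#include <stdint.h>

/* Hypothetical helper: decodes one UTF-8 sequence starting at *p, advances *p
   past the bytes it consumed, and returns the code point.
   Returns 0xFFFD (the replacement character) on malformed input.
   Assumes the buffer is NUL-terminated. */
uint32_t read_next_char(const char **p)
{
    const unsigned char *s = (const unsigned char *)*p;
    uint32_t cp;
    int extra;  /* number of continuation bytes expected after the lead byte */

    if (s[0] < 0x80)                { cp = s[0];        extra = 0; } /* 0xxxxxxx */
    else if ((s[0] & 0xE0) == 0xC0) { cp = s[0] & 0x1F; extra = 1; } /* 110xxxxx */
    else if ((s[0] & 0xF0) == 0xE0) { cp = s[0] & 0x0F; extra = 2; } /* 1110xxxx */
    else if ((s[0] & 0xF8) == 0xF0) { cp = s[0] & 0x07; extra = 3; } /* 11110xxx */
    else { *p += 1; return 0xFFFD; } /* stray continuation or invalid lead byte */

    for (int i = 1; i <= extra; i++) {
        if ((s[i] & 0xC0) != 0x80) {  /* not a 10xxxxxx continuation byte */
            *p += i;                  /* stop at the offending byte */
            return 0xFFFD;
        }
        cp = (cp << 6) | (s[i] & 0x3F);  /* append the low six payload bits */
    }

    *p += extra + 1;
    return cp;
}
```

Note that this sketch does not reject overlong encodings, surrogate code points, or values above U+10FFFF, which a strict decoder should. The caller also has to check for the terminating NUL itself, since the function will return 0 and step past it.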
© Stack Overflow or respective owner