How do I read UTF-8 characters via a pointer?
- by Jen
Suppose I have UTF-8 content stored in memory. How do I read the characters using a pointer? I presume I need to watch for the high bit (the 8th bit, value 128), which marks a byte as part of a multi-byte character, but how exactly do I turn the sequence into a valid Unicode code point? Also, is wchar_t the proper type to store a single Unicode character?
This is what I have in mind:
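wchar_t readNextChar (char** p)
{
    // Read one byte and advance the caller's pointer (*p), not the local copy p.
    unsigned char ch = (unsigned char) *(*p)++;
    if (ch & 128)
    {
        // This is a multi-byte character, what do I do now?
        // unsigned char chNext = (unsigned char) *(*p)++;
        // ... but how do I assemble the Unicode character?
        ...
    }
    ...
}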
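For reference, here is a minimal sketch of how the multi-byte case could be assembled. The high bits of the lead byte say how many continuation bytes follow, and each continuation byte (10xxxxxx) contributes its low 6 bits. The name read_next_char, its signature, and the choice to return U+FFFD on malformed input are assumptions made for illustration, not part of the original post. It also assumes the buffer is NUL-terminated, so a truncated sequence stops at the terminator instead of reading past it. It returns uint32_t rather than wchar_t because wchar_t is only 16 bits on some platforms (Windows, for example) and cannot hold code points above U+FFFF there.

```c
#include <stdint.h>

/* Hypothetical helper: decodes one code point starting at *p and advances *p
   past the bytes it consumed. Returns 0xFFFD on malformed input.
   Assumes a NUL-terminated buffer; returns 0 (and steps past it) at the NUL,
   so the caller should stop when it sees 0. */
uint32_t read_next_char(const char **p)
{
    /* Minimum code point for each sequence length, used to reject overlong forms. */
    static const uint32_t min_cp[4] = { 0x0, 0x80, 0x800, 0x10000 };
    const unsigned char *s = (const unsigned char *)*p;
    uint32_t cp;
    int extra;  /* number of continuation bytes expected */
    int i;

    if (s[0] < 0x80)                { cp = s[0];        extra = 0; } /* 0xxxxxxx */
    else if ((s[0] & 0xE0) == 0xC0) { cp = s[0] & 0x1F; extra = 1; } /* 110xxxxx */
    else if ((s[0] & 0xF0) == 0xE0) { cp = s[0] & 0x0F; extra = 2; } /* 1110xxxx */
    else if ((s[0] & 0xF8) == 0xF0) { cp = s[0] & 0x07; extra = 3; } /* 11110xxx */
    else {
        *p += 1;            /* stray continuation byte or invalid lead byte */
        return 0xFFFD;
    }

    for (i = 1; i <= extra; i++) {
        if ((s[i] & 0xC0) != 0x80) {  /* not 10xxxxxx: sequence is cut short */
            *p += i;                  /* resume at the offending byte */
            return 0xFFFD;
        }
        cp = (cp << 6) | (s[i] & 0x3F);  /* append the low 6 payload bits */
    }
    *p += extra + 1;

    /* Reject overlong encodings, UTF-16 surrogates, and values past U+10FFFF. */
    if (cp < min_cp[extra] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0xFFFD;
    return cp;
}
```

Calling it in a loop until it returns 0 walks the string one code point at a time. For example, the bytes C3 A9 decode to U+00E9 and E2 82 AC decode to U+20AC.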