Intra-Unicode "lean" Encoding Converters
Posted
by Mystagogue
on Stack Overflow
Published on 2010-06-08T23:26:44Z
Indexed on
2010/06/08
23:32 UTC
Windows provides encoding conversion functions ("MultiByteToWideChar" and "WideCharToMultiByte") that can, among other things, convert between UTF-8 and UTF-16. But I've seen people offer home-grown 30- to 40-line functions that also claim to perform UTF-8 / UTF-16 conversions.
My question is, how reliable are such tiny converters? Can so little code handle problems such as converting a UTF-16 surrogate pair into a single four-byte UTF-8 sequence (rather than making the mistake of converting it into a pair of three-byte sequences)? Can they correctly spot "unpaired" surrogate input and report an error?
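To make the question concrete, a lean converter of this kind might look like the sketch below. It is not taken from any of the functions mentioned above: the name utf16_to_utf8 is hypothetical, it is written in Python rather than C for brevity, and it assumes the input is already a sequence of 16-bit code units in native order (no byte-order mark handling).

```python
def utf16_to_utf8(units):
    """Convert UTF-16 code units (ints in 0..0xFFFF) to UTF-8 bytes.

    Hypothetical helper for illustration. Raises ValueError on an
    unpaired surrogate or on a value that is not a 16-bit code unit.
    """
    units = list(units)
    out = bytearray()
    i = 0
    while i < len(units):
        u = units[i]
        if not 0 <= u <= 0xFFFF:
            raise ValueError(f"not a 16-bit code unit at index {i}")
        if 0xD800 <= u <= 0xDBFF:
            # High surrogate: the next unit must be a low surrogate.
            if i + 1 >= len(units) or not 0xDC00 <= units[i + 1] <= 0xDFFF:
                raise ValueError(f"unpaired high surrogate at index {i}")
            # Combine the pair into one supplementary code point.
            cp = 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            i += 2
        elif 0xDC00 <= u <= 0xDFFF:
            # Low surrogate with no preceding high surrogate.
            raise ValueError(f"unpaired low surrogate at index {i}")
        else:
            cp = u
            i += 1
        # Emit 1 to 4 bytes depending on the code point's magnitude.
        if cp < 0x80:
            out.append(cp)
        elif cp < 0x800:
            out += bytes((0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)))
        elif cp < 0x10000:
            out += bytes((0xE0 | (cp >> 12),
                          0x80 | ((cp >> 6) & 0x3F),
                          0x80 | (cp & 0x3F)))
        else:
            out += bytes((0xF0 | (cp >> 18),
                          0x80 | ((cp >> 12) & 0x3F),
                          0x80 | ((cp >> 6) & 0x3F),
                          0x80 | (cp & 0x3F)))
    return bytes(out)
```

In this sketch the surrogate handling is the only part that goes beyond plain bit shifting. A converter that skips that branch and encodes each surrogate on its own would produce exactly the pair of three-byte sequences described above.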
In short, are such tiny converters mere toys, or can they be taken seriously? For that matter, why does unicode.org seemingly offer no advice on an algorithm for accomplishing such conversions?
© Stack Overflow or respective owner