Intra-Unicode "lean" Encoding Converters

Posted by Mystagogue on Stack Overflow
Published on 2010-06-08T23:26:44Z Indexed on 2010/06/08 23:32 UTC

Filed under: c++ | unicode | conversion

Windows provides encoding conversion functions ("MultiByteToWideChar" and "WideCharToMultiByte") that can convert between UTF-8 and UTF-16, among other things. But I've also seen people offer home-grown 30- to 40-line functions that claim to perform the same UTF-8 / UTF-16 conversions.

My question is, how reliable are such tiny converters? Can so little code handle problems such as converting a UTF-16 surrogate pair into a single four-byte UTF-8 sequence (rather than making the mistake of converting it into a pair of three-byte sequences)? Can they correctly spot "unpaired" surrogate input and report an error?
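To make the question concrete, here is a minimal sketch of what such a small UTF-16 to UTF-8 converter might look like. The function name `utf16_to_utf8` and the choice to throw `std::invalid_argument` are assumptions made for illustration; they are not taken from any particular home-grown converter or from the Windows API. The sketch combines each surrogate pair into one code point before encoding, so a pair yields a single four-byte sequence, and it rejects unpaired surrogates instead of encoding them.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper (an illustration, not from the original post):
// converts UTF-16 code units to UTF-8 and throws std::invalid_argument
// when it meets an unpaired surrogate.
std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    out.reserve(in.size() * 3);  // worst case is 3 bytes per UTF-16 code unit
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: the next unit must be a low surrogate.
            if (i + 1 >= in.size() || in[i + 1] < 0xDC00 || in[i + 1] > 0xDFFF)
                throw std::invalid_argument("unpaired high surrogate");
            std::uint32_t lo = in[++i];
            // Combine the pair into one code point in U+10000..U+10FFFF.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            // Low surrogate with no preceding high surrogate.
            throw std::invalid_argument("unpaired low surrogate");
        }
        if (cp < 0x80) {                       // 1 byte: 0xxxxxxx
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {             // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                               // 4 bytes: 11110xxx then three 10xxxxxx
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

Under these assumptions, `utf16_to_utf8(u"\U0001F600")` receives the surrogate pair D83D DE00 and produces the four bytes F0 9F 98 80, while a lone 0xD800 raises the exception. Whether an error, a U+FFFD replacement, or some other policy is the right response to bad input is a design choice this sketch does not settle.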

In short, are such tiny converters mere toys, or can they be taken seriously? For that matter, why does unicode.org seemingly offer no advice on an algorithm for accomplishing such conversions?

