What encoding does c32rtomb convert to?

Posted by R. Martinho Fernandes on Stack Overflow
Published on 2012-10-24T08:44:09Z Indexed on 2012/10/27 5:03 UTC


The functions c32rtomb and mbrtoc32 from <cuchar>/<uchar.h> are described in the C Unicode TR (draft) as performing conversions between UTF-32¹ and "multibyte characters".

(...) If s is not a null pointer, the c32rtomb function determines the number of bytes needed to represent the multibyte character that corresponds to the wide character given by c32 (including any shift sequences), and stores the multibyte character representation in the array whose first element is pointed to by s. (...)
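To make the quoted contract concrete, here is a minimal sketch that wraps c32rtomb for a single character. `encode_char32` is a hypothetical helper name, not part of any library. The sketch uses whatever locale is current (the "C" locale, unless setlocale has been called) and relies only on the documented return convention: the number of bytes stored, including any shift sequences, or (size_t)-1 on an encoding error.

```cpp
#include <cassert>
#include <climits>   // MB_LEN_MAX
#include <cstddef>   // std::size_t
#include <cstdlib>   // MB_CUR_MAX
#include <cuchar>    // std::c32rtomb, std::mbstate_t
#include <optional>
#include <string>

// Hypothetical helper: converts one char32_t to the multibyte representation
// of the current locale, continuing from (and updating) `state`.
std::optional<std::string> encode_char32(char32_t c32, std::mbstate_t& state) {
    // c32rtomb stores at most MB_CUR_MAX bytes, and MB_CUR_MAX <= MB_LEN_MAX,
    // so a buffer of MB_LEN_MAX bytes is always large enough.
    char buf[MB_LEN_MAX];
    std::size_t n = std::c32rtomb(buf, c32, &state);
    if (n == static_cast<std::size_t>(-1)) {
        // Encoding error: c32 has no representation in this locale's encoding
        // (errno is set to EILSEQ and the conversion state is unspecified).
        return std::nullopt;
    }
    assert(n <= MB_CUR_MAX);
    // n counts every stored byte, including any shift sequences.
    return std::string(buf, n);
}
```

Calling it once per element of a u32string with a single shared mbstate_t, and concatenating the results, is what the loop in the program below does inline.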

What is this "multibyte character representation"? I'm actually interested in the behaviour of the following program:

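#include <cassert>
#include <cstddef>
#include <cuchar>
#include <string>

int main() {
    std::u32string u32 = U"this is a wide string";
    std::string narrow = "this is a wide string";
    // Generously sized output buffer; each c32rtomb call stores at most
    // MB_CUR_MAX bytes.
    std::string converted(1000, '\0');
    char* ptr = &converted[0];
    std::mbstate_t state {}; // start in the initial conversion state
    for(auto u : u32) {
        std::size_t n = std::c32rtomb(ptr, u, &state);
        // (size_t)-1 signals an encoding error; never add it to ptr.
        if(n == static_cast<std::size_t>(-1)) return 1;
        ptr += n;
    }
    converted.resize(ptr - &converted[0]);
    assert(converted == narrow);
}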
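The inverse function, mbrtoc32, offers a way to probe the same question from the other side: decode the narrow literal back into char32_t values and compare them with u32. A minimal sketch, assuming the same default "C" locale as the program above; `decode_to_u32` is a hypothetical helper name, and the handling of the (size_t)-2 and (size_t)-3 return values follows the C11 description of mbrtoc32.

```cpp
#include <cassert>
#include <cstddef>   // std::size_t
#include <cuchar>    // std::mbrtoc32, std::mbstate_t
#include <optional>
#include <string>

// Hypothetical helper: decodes a multibyte string, interpreted in the current
// locale, into char32_t values using mbrtoc32. Returns std::nullopt if the
// input contains an invalid or truncated sequence.
std::optional<std::u32string> decode_to_u32(const std::string& mb) {
    std::u32string out;
    std::mbstate_t state{};  // initial conversion state
    const char* p = mb.data();
    const char* const end = p + mb.size();
    while (p < end) {
        char32_t c32 = 0;
        std::size_t n = std::mbrtoc32(&c32, p, static_cast<std::size_t>(end - p), &state);
        if (n == static_cast<std::size_t>(-1) || n == static_cast<std::size_t>(-2)) {
            return std::nullopt;  // encoding error (-1) or incomplete sequence (-2)
        }
        out.push_back(c32);
        if (n == static_cast<std::size_t>(-3)) {
            continue;  // c32 is a further character from earlier input; nothing consumed
        }
        // n == 0 means a null character was decoded; it always occupies one byte.
        p += (n == 0) ? 1 : n;
        assert(p <= end);  // mbrtoc32 never consumes more than the bytes offered
    }
    return out;
}
```

With the program's variables, `decode_to_u32(narrow) == u32` expresses the reverse of the original assertion; whether it is guaranteed to hold is the same open question.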

Is the assertion in it guaranteed to hold?¹


¹ Working under the assumption that __STDC_UTF_32__ is defined.

© Stack Overflow or respective owner
