What encoding does c32rtomb convert to?
- by R. Martinho Fernandes
The functions c32rtomb and mbrtoc32 from <cuchar>/<uchar.h> are described in the C Unicode TR (draft) as performing conversions between UTF-32¹ and "multibyte characters":

    (...) If s is not a null pointer, the c32rtomb function determines the number of bytes needed to represent the multibyte character that corresponds to the wide character given by c32 (including any shift sequences), and stores the multibyte character representation in the array whose first element is pointed to by s. (...)

What is this "multibyte character representation"? I'm actually interested in the behaviour of the following program:

    #include <cassert>
    #include <cuchar>
    #include <string>

    int main() {
        std::u32string u32 = U"this is a wide string";
        std::string narrow = "this is a wide string";
        std::string converted(1000, '\0');
        char* ptr = &converted[0];
        std::mbstate_t state {};
        for(auto u : u32) {
            ptr += std::c32rtomb(ptr, u, &state);
        }
        converted.resize(ptr - &converted[0]);
        assert(converted == narrow);
    }

Is the assertion in it guaranteed to hold¹?

¹ Working under the assumption that __STDC_UTF_32__ is defined.