Problem using getline with a Unicode file

Posted by hamishmcn on Stack Overflow
Published on 2010-04-27T22:55:14Z Indexed on 2010/04/27 23:53 UTC

UPDATE: Thank you to @Potatoswatter and @Jonathan Leffler for the comments. Rather embarrassingly, I was caught out by the debugger tooltip not showing the value of a wstring correctly. However, it still isn't quite working for me, so I have updated the question below.

If I have a small multibyte file that I want to read into a string, I use the following trick: I call getline with a delimiter of '\0', e.g.

std::string contents_utf8;
std::ifstream inf1("utf8.txt");
getline(inf1, contents_utf8, '\0');

This reads in the entire file including newlines.
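
A side note on the trick above: it reads the whole file only as long as the file contains no '\0' byte, because getline stops at the first delimiter it meets. A minimal sketch of an alternative that does not depend on a delimiter, assuming the goal is simply to get every byte into a std::string (read_whole_file is a hypothetical helper name, not something from the code above):

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper (not part of the original post): returns every byte of
// the file at `path`, or an empty string if the file cannot be opened.
std::string read_whole_file(const std::string& path) {
    // Binary mode avoids newline translation on platforms that perform it.
    std::ifstream in(path, std::ios::binary);
    std::ostringstream buffer;
    buffer << in.rdbuf();  // copies the stream contents, '\0' bytes included
    return buffer.str();
}
```
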
However, if I try to do the same thing with a wide-character file, it doesn't work: my wstring reads only as far as the first line.

std::wstring contents_wide;
std::wifstream inf2(L"ucs2-be.txt");
getline( inf2, contents_wide, wchar_t(0) ); //doesn't work

For example, if my Unicode file contains the characters A and B separated by CRLF, the hex looks like this:

FE FF 00 41 00 0D 00 0A 00 42
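
To make the layout of those ten bytes explicit, here is a minimal sketch that pairs them up as big-endian 16-bit code units and drops the leading FE FF byte order mark. It assumes the bytes are already in memory as a std::string; decode_ucs2_be is a hypothetical helper name, and the sketch makes no attempt to handle surrogate pairs.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: combines each pair of bytes into one big-endian
// 16-bit code unit. A leading FE FF byte order mark is dropped, and a
// trailing odd byte is ignored.
std::u16string decode_ucs2_be(const std::string& bytes) {
    std::u16string units;
    for (std::size_t i = 0; i + 1 < bytes.size(); i += 2) {
        const auto hi = static_cast<unsigned char>(bytes[i]);
        const auto lo = static_cast<unsigned char>(bytes[i + 1]);
        const auto unit = static_cast<char16_t>((hi << 8) | lo);
        if (i == 0 && unit == 0xFEFF) continue;  // skip the byte order mark
        units.push_back(unit);
    }
    return units;
}
```

Applied to the bytes above, this yields four code units: 0041, 000D, 000A and 0042, that is, A, CR, LF and B.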

Since getline with '\0' reads an entire multibyte file, I expected getline( inf2, contents_wide, wchar_t(0) ) to read in the entire Unicode file. However, it doesn't: with the example above, my wide string contains the following two wchar_ts: FF FF

(If I remove the wchar_t(0), it reads in the first line as expected, i.e. FE FF 00 41 00 0D 00.)
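
One possible reading of that first-line result: the seven values FE FF 00 41 00 0D 00 are exactly the file's bytes up to the 0A, which suggests that each byte was widened to its own wchar_t rather than pairs of bytes being combined. Assuming that is what the stream does (an assumption about the default locale's conversion, not something confirmed here), the 00 byte right after the byte order mark would itself compare equal to wchar_t(0). The sketch below imitates this with a std::wistringstream instead of a real file; widen_bytewise and read_until are hypothetical names.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: widens every byte to its own wchar_t. This only
// imitates the assumed byte-for-byte conversion; it does not decode UCS-2.
std::wstring widen_bytewise(const std::string& bytes) {
    std::wstring wide;
    for (unsigned char b : bytes) wide.push_back(static_cast<wchar_t>(b));
    return wide;
}

// Runs std::getline with the given delimiter over the widened bytes.
std::wstring read_until(const std::string& bytes, wchar_t delimiter) {
    std::wistringstream in(widen_bytewise(bytes));
    std::wstring out;
    std::getline(in, out, delimiter);
    return out;
}
```

Under that assumption, a delimiter of wchar_t(0) stops after only two elements for the example bytes, while a delimiter of L'\n' stops after seven, which matches the first-line result quoted above.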

Why doesn't wchar_t(0) work as a delimiting wchar_t of "00 00"?
Thank you
