Problem using getline with a Unicode file
- by hamishmcn
UPDATE: Thank you to @Potatoswatter and @Jonathan Leffler for the comments. Rather embarrassingly, I was caught out by the debugger tooltip not showing the value of a wstring correctly. However, it still isn't quite working for me, so I have updated the question below.
If I have a small multibyte file that I want to read into a string, I use the following trick: I call getline with a delimiter of '\0', e.g.
std::string contents_utf8;
std::ifstream inf1("utf8.txt");
getline(inf1, contents_utf8, '\0');
This reads in the entire file, including newlines (provided the file itself contains no NUL bytes).
However, if I try to do the same thing with a wide-character file it doesn't work: my wstring only reads up to the end of the first line.
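For comparison, here is a minimal sketch of the trick above next to a delimiter-free alternative. The helper names read_with_getline and read_all_bytes are hypothetical (they are not from the original code), and opening the stream in binary mode is an assumption made so that CRLF pairs are not translated on Windows.

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: the getline trick from the question.
// It stops at the first NUL byte, or at end of file if there is none.
std::string read_with_getline(const std::string& path) {
    std::ifstream in(path, std::ios::binary);  // binary mode is an assumption
    std::string contents;
    std::getline(in, contents, '\0');
    return contents;
}

// Hypothetical helper: copies every byte of the file, NUL bytes included,
// by iterating over the stream buffer instead of relying on a delimiter.
std::string read_all_bytes(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

Both helpers return the same string for a file with no NUL bytes; they differ only when a 00 byte appears in the data.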
std::wstring contents_wide;
std::wifstream inf2(L"ucs2-be.txt");
getline(inf2, contents_wide, wchar_t(0)); // doesn't work
For example, if my Unicode file contains the characters A and B separated by CRLF, the hex looks like this:
FE FF 00 41 00 0D 00 0A 00 42
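To make the layout of that dump concrete, here is a minimal sketch that pairs raw bytes into big-endian 16-bit code units. The function name decode_ucs2_be is hypothetical, and the sketch assumes the file holds only BMP characters (plain UCS-2, no surrogate pairs) and that the bytes have already been read unmodified, for example from a stream opened in binary mode.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not from the original post): combines each pair of
// bytes into one big-endian 16-bit code unit and skips a leading FE FF
// byte order mark. Assumes UCS-2 content, so no surrogate handling.
std::u16string decode_ucs2_be(const std::string& bytes) {
    std::u16string out;
    std::size_t i = 0;
    if (bytes.size() >= 2 &&
        static_cast<unsigned char>(bytes[0]) == 0xFE &&
        static_cast<unsigned char>(bytes[1]) == 0xFF) {
        i = 2;  // skip the byte order mark
    }
    // A trailing odd byte, if any, is ignored.
    for (; i + 1 < bytes.size(); i += 2) {
        const unsigned hi = static_cast<unsigned char>(bytes[i]);
        const unsigned lo = static_cast<unsigned char>(bytes[i + 1]);
        out.push_back(static_cast<char16_t>((hi << 8) | lo));
    }
    return out;
}
```

Applied to the ten bytes above, this yields four code units: A, CR, LF and B. Note that every one of those characters contributes a 00 byte to the raw file.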
Since getline with '\0' reads an entire multibyte file, I believed that getline(inf2, contents_wide, wchar_t(0)) would read in the entire Unicode file. However, it doesn't: with the example above, my wide string ends up containing the following two wchar_ts: FF FF
If I remove the wchar_t(0) argument, it reads in the first line as expected (i.e. FE FF 00 41 00 0D 00).
Why doesn't wchar_t(0) work as a delimiting wchar_t of "00 00"?
Thank you