How do I ignore the UTF-8 Byte Order Marker in String comparisons?
Posted by Skrud on Stack Overflow
Published on 2010-05-26T17:07:30Z
I'm having a problem comparing strings in a unit test in C# 4.0 using Visual Studio 2010. The same test case works properly in Visual Studio 2008 (with .NET 3.5).
Here's the relevant code snippet:
byte[] rawData = GetData();
string data = Encoding.UTF8.GetString(rawData);
Assert.AreEqual("Constant", data, false, CultureInfo.InvariantCulture);
While debugging this test, the data string appears to the naked eye to contain exactly the same text as the literal. When I called data.ToCharArray(), I noticed that the first character of data has the value 65279 (U+FEFF), which is the Unicode byte order mark (encoded in UTF-8 as the bytes EF BB BF). What I don't understand is why Encoding.UTF8.GetString() keeps this character around.
How do I get Encoding.UTF8.GetString() to not put the byte order mark in the resulting string?
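To make the situation concrete, here is a minimal sketch of two ways the mark could be dropped, assuming the raw data may or may not begin with the UTF-8 BOM bytes EF BB BF. The names BomExample, DecodeUtf8WithoutBom and DecodeWithReader are hypothetical, and the hard-coded byte array only stands in for whatever GetData() returns.

```csharp
using System;
using System.IO;
using System.Text;

public static class BomExample
{
    // Hypothetical helper: decode UTF-8 bytes, then drop a leading U+FEFF if one is present.
    public static string DecodeUtf8WithoutBom(byte[] rawData)
    {
        string text = Encoding.UTF8.GetString(rawData);
        return text.Length > 0 && text[0] == '\uFEFF' ? text.Substring(1) : text;
    }

    // Alternative sketch: StreamReader recognizes the UTF-8 preamble and skips it while reading.
    public static string DecodeWithReader(byte[] rawData)
    {
        using (var reader = new StreamReader(new MemoryStream(rawData), Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        // Stand-in for GetData(): the BOM bytes followed by the ASCII text "Constant".
        byte[] rawData = { 0xEF, 0xBB, 0xBF, 0x43, 0x6F, 0x6E, 0x73, 0x74, 0x61, 0x6E, 0x74 };

        // GetString decodes the three BOM bytes into the single char U+FEFF.
        string withBom = Encoding.UTF8.GetString(rawData);
        Console.WriteLine((int)withBom[0]);   // code point of the first char
        Console.WriteLine(withBom.Length);    // one longer than "Constant"

        // Ordinal comparisons against the literal after stripping the mark.
        Console.WriteLine(string.Equals("Constant", DecodeUtf8WithoutBom(rawData), StringComparison.Ordinal));
        Console.WriteLine(string.Equals("Constant", DecodeWithReader(rawData), StringComparison.Ordinal));
    }
}
```

The first helper keeps the original GetString() call and only post-processes its result, so data without a BOM passes through unchanged. The reader-based variant may be preferable when the data already arrives as a stream.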
© Stack Overflow or respective owner