How to strip out the 0x0A special character from a UTF-8 file using C# and keep the file as UTF-8?
Posted by user1013388 on Stack Overflow
Published on 2013-08-02T15:23:39Z
Indexed on 2013/08/02 15:36 UTC
The following is a line from a UTF-8 file from which I am trying to remove a special character (0x0A), which shows up as a black diamond with a question mark:
2464577 ????? True s6620178 Unspecified <1>?1009-672
This is generated when SSIS reads a SQL table and then writes it out using a Flat File Connection Manager set to code page 65001 (UTF-8).
When I open the file in Notepad++, the character displays as 0x0A.
I'm looking for some C# code that will reliably strip that character out and replace it with either nothing or a blank space.
Here's what I have tried:
string fileLocation = "c:\\MyFile.txt";
var content = string.Empty;
using (StreamReader reader = new System.IO.StreamReader(fileLocation))
{
    content = reader.ReadToEnd();
    reader.Close();
}
content = content.Replace('\u00A0', ' ');
//also tried: content.Replace((char)0X0A, ' ');
//also tried: content.Replace((char)0X0A, '');
//also tried: content.Replace((char)0X0A, (char)'\0');
Encoding encoding = Encoding.UTF8;
using (FileStream stream = new FileStream(fileLocation, FileMode.Create))
{
    using (BinaryWriter writer = new BinaryWriter(stream, encoding))
    {
        writer.Write(encoding.GetPreamble()); //This is for writing the BOM
        writer.Write(content);
    }
}
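For comparison, here is a minimal sketch of the same read, replace, write round trip. It is not a confirmed fix, and it rests on a few assumptions: that the character to remove really is the line feed (U+000A), that the file fits in memory, and that a BOM is wanted on output. The class and method names (LineFeedStripper, ReplaceLineFeeds, RewriteFile) are made up for illustration. Two details differ from the attempt above: the result of Replace is assigned back, because strings are immutable and Replace returns a new string, and the text is written with File.WriteAllText rather than BinaryWriter.Write(string), which puts a length prefix in front of the string bytes.

```csharp
using System.IO;
using System.Text;

// Hypothetical helper, named for illustration only.
public static class LineFeedStripper
{
    // Returns a copy of the text with every line feed (U+000A) replaced.
    // Pass "" to delete the character, or " " to substitute a blank space.
    public static string ReplaceLineFeeds(string text, string replacement)
    {
        // string.Replace does not modify 'text'; the new string is returned.
        return text.Replace("\n", replacement);
    }

    // Reads the file as UTF-8, replaces line feeds, and writes it back
    // as UTF-8 with a BOM (UTF8Encoding(true) emits the preamble).
    public static void RewriteFile(string path, string replacement)
    {
        string content = File.ReadAllText(path, Encoding.UTF8);
        content = ReplaceLineFeeds(content, replacement);
        File.WriteAllText(path, content, new UTF8Encoding(true));
    }
}
```

A call such as LineFeedStripper.RewriteFile("c:\\MyFile.txt", " ") would replace every line feed in the file, including the ones that end each record, so the output collapses to a single line. If the file uses CRLF line endings, the carriage returns (0x0D) are left in place.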
I also tried this code to get the actual string value:
byte[] bytes = { 0x0A };
string text = Encoding.UTF8.GetString(bytes);
It comes back as "\n". So in the code above I also tried replacing "\n" with " ", using both double-quoted strings and single-quoted chars, but there was still no change.
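Since 0x0A decodes to "\n", the ordinary line feed, it may be worth confirming which bytes are actually in the file before replacing anything. The sketch below counts occurrences of a given byte in the raw file contents; ByteInspector, CountByte, and CountByteInFile are hypothetical names used only for illustration. One possibility, and it is only an assumption, is that the troublesome byte is 0xA0 rather than 0x0A: the attempt above already replaces '\u00A0' (the no-break space), and a lone 0xA0 byte is not valid UTF-8, so decoders typically substitute U+FFFD, which many editors draw as a black diamond with a question mark.

```csharp
using System.IO;

// Hypothetical diagnostic helper, named for illustration only.
public static class ByteInspector
{
    // Counts how many times 'target' occurs in the raw bytes.
    public static int CountByte(byte[] data, byte target)
    {
        int count = 0;
        foreach (byte b in data)
        {
            if (b == target)
            {
                count++;
            }
        }
        return count;
    }

    // Reads the whole file into memory without decoding it and counts 'target'.
    public static int CountByteInFile(string path, byte target)
    {
        return CountByte(File.ReadAllBytes(path), target);
    }
}
```

Comparing ByteInspector.CountByteInFile("c:\\MyFile.txt", 0x0A) with the same call for 0xA0 would show which of the two bytes is present, and how often, without any encoding getting in the way.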
At this point I'm out of ideas. Anyone got any advice?
Thanks.
© Stack Overflow or respective owner