How to strip out the 0x0A special character from a UTF-8 file using C# and keep the file as UTF-8?

Posted by user1013388 on Stack Overflow
Published on 2013-08-02T15:23:39Z Indexed on 2013/08/02 15:36 UTC

The following is a line from a UTF-8 file from which I am trying to remove the special character (0x0A), which shows up below as a black diamond with a question mark:

2464577 ????? True s6620178 Unspecified <1>?1009-672

The file is generated when SSIS reads a SQL table and writes it out through a flat file connection manager set to code page 65001 (UTF-8).

When I open the file in Notepad++, the character displays as 0x0A.

I'm looking for some C# code that reliably strips that character out and replaces it with either nothing or a blank space.

Here's what I have tried:

        string fileLocation = "c:\\MyFile.txt";
        var content = string.Empty;
        using (StreamReader reader = new StreamReader(fileLocation))
        {
            content = reader.ReadToEnd();
        }

        content = content.Replace('\u00A0', ' ');
        //also tried: content.Replace((char)0X0A, ' ');
        //also tried: content.Replace((char)0X0A, '');
        //also tried: content.Replace((char)0X0A, (char)'\0');
        Encoding encoding = Encoding.UTF8;
        using (FileStream stream = new FileStream(fileLocation, FileMode.Create))
        {
          using (BinaryWriter writer = new BinaryWriter(stream, encoding))
          {
            writer.Write(encoding.GetPreamble()); //This is for writing the BOM
            writer.Write(content);
          }
        }
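
A side note on the write step above: `BinaryWriter.Write(string)` puts a length prefix in front of the encoded text, so the output is not a plain UTF-8 text file even when the replacement itself works. The following is a minimal sketch, assuming the goal is a BOM-prefixed UTF-8 text file; the class name `Utf8FileCleaner` and the method `ReplaceChar` are hypothetical names introduced for illustration.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, not part of the original post.
public static class Utf8FileCleaner
{
    // Reads the file as UTF-8, replaces every occurrence of `target`
    // with `replacement` (which may be empty), and writes the result
    // back as UTF-8 with a BOM.
    public static void ReplaceChar(string path, char target, string replacement)
    {
        string content = File.ReadAllText(path, Encoding.UTF8);
        content = content.Replace(target.ToString(), replacement);

        // UTF8Encoding(true) makes File.WriteAllText emit the BOM itself.
        // No length prefix is written, unlike BinaryWriter.Write(string).
        File.WriteAllText(path, content, new UTF8Encoding(true));
    }
}
```

A call such as `Utf8FileCleaner.ReplaceChar(fileLocation, '\n', " ")` would then target the 0x0A character. Note that 0x0A is also the line-feed character, so replacing every occurrence joins all lines of the file into one.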

I also tried this code to get the actual string value:

byte[] bytes = { 0x0A };
string text = Encoding.UTF8.GetString(bytes);

It comes back as "\n", so in the code above I also tried replacing "\n" with " ", using both double and single quotes, but nothing changed.
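
The "\n" result is expected, since byte 0x0A is the line feed in UTF-8. It may be worth checking which byte is really in the file, because the `Replace` call in the first snippet targets U+00A0 (no-break space), which is a different character from 0x0A. A black diamond with a question mark is how many editors render U+FFFD, the replacement character a decoder substitutes for invalid input; a lone 0xA0 byte would be one such input. Whether that applies to this file is an assumption that only a hex dump can confirm. The sketch below uses a hypothetical helper class, `ByteInspector`, to compare the candidates at the byte level.

```csharp
using System;
using System.Linq;
using System.Text;

// Hypothetical helper, not part of the original post.
public static class ByteInspector
{
    // Formats bytes as space-separated uppercase hex, e.g. "C2 A0".
    public static string ToHex(byte[] data)
    {
        return string.Join(" ", data.Select(b => b.ToString("X2")));
    }

    // Returns a copy of `data` with every occurrence of `target` removed.
    // This is only safe for ASCII values (below 0x80) such as 0x0A, which
    // never occur inside a multi-byte UTF-8 sequence. Removing a byte like
    // 0xA0 this way could corrupt valid characters such as C2 A0.
    public static byte[] StripByte(byte[] data, byte target)
    {
        return data.Where(b => b != target).ToArray();
    }
}

// Facts the helper can be used to confirm:
//   Encoding.UTF8.GetString(new byte[] { 0x0A })  is "\n"      (line feed)
//   Encoding.UTF8.GetString(new byte[] { 0xA0 })  is "\uFFFD"  (invalid lone byte)
//   ToHex(Encoding.UTF8.GetBytes("\u00A0"))       is "C2 A0"   (no-break space)
```

Running `ByteInspector.ToHex(File.ReadAllBytes(fileLocation))` on a small sample of the file would show whether the problem byte is 0x0A, 0xA0, or the pair C2 A0, and that determines which character the `Replace` call needs to target.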

At this point I'm out of ideas. Anyone got any advice?

Thanks.
