How to detect the character encoding of a text file?
- by Cédric Boivin
I try to detect which character encoding is used in my file.
I try with this code to get the standard encoding
public static Encoding GetFileEncoding(string srcFile)
{
// *** Use Default of Encoding.Default (Ansi CodePage)
Encoding enc = Encoding.Default;
// *** Detect byte order mark if any - otherwise assume default
byte[] buffer = new byte[5];
FileStream file = new FileStream(srcFile, FileMode.Open);
file.Read(buffer, 0, 5);
file.Close();
if (buffer[0] == 0xef && buffer[1] == 0xbb && buffer[2] == 0xbf)
enc = Encoding.UTF8;
else if (buffer[0] == 0xfe && buffer[1] == 0xff)
enc = Encoding.Unicode;
else if (buffer[0] == 0 && buffer[1] == 0 && buffer[2] == 0xfe && buffer[3] == 0xff)
enc = Encoding.UTF32;
else if (buffer[0] == 0x2b && buffer[1] == 0x2f && buffer[2] == 0x76)
enc = Encoding.UTF7;
else if (buffer[0] == 0xFE && buffer[1] == 0xFF)
// 1201 unicodeFFFE Unicode (Big-Endian)
enc = Encoding.GetEncoding(1201);
else if (buffer[0] == 0xFF && buffer[1] == 0xFE)
// 1200 utf-16 Unicode
enc = Encoding.GetEncoding(1200);
return enc;
}
My five first byte are 60, 118, 56, 46 and 49.
Is there a chart that shows which encoding matches those five first bytes?