What Character Encoding Is This?

Posted by Canoehead on Stack Overflow
Published on 2010-04-23T17:53:03Z

I need to clean up some files containing French text. The problem is that each file erroneously mixes multiple encodings.

I think some sections are ISO 8859-1 (Latin-1), but other parts contain text encoded in single-byte characters that look like 'extended' ASCII. In other words, it is 7-bit ASCII plus the following:

  • 0x82 for é (e acute)
  • 0x8a for è (e grave)
  • 0x88 for ê (e circumflex)
  • 0x85 for à (a grave)
  • 0x87 for ç (c cedilla)

What encoding is this?
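
One way to narrow this down is to decode each of the bytes listed above with a handful of single-byte codecs and keep the ones that produce the expected characters. The sketch below does that in Python. The shortlist in `candidates` is only a guess at likely suspects and is not exhaustive, and the names `observed`, `candidates`, `matches` and `matching` are made up for illustration.

```python
# Byte values and the characters they should represent (from the list above).
observed = {0x82: "é", 0x8A: "è", 0x88: "ê", 0x85: "à", 0x87: "ç"}

# Assumed shortlist of single-byte codecs to test; extend it as needed.
candidates = ["latin-1", "cp1252", "cp437", "cp850", "mac_roman"]


def matches(codec):
    """Return True if every observed byte decodes to its expected character."""
    for byte, char in observed.items():
        try:
            if bytes([byte]).decode(codec) != char:
                return False
        except UnicodeDecodeError:
            # The codec leaves this byte undefined, so it cannot be a match.
            return False
    return True


matching = [codec for codec in candidates if matches(codec)]
print(matching)
```

With this shortlist, only the DOS code pages `cp437` and `cp850` survive, which suggests those sections came from an old DOS/OEM code page. Other OEM code pages not in the shortlist may agree on these five letters as well, so this is a hint rather than proof.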
