A better way of converting Codepage-1251 in RTF to Unicode
Posted
by blue painted
on Stack Overflow
Published on 2010-03-15T16:05:13Z
I am trying to parse RTF (via MSEDIT) containing text in various languages, all in Delphi 2010, in order to produce Unicode HTML.
Taking Russian/Cyrillic as my starting point, I find that the overall document codepage is 1252 (Western), but the Russian parts of the text are identified by the charset of their font (RUSSIAN_CHARSET, 204).
So far I am:
1) Using AnsiString (or RawByteString) when parsing the RTF
2) Determining the codepage by a lookup from the font charset (see http://msdn.microsoft.com/en-us/library/cc194829.aspx)
3) Translating using a lookup table in my code (this table was generated from http://msdn.microsoft.com/en-gb/goglobal/cc305144.aspx). I'm going to need one table per supported codepage!
There MUST be a better way than this! Preferably something supplied by the OS, and so less brittle than tables of constants.
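For illustration only, the three steps above can be sketched compactly. This is a Python sketch rather than the Delphi code in question: the names `CHARSET_TO_CODEPAGE` and `decode_run` are hypothetical, the table is just a small subset of the Windows charset values, and the runtime's built-in codecs stand in for the hand-written table per codepage.

```python
# Hypothetical subset of the font charset -> Windows codepage lookup (step 2).
CHARSET_TO_CODEPAGE = {
    0: 1252,    # ANSI_CHARSET
    161: 1253,  # GREEK_CHARSET
    162: 1254,  # TURKISH_CHARSET
    204: 1251,  # RUSSIAN_CHARSET
    238: 1250,  # EASTEUROPE_CHARSET
}


def decode_run(raw: bytes, charset: int, default_codepage: int = 1252) -> str:
    """Decode one run of RTF text bytes using the codepage implied by its font charset."""
    # Unknown charsets fall back to the document-level codepage (assumed 1252 here).
    codepage = CHARSET_TO_CODEPAGE.get(charset, default_codepage)
    # Step 3: let the built-in codec do the byte -> Unicode mapping instead of a table.
    return raw.decode(f"cp{codepage}")


# Raw bytes (step 1) as they would arrive from RTF escapes \'cf\'f0\'e8\'e2\'e5\'f2
russian = bytes([0xCF, 0xF0, 0xE8, 0xE2, 0xE5, 0xF2])
print(decode_run(russian, 204))  # prints Привет
```

On Windows, the OS function `MultiByteToWideChar` takes a codepage identifier and performs the same kind of conversion, so it may be one route to dropping the per-codepage tables.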
© Stack Overflow or respective owner