A better way of converting Codepage-1251 in RTF to Unicode

Posted by blue painted on Stack Overflow
Published on 2010-03-15T16:05:13Z

Filed under: delphi | unicode | rtf

I am trying to parse RTF (via MSEDIT) in various languages, in Delphi 2010, in order to produce Unicode HTML.

Taking Russian/Cyrillic as my starting point, I find that the overall document code page is 1252 (Western), but the Russian parts of the text are identified by the charset of their font (RUSSIAN_CHARSET, 204).
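Because the code page travels with the font rather than with the document, the parser first needs each font's charset. Below is a minimal Python sketch, assuming each `{\fonttbl ...}` entry carries its `\fcharsetN` control word before the terminating `;`. `font_charsets` and `SAMPLE_RTF` are hypothetical (the sample is made up to match the situation described), and the regular expression is a simplification, not a full RTF tokenizer.

```python
import re

# One font-table entry such as {\f1\fswiss\fcharset204 Arial;}:
# group 1 is the font number, group 2 the declared charset.
# [^;{}]*? keeps a match from running past the end of the entry.
FONT_ENTRY = re.compile(r"\\f(\d+)[^;{}]*?\\fcharset(\d+)")

# Hypothetical sample: Western (1252) document, font 1 declared as RUSSIAN_CHARSET.
SAMPLE_RTF = (
    r"{\rtf1\ansi\ansicpg1252\deff0"
    r"{\fonttbl{\f0\fswiss\fcharset0 Arial;}{\f1\fswiss\fcharset204 Arial;}}"
    r"\f1\'cf\'f0\'e8\'e2\'e5\'f2}"
)


def font_charsets(rtf: str) -> dict[int, int]:
    """Map each RTF font number (\\fN) to its declared charset (\\fcharsetN)."""
    return {int(num): int(charset) for num, charset in FONT_ENTRY.findall(rtf)}


# font_charsets(SAMPLE_RTF) -> {0: 0, 1: 204}
```

Text that follows a `\f1` switch would then be decoded according to charset 204, not according to the document-level `\ansicpg1252`.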

So far I am:

1) Using AnsiString (or RawByteString) when parsing the RTF

2) Determining the code page by a lookup from the font charset (see http://msdn.microsoft.com/en-us/library/cc194829.aspx)

3) Translating the bytes with a lookup table in my code (generated from http://msdn.microsoft.com/en-gb/goglobal/cc305144.aspx); I'm going to need one table per supported code page!
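Steps 2 and 3 can be sketched together as follows. This is a minimal Python sketch, not Delphi. The table holds the standard Windows charset/code-page pairs, which are the values Win32 `TranslateCharsetInfo` reports in `ciACP`. Instead of one hand-built table per code page, it hands the bytes to Python's built-in `cpNNNN` codecs. `CHARSET_TO_CODEPAGE`, `codepage_for_charset` and `decode_run` are hypothetical names. Falling back to the document code page (1252 here) for unlisted charsets such as DEFAULT_CHARSET or SYMBOL_CHARSET is an assumption.

```python
# Standard Windows font charset -> ANSI code page pairs (hypothetical table name).
CHARSET_TO_CODEPAGE = {
    0: 1252,    # ANSI_CHARSET
    128: 932,   # SHIFTJIS_CHARSET
    129: 949,   # HANGUL_CHARSET
    134: 936,   # GB2312_CHARSET
    136: 950,   # CHINESEBIG5_CHARSET
    161: 1253,  # GREEK_CHARSET
    162: 1254,  # TURKISH_CHARSET
    163: 1258,  # VIETNAMESE_CHARSET
    177: 1255,  # HEBREW_CHARSET
    178: 1256,  # ARABIC_CHARSET
    186: 1257,  # BALTIC_CHARSET
    204: 1251,  # RUSSIAN_CHARSET
    222: 874,   # THAI_CHARSET
    238: 1250,  # EASTEUROPE_CHARSET
}


def codepage_for_charset(charset: int, default_codepage: int = 1252) -> int:
    """Step 2: look up the code page for a font charset (assumed fallback: document code page)."""
    return CHARSET_TO_CODEPAGE.get(charset, default_codepage)


def decode_run(raw: bytes, charset: int, default_codepage: int = 1252) -> str:
    """Step 3: decode raw RTF text bytes with a built-in codec instead of a lookup table."""
    codepage = codepage_for_charset(charset, default_codepage)
    # errors="replace" turns bytes undefined in the code page into U+FFFD.
    return raw.decode(f"cp{codepage}", errors="replace")
```

The only per-language data left is the short charset table; the character mapping itself comes from the platform's codecs.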

There MUST be a better way than this. Is there something supplied by the OS, and therefore less brittle than tables of constants?
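Once the code page is known, the byte-to-character step is exactly what a platform code-page converter does. On Windows that converter is `MultiByteToWideChar`, and Delphi 2009 and later expose the same service through `TEncoding.GetEncoding(CodePage)`. Whether either fits this particular parser is an untested assumption. The Python sketch below shows the idea with a hypothetical `rtf_run_to_unicode`, which decodes the `\'hh` hex escapes of one text run under a given code page. It assumes the run contains only literal text and hex escapes, with no other control words, groups or `\uN` escapes.

```python
import re

# An RTF hex escape such as \'cf: one byte of text in the current code page.
HEX_ESCAPE = re.compile(r"\\'([0-9a-fA-F]{2})")


def rtf_run_to_unicode(run: str, codepage: int) -> str:
    """Decode the \\'hh escapes in a plain RTF text run using `codepage`.

    Hypothetical helper: literal characters between escapes are copied
    unchanged; escaped bytes are decoded by Python's cpNNNN codec.
    """
    parts: list[str] = []
    pending = bytearray()  # consecutive escaped bytes not yet decoded

    def flush() -> None:
        # Decode buffered bytes together so double-byte characters stay intact.
        if pending:
            parts.append(bytes(pending).decode(f"cp{codepage}", errors="replace"))
            pending.clear()

    pos = 0
    for match in HEX_ESCAPE.finditer(run):
        if match.start() > pos:
            flush()
            parts.append(run[pos:match.start()])  # literal text between escapes
        pending.append(int(match.group(1), 16))
        pos = match.end()
    flush()
    parts.append(run[pos:])  # trailing literal text
    return "".join(parts)
```

For example, `rtf_run_to_unicode(r"Hello \'cf\'f0\'e8\'e2\'e5\'f2!", 1251)` returns `"Hello Привет!"`. Consecutive escapes are buffered rather than decoded one by one because, in the double-byte code pages (932, 936, 949, 950), a single character arrives as two `\'hh` escapes.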

© Stack Overflow or respective owner
