Problem with XML encoding of database contents with Latin characters

Posted by user89691 on Stack Overflow
Published on 2010-05-28T04:28:19Z Indexed on 2010/05/28 4:31 UTC

I have an Access database, accessed from ASP, that contains strings in various European languages. The database was previously populated by agents in the respective countries. It contains entries with accented and other non-ASCII characters, as you would expect. If I open the database with MS Access, these characters show up fine. For example, the German equivalent of "Open" shows as "Öffnen" (hopefully you can see an "O" with two dots above it!).

I have ASP code that reads the database and returns records as XML. The text is passed to XMLEncode to construct the XML, but that only seems to handle the five special characters such as "<" and "&". If I dump the XML, the accented characters are unchanged:

<English>Open</English>
<German>Öffnen</German> 

If I look at the raw packets with Wireshark, I see that the "Ö" is a single byte, hex D6 (decimal 214), which appears to be its Unicode code point and its ISO 8859-1 value.
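
To check the byte observation, here is a minimal sketch that compares how "Ö" is encoded in ISO 8859-1 and in UTF-8. Python is used only for illustration (the actual server code is ASP), and the variable names are made up for this example.

```python
# Compare the byte sequences for "Ö" (U+00D6) in two encodings.
ch = "Ö"

code_point = ord(ch)                      # numeric Unicode code point
latin1_bytes = ch.encode("iso-8859-1")    # one byte in ISO 8859-1
utf8_bytes = ch.encode("utf-8")           # two bytes in UTF-8

# Prints: U+00D6 d6 c396
print(f"U+{code_point:04X}", latin1_bytes.hex(), utf8_bytes.hex())
```

In UTF-8 the character would travel as the two bytes C3 96, so a lone D6 on the wire suggests the response body is ISO 8859-1 (or a similar single-byte encoding) rather than UTF-8.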

The problem starts when I try to parse the XML in client-side JS. I get:

"An invalid character was found in text content"

from IE. Firefox and Chrome accept the XML without a hiccup, but they show the "Ö" character as a diamond with a question mark inside.
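
Both behaviours can be reproduced outside the browser. The sketch below assumes the response body carries the single byte D6 and no encoding declaration, so the parser falls back to UTF-8. A strict XML parser rejects the byte, while a lenient decoder substitutes U+FFFD, the replacement character that browsers usually draw as a diamond with a question mark. The sample bytes are an assumption based on the Wireshark observation, not a capture of the real response.

```python
import xml.etree.ElementTree as ET

# Assumed response body: "Ö" sent as the single ISO 8859-1 byte D6.
raw = b"<German>\xd6ffnen</German>"

# Strict parsing: with no declaration the bytes are treated as UTF-8,
# and D6 followed by "f" is not a valid UTF-8 sequence.
try:
    ET.fromstring(raw)
    strict_ok = True
except ET.ParseError:
    strict_ok = False

# Lenient decoding: the invalid byte becomes U+FFFD.
lenient = raw.decode("utf-8", errors="replace")

print(strict_ok, lenient)
```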

http://www.validome.org/xml/validate/ reports "encoding error."

http://www.w3schools.com/dom/dom_validate.asp thinks it is fine.

The XML is UTF-8 encoded.

What do I need to do to have IE accept my XML without complaint?

What do I need to do to have browsers display the stuff correctly?
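
For reference, here is a hedged sketch of the two directions that usually resolve this kind of mismatch, assuming the bytes on the wire really are ISO 8859-1: either declare the encoding the bytes actually use, or transcode the text to UTF-8 before sending it. It is written in Python for illustration only. In classic ASP the equivalent would most likely involve `Response.Charset` and `Response.CodePage`, but that part is untested here.

```python
import xml.etree.ElementTree as ET

# Assumed body as seen in Wireshark: ISO 8859-1 bytes, no declaration.
latin1_body = b"<German>\xd6ffnen</German>"

# Option 1: keep the bytes and declare the encoding they are really in.
declared = b'<?xml version="1.0" encoding="ISO-8859-1"?>' + latin1_body
text_declared = ET.fromstring(declared).text

# Option 2: transcode to UTF-8 so the bytes match the UTF-8 default.
utf8_body = latin1_body.decode("iso-8859-1").encode("utf-8")
text_utf8 = ET.fromstring(utf8_body).text

print(text_declared, text_utf8, utf8_body.hex())
```

Either way, the point is that the declared (or default) encoding and the bytes actually sent have to agree. The HTTP `Content-Type` charset, if present, should match as well.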

© Stack Overflow or respective owner
