latin1/unicode conversion problem with ajax request and special characters
- by mfn
The server runs PHP5 and the HTML charset is latin1 (iso-8859-1). With regular form POST requests, there's no problem with "special" characters such as the en dash (–). I don't know for sure why it works; probably because the browser has a representable character at char code 150 (which is what ord() reports in PHP on the server for a literal dash).
Our application also provides a preview mechanism via ajax: the text is sent to the server and the complete HTML for a preview is sent back. However, the same char code 150 dash, when sent via ajax (tested with GET and POST), arrives as three percent-encoded bytes: %E2%80%93. I can already see this in the Apache log.
According to various sources I found, e.g. http://www.tachyonsoft.com/uc0020.htm , this is the UTF-8 byte representation of the en dash (U+2013), and as far as I know JavaScript handles all strings as Unicode.
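To make the observation above concrete, here is a minimal sketch of what probably happens on the client. It assumes the ajax code (or the library behind it) builds the request with encodeURIComponent, which is common but not confirmed for this application.

```javascript
// U+2013 is the dash character from the question, written as an escape
// so the example does not depend on the file's own encoding.
const dash = "\u2013";

// encodeURIComponent always percent-encodes the UTF-8 bytes of a string,
// regardless of the charset declared by the HTML page.
const encoded = encodeURIComponent(dash);
console.log(encoded); // %E2%80%93

// The code point is well above 255, so a single byte can only hold it
// in a charset that assigns it a slot of its own.
const codePoint = dash.codePointAt(0);
console.log(codePoint.toString(16)); // 2013
console.log(codePoint > 0xff); // true
```

A regular form POST, by contrast, is encoded by the browser using the page's charset, which would explain why the two paths deliver different bytes for the same character.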
However, within my app I need everything in latin1. Simply put: just as a regular POST request gives me that dash as char code 150, I need the same single byte from the UTF-8 representation too.
That's where I'm failing: when I try to decode it in PHP on the server with either utf8_decode(...) or iconv('UTF-8', 'iso-8859-1', ...), in both cases I get a regular ? in place of this character (and iconv also raises a notice: Detected an illegal character in input string).
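A likely explanation for the ? is that char code 150 is not an ISO-8859-1 character at all: browsers treat a page labelled iso-8859-1 as Windows-1252, and Windows-1252 is the charset that puts U+2013 at byte 150. The sketch below illustrates the difference in JavaScript. The function names encodeLatin1 and encodeWindows1252 are made up for this example, and the table is written out by hand rather than taken from any library. On the PHP side, passing 'Windows-1252' instead of 'iso-8859-1' as the target charset to iconv may be worth trying, though that is untested here.

```javascript
// Code points that Windows-1252 assigns to bytes 0x80-0x9F.
// null marks a byte that Windows-1252 leaves undefined.
const CP1252_HIGH = [
  0x20ac, null, 0x201a, 0x0192, 0x201e, 0x2026, 0x2020, 0x2021,
  0x02c6, 0x2030, 0x0160, 0x2039, 0x0152, null, 0x017d, null,
  null, 0x2018, 0x2019, 0x201c, 0x201d, 0x2022, 0x2013, 0x2014,
  0x02dc, 0x2122, 0x0161, 0x203a, 0x0153, null, 0x017e, 0x0178,
];

const QUESTION_MARK = 0x3f; // substitute for unrepresentable characters

// Strict ISO-8859-1: only code points 0-255 map to a byte.
function encodeLatin1(text) {
  const bytes = [];
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    bytes.push(cp <= 0xff ? cp : QUESTION_MARK);
  }
  return bytes;
}

// Windows-1252: same as latin1 except for the 0x80-0x9F block.
function encodeWindows1252(text) {
  const bytes = [];
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    const high = CP1252_HIGH.indexOf(cp);
    if (high !== -1) {
      bytes.push(0x80 + high);
    } else if (cp < 0x80 || (cp >= 0xa0 && cp <= 0xff)) {
      bytes.push(cp);
    } else {
      bytes.push(QUESTION_MARK);
    }
  }
  return bytes;
}

// The value seen in the Apache log, decoded back to a string.
const decoded = decodeURIComponent("%E2%80%93");
console.log(encodeLatin1(decoded)); // [ 63 ]
console.log(encodeWindows1252(decoded)); // [ 150 ]
```

The 63 is the same ? that utf8_decode and the iso-8859-1 iconv call produce, and the 150 matches what the regular form POST delivers.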
My goal is to find an automated solution, but maybe I'm trying to be überclever in this case?
I've found other people simply doing manual replacement with a predefined input/output set, but that would always leave me with the feeling that I could lose characters.
The observant reader will note that I'm behind on understanding the full complexity of Unicode and character conversion, and I'd definitely prefer to understand the thing as a whole rather than rely on a simple manual mapping.
thanks