Special characters from MySQL database (e.g. curly apostrophes) are mangling my XML

Posted by Toph on Stack Overflow
Published on 2010-05-18T01:57:39Z Indexed on 2010/05/18 2:00 UTC

I have a MySQL database of newspaper articles. There's a volume table, an issue table, and an article table. I have a PHP file that generates a property list that is then pulled in and read by an iPhone app. The plist holds each article as a dictionary inside each issue, and each issue as a dictionary inside each volume. The plist doesn't actually hold the whole article -- just a title and URL.

Some article titles contain special characters, like curly apostrophes. Looking at the generated XML plist, whenever it hits a special character, it unpredictably gobbles up a whole bunch of text, leaving the XML mangled and unreadable.

(...in Chrome, anyway, and I'm guessing on the iPhone. Firefox actually handles it pretty well, showing a white ? in a black diamond in place of any special characters and not gobbling anything.)

Example well-formed plist snippet:

<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd"> 
<plist version="1.0"> 
<dict> 
    <key>Rows</key> 
    <array>     
        <dict> 
            <key>Title</key> 
            <string>Vol. 133 (2003-2004)</string> 
            <key>Children</key> 
            <array>         
                <dict> 
                    <key>Title</key> 
                    <string>No. 18 (Apr 2, 2004)</string> 
                    <key>Children</key> 
                    <array>                 
                        <dict> 
                            <key>Title</key> 
                            <string>Basketball concludes historic season</string> 
                            <key>URL</key> 
                            <string>http://orient.bowdoin.edu/orient/article_iphone.php?date=2004-04-02&amp;section=1&amp;id=1</string> 
                        </dict>

                        <!-- ... -->

                    </array>
                </dict>     
            </array>
        </dict>
    </array>
</dict>
</plist>

Example of what happens when it hits a curly apostrophe (this is from Chrome). This time it ate 5,998 characters, by MS Word's count, skipping down to midway through the opening tag of a pizza story's title; if I reload, it behaves differently, eating some other amount. The proper title is: Singer-songwriter Farrell ’05 finds success beyond the bubble

                    <dict> 
                        <key>Title</key> 
                        <string>Singer-songwriter Farrell ing>Students embrace free pizza, College objects to solicitation</string> 
                        <key>URL</key> 
                        <string>http://orient.bowdoin.edu/orient/article_iphone.php?date=2009-09-18&amp;section=1&amp;id=9</string> 
                    </dict> 

In MySQL, that title is stored as the following bytes (shown in hex):

53 69 6E 67 |65 72 2D 73 |6F 6E 67 77 |72 69 74 65
72 20 46 61 |72 72 65 6C |6C 20 C2 92 |30 35 20 66
69 6E 64 73 |20 73 75 63 |63 65 73 73 |20 62 65 79
6F 6E 64 20 |74 68 65 20 |62 75 62 62 |6C
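
For anyone who wants to poke at those bytes, here is a minimal Python sketch (not the PHP script from this post) that decodes the dump above. It relies on one assumption that the post does not confirm: that the pair C2 92 is the UTF-8 encoding of U+0092, which is what typically results when the Windows-1252 byte 0x92 (a curly apostrophe) is treated as Latin-1 and then converted to UTF-8. The names `HEX_DUMP`, `as_stored`, and `repaired` are made up for illustration.

```python
# Hex dump from the post, with the "|" column separators removed.
# The dump ends at 6C, so the final "e" of "bubble" is not present.
HEX_DUMP = """
53 69 6E 67 65 72 2D 73 6F 6E 67 77 72 69 74 65
72 20 46 61 72 72 65 6C 6C 20 C2 92 30 35 20 66
69 6E 64 73 20 73 75 63 63 65 73 73 20 62 65 79
6F 6E 64 20 74 68 65 20 62 75 62 62 6C
"""

# Strip all whitespace before parsing the hex digits.
raw = bytes.fromhex("".join(HEX_DUMP.split()))

# Read as UTF-8, the pair C2 92 becomes U+0092, a C1 control character
# rather than a printable apostrophe.
as_stored = raw.decode("utf-8")
has_control_char = "\u0092" in as_stored

# Assumption: the original byte was 0x92 in Windows-1252. Undo the
# Latin-1 -> UTF-8 step, then reinterpret the byte as Windows-1252.
repaired = as_stored.encode("latin-1").decode("cp1252")

print(repr(as_stored))
print(repaired)
```

If that assumption holds, `repaired` contains a real right single quotation mark (U+2019) where the stored text has the control character.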

Any ideas how I can encode/decode things properly? If not, any idea how I can get around the problem some other way?
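
To show what "encoded properly" would look like for this one title, here is a small Python sketch using the standard-library `plistlib` module. It is not the PHP generator from this post, and the dictionary layout is a cut-down, hypothetical version of the Rows/Children structure above. The point is only that the curly apostrophe is held as a real Unicode character (U+2019) and the whole document is written out as UTF-8.

```python
import plistlib

# The intended title, with the apostrophe as U+2019.
title = "Singer-songwriter Farrell \u201905 finds success beyond the bubble"

# Hypothetical, simplified stand-in for one article entry.
article = {
    "Title": title,
    "URL": "http://orient.bowdoin.edu/orient/article_iphone.php"
           "?date=2009-09-18&section=1&id=9",
}

# plistlib.dumps returns the XML plist as UTF-8 bytes and escapes
# characters such as "&" in string values.
plist_bytes = plistlib.dumps({"Rows": [article]})
xml_text = plist_bytes.decode("utf-8")

# Parsing the bytes back gives the same title, apostrophe intact.
round_trip = plistlib.loads(plist_bytes)

print(xml_text)
```

Whatever language generates the plist, the same two conditions need to hold: the bytes sent must actually be UTF-8, and the XML must say so.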

I don't have a clue what I'm talking about, haha; let me know if there's any way I can help you help me. :) And many thanks!

© Stack Overflow or respective owner
