What is WordPress doing for content encoding in its MySQL database?
- by qbxk
For some convoluted reasons best left behind us, I require direct access to the contents of a WordPress database. For the record, I'm using MySQL 5.0.70-r1 on Gentoo with WordPress 2.6 and Perl 5.8.8.
Sometimes we get high-order characters in the blog (we have quite a few authors contributing too). For the most part these characters end up in WordPress's database in wp_posts.post_content or wp_postmeta.meta_value. WordPress displays them correctly on its site, but the database appears to store them in some single-byte encoding that I can't figure out how to convert to the correct string. Today's example:
The blog shows this, and doesn't even seem to escape any characters in the HTML:
Hãhãhães
But the database, when viewed via the mysql prompt, has:
HÃ£hÃ£hÃ£es
So clearly this is some kind of double-byte encoding issue, but I don't know how to correct it. I need to be able to pull that second string from the database (because that's what it gives me) and convert it to the first one, and I need to do so using Perl.
Also, just to help unmuddy the waters, I took these strings and printed out the character code of each character using Perl's ord() function.
Here is the output for the "wrong" string:
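For reference, a minimal sketch of how a dump like this can be produced. The helper name dump_ords is made up for illustration, and the string is written as raw byte escapes so that the encoding of the script file itself does not affect the result.

```perl
use strict;
use warnings;

# Hypothetical helper: return one "char = code" line per character of the string.
sub dump_ords {
    my ($str) = @_;
    return map { sprintf '%s = %d', $_, ord($_) } split //, $str;
}

# The string as it comes back from the database, spelled out as byte values.
my $wrong = "H\xC3\xA3h\xC3\xA3h\xC3\xA3es";

# Encode output as UTF-8 so characters above 127 print legibly on a UTF-8 terminal.
binmode STDOUT, ':encoding(UTF-8)';
print "$_\n" for dump_ords($wrong);
```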
H = 72
Ã = 195
£ = 163
h = 104
Ã = 195
£ = 163
h = 104
Ã = 195
£ = 163
e = 101
s = 115
This is the correct string, which I need to produce in my script:
H = 72
ã = 227
h = 104
ã = 227
h = 104
ã = 227
e = 101
s = 115
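One observation on the numbers above: the pair 195, 163 is 0xC3 0xA3, which is exactly the UTF-8 byte sequence for the character with code 227 (U+00E3). So a possible conversion, sketched below, is to treat the string from the database as raw bytes and decode those bytes as UTF-8 with the core Encode module. This is only a sketch under the assumption that every affected value really is valid UTF-8 read back one byte per character; the helper name fix_double_encoded is made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: interpret each character (code 0-255) as one byte
# and decode the resulting byte sequence as UTF-8.
# FB_CROAK makes decode() die on input that is not valid UTF-8,
# rather than silently substituting replacement characters.
sub fix_double_encoded {
    my ($str) = @_;    # work on a copy; decode() with a CHECK value may modify its argument
    return decode('UTF-8', $str, Encode::FB_CROAK);
}

my $wrong = "H\xC3\xA3h\xC3\xA3h\xC3\xA3es";
my $fixed = fix_double_encoded($wrong);

# Print the character codes of the converted string, space separated.
print join(' ', map { ord($_) } split //, $fixed), "\n";
# prints: 72 227 104 227 104 227 101 115
```

Because of FB_CROAK, a value that is not valid UTF-8 (for example one that was stored correctly as Latin-1 in the first place) raises an error, so the call may need to be wrapped in eval when looping over many rows.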