What is WordPress doing for content encoding in its MySQL database?

Posted by qbxk on Stack Overflow
Published on 2010-03-28T22:05:17Z Indexed on 2010/03/28 22:23 UTC

For some convoluted reasons best left behind us, I need direct access to the contents of a WordPress database. For the record, I'm using MySQL 5.0.70-r1 on Gentoo with WordPress 2.6 and Perl 5.8.8.

Sometimes we get high-order characters in the blog, and we have quite a few authors contributing. For the most part these characters end up in WordPress's database in wp_posts.post_content or wp_postmeta.meta_value. WordPress displays them correctly on its site, but the database appears to store them in some single-byte encoding that I can't figure out how to convert to the correct string. Today's example:

The blog shows this, and doesn't even seem to escape any characters in the HTML:

   Hãhãhães

but the database, when viewed via the mysql prompt, has:

   Hãhãhães

So clearly this is some kind of double-byte encoding issue, but I don't know how to correct it. I need to be able to pull that second string from the database (because that's what it gives me) and convert it to the first one, and I need to do so using Perl.

Also, just to help unmuddy any waters, I took these strings and printed out the numeric code of each character using Perl's ord() function.
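A dump like the one described here can be produced along the following lines. This is only a minimal sketch: the helper name `dump_codes` and the hard-coded sample string (written with hex escapes to avoid source-encoding problems) are illustrative assumptions, not the original script.

```perl
use strict;
use warnings;

# Return one "char = code" line per character of the given string.
sub dump_codes {
    my ($str) = @_;
    return map { sprintf '%s = %d', $_, ord $_ } split //, $str;
}

# Hypothetical stand-in for the value fetched from the database:
# each "character" here is a single byte (0xC3, 0xA3, ...).
my $wrong = "H\xC3\xA3h\xC3\xA3h\xC3\xA3es";

# Encode output as UTF-8 so the high-order characters print legibly.
binmode STDOUT, ':encoding(UTF-8)';
print "$_\n" for dump_codes($wrong);
```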

Here is the output for the "wrong" string:

H = 72
Ã = 195
£ = 163
h = 104
Ã = 195
£ = 163
h = 104
Ã = 195
£ = 163
e = 101
s = 115

This is the correct string, which I need to produce in my script:

H = 72
ã = 227
h = 104
ã = 227
h = 104
ã = 227
e = 101
s = 115
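
One reading of the two dumps: 195 and 163 are 0xC3 0xA3, which is the UTF-8 encoding of ã (U+00E3, decimal 227). If that holds, that is, assuming the value fetched from MySQL is a string of UTF-8 bytes that has not yet been decoded into characters, the conversion might be sketched in Perl with the core Encode module as below. The helper name `fix_mojibake` and the hard-coded stand-in for the database value are hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Assumption: every character of the input is really one byte of a
# UTF-8 sequence (e.g. 195, 163 => U+00E3).
sub fix_mojibake {
    my ($bytes) = @_;    # copy, so the caller's value is left untouched
    # Strict decode: dies on malformed UTF-8 instead of guessing.
    return decode('UTF-8', $bytes, Encode::FB_CROAK);
}

# Hypothetical stand-in for the value returned by the query.
my $from_db = "H\xC3\xA3h\xC3\xA3h\xC3\xA3es";
my $fixed   = fix_mojibake($from_db);

print join(' ', map { ord } split //, $fixed), "\n";
# prints: 72 227 104 227 104 227 101 115
```

The strict `FB_CROAK` check is a deliberate choice in this sketch: a value that is not valid UTF-8 raises an error rather than being silently mangled. Whether the underlying cause is the column character set or the connection character set is not established by the dumps alone.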

© Stack Overflow or respective owner