Corrupt UTF-8 Characters with PHP 5.2.10 and MySQL 5.0.81
- by jkndrkn
We have an application hosted on both a local development server and a live site. We are experiencing UTF-8 character corruption and are trying to work out how to resolve it.
The application runs on symfony 1.0 with Propel.
On our development server, we are running PHP 5.2.0 and MySQL 5.0.32. We do not experience corrupted UTF-8 characters there.
On our live site, PHP 5.2.10 and MySQL 5.0.81 are running. On that server, certain characters such as ô´ and S are corrupted once they are stored in the database. The corrupted characters show up either as question marks or as approximations of the original character with adjacent question marks.
Examples of corruption:
Uncorrupted: ô´
Corrupted: ô?
Uncorrupted: S
Corrupted: ?
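The question marks are consistent with text being converted into a single-byte character set such as Latin-1 somewhere between PHP and the table: characters the target set cannot represent are replaced with '?', and MySQL typically behaves the same way when it converts text into a character set that lacks a character. The PHP sketch below only reproduces that symptom in isolation with mb_convert_encoding(); it assumes the mbstring extension and its default substitute character, uses "Ł" (U+0141) as an arbitrary example of a character outside Latin-1, and is an illustration rather than a confirmed diagnosis of the live server.

```php
<?php
// Illustration only: convert UTF-8 text to ISO-8859-1 (Latin-1), as a
// misconfigured connection might, and inspect the result.
// Requires the mbstring extension.

function to_latin1($utf8)
{
    // mbstring replaces characters with no Latin-1 equivalent by the
    // substitute character, which is '?' by default.
    return mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');
}

$inLatin1  = "\xC3\xB4"; // "ô" (U+00F4), which exists in Latin-1
$notLatin1 = "\xC5\x81"; // "Ł" (U+0141), which does not

echo bin2hex(to_latin1($inLatin1)), "\n"; // f4: a single Latin-1 byte, no loss
echo to_latin1($notLatin1), "\n";         // ?: the character is lost
```

This mirrors the "ô?" pattern above: characters that exist in Latin-1 survive, while the rest collapse to '?'.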
We are currently using the following techniques on both development and live servers:
Executing the following queries before any other queries:
SET NAMES 'utf8' COLLATE 'utf8_unicode_ci'
SET CHARSET 'utf8'
Setting the <meta> Content-Type value to:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
Adding the following to our .htaccess file:
AddDefaultCharset utf-8
Using mb_* (multibyte) PHP functions where necessary.
Setting database columns to use the utf8_unicode_ci collation.
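Because the same statements behave differently on the two servers, one thing worth comparing is the connection's character-set session variables on each machine, as reported by SHOW VARIABLES LIKE 'character_set_%' after the setup queries have run. Per the MySQL manual, SET CHARACTER SET (SET CHARSET is a synonym) sets character_set_client and character_set_results to the given character set but sets character_set_connection to the value of character_set_database, so on a server whose database default is not utf8 it can partly undo SET NAMES. The PHP sketch below is a hypothetical helper (non_utf8_charset_vars is not part of PHP, symfony or Propel) that flags the relevant variables that are not utf8; the sample values are illustrative, not taken from either server.

```php
<?php
// Hypothetical helper (not part of PHP, symfony or Propel): given the output
// of SHOW VARIABLES LIKE 'character_set_%' as a name => value array, return
// the variables relevant to text transfer that are not utf8.
function non_utf8_charset_vars(array $vars)
{
    $relevant = array(
        'character_set_client',     // charset the server assumes statements use
        'character_set_connection', // charset statements are converted into
        'character_set_results',    // charset results are sent back in
        'character_set_database',   // database default; SET CHARSET copies it
                                    // into character_set_connection
    );
    $mismatches = array();
    foreach ($relevant as $name) {
        // MySQL 5.0 reports the UTF-8 character set as 'utf8'.
        if (isset($vars[$name]) && $vars[$name] !== 'utf8') {
            $mismatches[$name] = $vars[$name];
        }
    }
    return $mismatches;
}

// With the legacy mysql extension used on PHP 5.2, the array could be built
// like this (not run here, since it needs a live connection):
//   $vars = array();
//   $result = mysql_query("SHOW VARIABLES LIKE 'character_set_%'", $mysql_connection);
//   while ($row = mysql_fetch_row($result)) {
//       $vars[$row[0]] = $row[1];
//   }

// Illustrative values only, not measured on either server.
$vars = array(
    'character_set_client'     => 'utf8',
    'character_set_connection' => 'latin1',
    'character_set_database'   => 'latin1',
    'character_set_results'    => 'utf8',
);
print_r(non_utf8_charset_vars($vars));
```

Running the query on both servers right after the setup statements and comparing the two outputs would show whether the live connection ends up in a different state from the development one.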
These techniques are sufficient for our development site, but do not work on the live site.
On the live site I have also tried adding mysql_set_charset('utf8', $mysql_connection), but this does not help either. I have found some evidence that newer versions of PHP and MySQL mishandle UTF-8 character encodings.
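Whichever connection settings are in use, one way to narrow down where the characters are lost is to compare the bytes PHP sends with the bytes MySQL actually stored, for example by running SELECT HEX(column) on the affected row. A stored '?' shows up as 3F, while correctly stored UTF-8 matches PHP's own bin2hex() output. The PHP helper below is a hypothetical sketch of that comparison (stored_bytes_match is an illustrative name, and the hex strings are examples, not data captured from the live site).

```php
<?php
// Hypothetical helper: does the hex dump MySQL returns for a stored value
// (e.g. from SELECT HEX(some_column) FROM some_table) match the UTF-8 bytes
// PHP sent?
function stored_bytes_match($sentUtf8, $mysqlHex)
{
    // bin2hex() returns lowercase hex, MySQL's HEX() returns uppercase,
    // so compare case-insensitively.
    return strtolower(bin2hex($sentUtf8)) === strtolower($mysqlHex);
}

$sent = "\xC5\x81"; // "Ł" (U+0141) encoded as UTF-8: bytes C5 81

var_dump(stored_bytes_match($sent, 'C581')); // bool(true): stored intact
var_dump(stored_bytes_match($sent, '3F'));   // bool(false): stored as '?'
```

If the bytes already differ in the table, the loss happened on the way in (connection or column); if they match, the problem is more likely in how results are read back or displayed.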