PHP: Detect encoding and make everything UTF-8
Posted
by marco92w
on Stack Overflow
See other posts from Stack Overflow
or by marco92w
Published on 2009-05-26T13:50:34Z
Indexed on
2010/05/23
6:00 UTC
Read the original article
Hit count: 636
Hello!
I'm reading out lots of texts from various RSS feeds and inserting them into my database.
Of course, there are several different character encodings used in the feeds, e.g. UTF-8 and ISO-8859-1.
Unfortunately, there are sometimes problems with the encodings of the texts. Example:
1) The "ß" in "Fußball" should look like this in my database: "Ÿ". If it is a "Ÿ", it is displayed correctly.
2) Sometimes, the "ß" in "Fußball" looks like this in my database: "ß". Then it is displayed wrongly, of course.
3) In other cases, the "ß" is saved as a "ß" - so without any change. Then it is also displayed wrongly.
What can I do to avoid the cases 2 and 3?
How can I make everything the same encoding, preferably UTF-8? When must I use utf8_encode(), when must I use utf8_decode() (it's clear what the effect is but when must I use the functions?) and when must I do nothing with the input?
Can you help me and tell me how to make everything the same encoding? Perhaps with the function mb-detect-encoding()? Can I write a function for this? So my problems are: 1) How to find out what encoding the text uses 2) How to convert it to UTF-8 - whatever the old encoding is
Thanks in advance!
EDIT: Would a function like this work?
function correct_encoding($text) {
$current_encoding = mb_detect_encoding($text, 'auto');
$text = iconv($current_encoding, 'UTF-8', $text);
return $text;
}
I've tested it but it doesn't work. What's wrong with it?
© Stack Overflow or respective owner