PHP: Detect encoding and make everything UTF-8

Posted by marco92w on Stack Overflow
Published on 2009-05-26T13:50:34Z Indexed on 2010/05/23 6:00 UTC


Hello!

I'm reading lots of texts from various RSS feeds and inserting them into my database.

Of course, there are several different character encodings used in the feeds, e.g. UTF-8 and ISO-8859-1.

Unfortunately, there are sometimes problems with the encodings of the texts. Example:

1) The "ß" in "Fußball" should look like this in my database: "Ÿ". If it is a "Ÿ", it is displayed correctly.

2) Sometimes, the "ß" in "Fußball" looks like this in my database: "ÃŸ". Then it is displayed wrongly, of course.

3) In other cases, the "ß" is saved as a "ß" - so without any change. Then it is also displayed wrongly.
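
To make the cases above concrete, here is a minimal sketch of what happens at the byte level. It assumes the mbstring extension is available and that the non-UTF-8 feeds are ISO-8859-1. How a database client then displays these bytes depends on the connection and display charset, which is not shown here.

```php
<?php
// "ß" as raw bytes in the two encodings mentioned in the question.
$utf8   = "\xC3\x9F"; // UTF-8: two bytes
$latin1 = "\xDF";     // ISO-8859-1: one byte

// Converting the ISO-8859-1 byte to UTF-8 gives the valid UTF-8 form.
$converted = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');

// Converting text that is already UTF-8 a second time double-encodes it:
// each of the two bytes is treated as its own ISO-8859-1 character.
$double = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');

echo bin2hex($converted), "\n"; // c39f
echo bin2hex($double), "\n";    // c383c29f
```

So the same letter can end up as one byte (never converted), two bytes (converted once), or four bytes (converted twice), which is one plausible explanation for seeing several different forms in the same table.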

What can I do to avoid cases 2 and 3?

How can I make everything the same encoding, preferably UTF-8? When must I use utf8_encode(), when must I use utf8_decode() (it's clear what the effect is but when must I use the functions?) and when must I do nothing with the input?

Can you help me and tell me how to make everything the same encoding? Perhaps with the function mb_detect_encoding()? Can I write a function for this? So my problems are:

1) How to find out what encoding the text uses

2) How to convert it to UTF-8, whatever the old encoding is
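
One way to approach these two problems, sketched under assumptions: a string that already validates as UTF-8 is left alone, and anything else is treated as a single fallback encoding. The helper name to_utf8() and the ISO-8859-1 fallback are hypothetical choices for illustration. If a feed declares its encoding (in the XML prolog or an HTTP header), that declaration is usually a better source than guessing.

```php
<?php
/**
 * Hypothetical helper: return $text as UTF-8.
 * Assumes that input which is not valid UTF-8 is in $fallback
 * (ISO-8859-1 by default, which is an assumption, not a detected fact).
 */
function to_utf8(string $text, string $fallback = 'ISO-8859-1'): string
{
    // Valid UTF-8 (including plain ASCII) is returned unchanged,
    // which avoids double-encoding text that is already correct.
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text;
    }
    // Everything else is converted from the assumed fallback encoding.
    return mb_convert_encoding($text, 'UTF-8', $fallback);
}

echo bin2hex(to_utf8("Fu\xDFball")), "\n";     // ISO-8859-1 input: 4675c39f62616c6c
echo bin2hex(to_utf8("Fu\xC3\x9Fball")), "\n"; // UTF-8 input:      4675c39f62616c6c
```

The sketch cannot repair text that was already double-encoded before it arrived, since such text is valid UTF-8 and passes the check.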

Thanks in advance!

EDIT: Would a function like this work?

function correct_encoding($text) {
    $current_encoding = mb_detect_encoding($text, 'auto');
    $text = iconv($current_encoding, 'UTF-8', $text);
    return $text;
}

I've tested it but it doesn't work. What's wrong with it?
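
A likely reason, stated with some caution: with 'auto', the candidate list comes from the mbstring configuration, which by default does not include ISO-8859-1, so ISO-8859-1 text may be reported as UTF-8 or not detected at all, and iconv() then fails on the bytes. The variant below is only a sketch. The name correct_encoding_sketch() and the candidate list are assumptions based on the two encodings named in the question.

```php
<?php
/**
 * Variant of the function above with an explicit candidate list.
 * The list is an assumption; extend it only with encodings the feeds really use.
 */
function correct_encoding_sketch(string $text): string
{
    // Strict mode (third argument true) rejects candidates for which the
    // bytes are invalid, so ISO-8859-1 bytes are not reported as UTF-8.
    $current = mb_detect_encoding($text, ['UTF-8', 'ISO-8859-1'], true);

    // Nothing detected, or already UTF-8: leave the text unchanged.
    if ($current === false || $current === 'UTF-8') {
        return $text;
    }
    return mb_convert_encoding($text, 'UTF-8', $current);
}

echo bin2hex(correct_encoding_sketch("Fu\xDFball")), "\n"; // 4675c39f62616c6c
```

Detection stays a guess: many single-byte encodings accept any byte sequence, so mb_detect_encoding() cannot tell ISO-8859-1 apart from, for example, ISO-8859-15 by looking at the bytes alone.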

© Stack Overflow or respective owner
