Strange behaviour of mb_detect_order() in PHP

Posted by termopro on Stack Overflow
Published on 2010-05-21T10:24:10Z Indexed on 2010/05/21 10:40 UTC

I would like to detect the encoding of some text using PHP. For that purpose I use the mb_detect_encoding() function.

The problem is that the function returns different results if I change the order of the candidate encodings with mb_detect_order().

Consider the following example:

$html = <<< STR
?????????????????????????????????????????????????????????????????????????????????????????????????????????
STR;
mb_detect_order(array('UTF-8','EUC-JP', 'SJIS', 'eucJP-win', 'SJIS-win', 'JIS', 'ISO-2022-JP','ISO-8859-1','ISO-8859-2'));
$originalEncoding = mb_detect_encoding($html); // detect the heredoc text defined above
die($originalEncoding); // $originalEncoding = 'UTF-8'

However, if you change the order of the encodings passed to mb_detect_order(), the result is different:

mb_detect_order(array('EUC-JP','UTF-8', 'SJIS', 'eucJP-win', 'SJIS-win', 'JIS', 'ISO-2022-JP','ISO-8859-1','ISO-8859-2'));
$originalEncoding = mb_detect_encoding($html); // same text, EUC-JP now listed first
die($originalEncoding); // $originalEncoding = 'EUC-JP'
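
One way to look into the difference is to check which of the candidate encodings accept the same bytes as a valid sequence, since more than one may. The sketch below is only an illustration: validEncodings() is a hypothetical helper, and $sample is an assumed stand-in (the UTF-8 bytes of three Japanese characters), not the text from my example. It relies only on mb_check_encoding().

```php
<?php
// Hypothetical helper (not from the original post): return every candidate
// encoding in which $bytes forms a valid byte sequence.
function validEncodings(string $bytes, array $candidates): array
{
    $valid = [];
    foreach ($candidates as $encoding) {
        // mb_check_encoding() reports whether the bytes are valid in $encoding
        if (mb_check_encoding($bytes, $encoding)) {
            $valid[] = $encoding;
        }
    }
    return $valid;
}

// Assumed sample: UTF-8 bytes written as escapes, so the result does not
// depend on how this source file itself is saved.
$sample = "\xE6\x97\xA5\xE6\x9C\xAC\xE8\xAA\x9E";

$candidates = ['UTF-8', 'EUC-JP', 'SJIS', 'eucJP-win', 'SJIS-win',
               'JIS', 'ISO-2022-JP', 'ISO-8859-1', 'ISO-8859-2'];

// Lists each candidate that accepts the sample bytes
print_r(validEncodings($sample, $candidates));
```

If more than one name is printed, the bytes alone do not identify a single encoding, which would at least be consistent with the order of the list affecting the answer.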



So my questions are:
Why is this happening?
Is there a way in PHP to detect the encoding of text correctly and unambiguously?
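
For reference, mb_detect_encoding() also accepts the candidate list and a strict flag as its second and third arguments. A minimal sketch, assuming strict mode is relevant here (I have not confirmed that it resolves the ambiguity); detectStrict() is a hypothetical wrapper and $sample is an assumed stand-in for the real text:

```php
<?php
// Hypothetical wrapper (not from the original post): run detection with the
// candidate order passed explicitly and strict mode switched on.
function detectStrict(string $bytes, array $order)
{
    // Third argument true enables strict detection; the function returns
    // an encoding name, or false when no candidate matches.
    return mb_detect_encoding($bytes, $order, true);
}

// Assumed sample: UTF-8 bytes written as escapes
$sample = "\xE6\x97\xA5\xE6\x9C\xAC\xE8\xAA\x9E";

$utf8First = ['UTF-8', 'EUC-JP', 'SJIS', 'ISO-8859-1'];
$eucFirst  = ['EUC-JP', 'UTF-8', 'SJIS', 'ISO-8859-1'];

// Compare the two orderings on the same bytes
var_dump(detectStrict($sample, $utf8First));
var_dump(detectStrict($sample, $eucFirst));
```

Passing the list directly also avoids depending on global state set earlier by mb_detect_order().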

© Stack Overflow or respective owner
