Strange behaviour of mb_detect_order() in PHP
- by termopro
I would like to detect the encoding of some text using PHP.
For that purpose I use the mb_detect_encoding() function.
The problem is that the function returns different results if I change the order of the candidate encodings with mb_detect_order().
Consider the following example:
$html = <<< STR
?????????????????????????????????????????????????????????????????????????????????????????????????????????
STR;
mb_detect_order(array('UTF-8','EUC-JP', 'SJIS', 'eucJP-win', 'SJIS-win', 'JIS', 'ISO-2022-JP','ISO-8859-1','ISO-8859-2'));
$originalEncoding = mb_detect_encoding($html); // detect using the order set above
die($originalEncoding); // $originalEncoding = 'UTF-8'
However, if I change the order of the encodings in mb_detect_order() so that 'EUC-JP' comes before 'UTF-8', the result is different:
mb_detect_order(array('EUC-JP','UTF-8', 'SJIS', 'eucJP-win', 'SJIS-win', 'JIS', 'ISO-2022-JP','ISO-8859-1','ISO-8859-2'));
$originalEncoding = mb_detect_encoding($html); // run detection again with the new order
die($originalEncoding); // $originalEncoding = 'EUC-JP'
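To narrow this down, one thing I could check is whether the same bytes are simply valid in more than one of the candidate encodings, which would make the detection order the tie-breaker. Below is a minimal sketch of that check. It assumes the mbstring extension is loaded; validEncodings() is a hypothetical helper name of mine, and the sample bytes are a stand-in (the UTF-8 bytes of a short Japanese word), not the text from the example above.

```php
<?php
// Hypothetical helper: returns every candidate encoding in which
// the given byte string is well-formed, in the order supplied.
function validEncodings(string $bytes, array $candidates): array
{
    $valid = [];
    foreach ($candidates as $encoding) {
        // mb_check_encoding() only validates the byte sequence;
        // it does not guess which encoding was intended.
        if (mb_check_encoding($bytes, $encoding)) {
            $valid[] = $encoding;
        }
    }
    return $valid;
}

// Stand-in sample (an assumption): UTF-8 bytes written as escapes so the
// result does not depend on how this file itself is saved.
$sample = "\xE6\x97\xA5\xE6\x9C\xAC\xE8\xAA\x9E";

$valid = validEncodings($sample, ['UTF-8', 'EUC-JP', 'SJIS', 'ISO-8859-1']);
print_r($valid);
```

If more than one name comes back for the real text, the bytes alone cannot decide between those encodings, and any detector has to fall back on ordering or heuristics.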
So my questions are:
Why is that happening?
Is there a way in PHP to correctly and unambiguously detect the encoding of text?