Detect remote charset in PHP
Posted by yallaa on Stack Overflow
Published on 2010-04-15T10:53:11Z
Indexed on 2010/04/15 11:03 UTC
Hello,
I would like to determine a remote page's encoding by detecting the Content-Type meta tag
<meta http-equiv="Content-Type" content="text/html; charset=XXXXX" />
if present.
I retrieve the remote page and use a regex to find the charset value, if present. I am still learning, hence the problem below. Here is what I have:
$EncStart = 'charset=';
$EncEnd = '" \/\>';
preg_match( "/$EncStart(.*)$EncEnd/s", $RemoteContent, $RemoteEncoding );
echo $RemoteEncoding[ 1 ];
The above does indeed echo the name of the encoding, but it does not know where to stop, so it prints the rest of the line and then most of the rest of the remote page. Example: when testing a remote Russian page, it printed:
windows-1251" />
rest of page ....
Which means that $EncStart was okay, but the $EncEnd part of the regex failed to stop the match. This meta tag usually ends in one of three ways after the name of the encoding:
"> | "/> | " />
I do not know whether this can be used to mark the end of the match and, if so, how to escape it. I tried different ways of doing it, but none worked.
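For illustration, here is a minimal sketch of two ways the match could be made to stop at the closing quote. The likely cause of the runaway match is that (.*) is greedy, and with the s modifier it also crosses newlines, so it extends to the last place in the page where the end pattern fits. The function names lazyCharset and detectMetaCharset are hypothetical and not from the original code, and the type declarations assume PHP 7.1 or later.

```php
<?php
// Variant 1 (hypothetical helper): keep the original shape, but make the
// quantifier lazy with (.*?) and let the end pattern cover all three
// endings: ">  "/>  " />
function lazyCharset(string $html): ?string
{
    if (preg_match('/charset=(.*?)"\s*\/?>/s', $html, $m)) {
        return $m[1];
    }
    return null;
}

// Variant 2 (hypothetical helper): capture only characters that can belong
// to a charset name, so the match ends on its own at the quote, whitespace,
// semicolon, slash, or angle bracket that follows it.
function detectMetaCharset(string $html): ?string
{
    if (preg_match('/charset\s*=\s*["\']?([^"\'\s;\/>]+)/i', $html, $m)) {
        return $m[1];
    }
    return null;
}

// Sample tags covering the three endings mentioned in the question.
$samples = [
    '<meta http-equiv="Content-Type" content="text/html; charset=windows-1251" />',
    '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>',
    '<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">',
];

foreach ($samples as $tag) {
    $page = $tag . "\nrest of page ....";
    echo lazyCharset($page), ' | ', detectMetaCharset($page), "\n";
}
```

Variant 2 does not depend on how the tag is closed, so it may be the more forgiving choice when the remote markup is inconsistent. Neither variant looks at the HTTP Content-Type response header, which can also declare the charset.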
Thank you in advance for lending a hand.
© Stack Overflow or respective owner