PHP - DOM class - numbered entities and encodings problem
Posted by user343607 on Stack Overflow
Published on 2010-05-18T20:13:07Z
Hi guys,
I'm having some difficulty with the PHP DOM class.
I am writing a sitemap script, and I need the output of $doc->saveXML() to look like
<?xml version="1.0" encoding="UTF-8"?>
<root>
<url>
<loc>http://www.somesite.com/serviços/redesign</loc>
</url>
</root>
or
<?xml version="1.0" encoding="UTF-8"?>
<root>
<url>
<loc>http://www.somesite.com/serviços/redesign</loc>
</url>
</root>
but I am getting:
<?xml version="1.0" encoding="UTF-8"?>
<root>
<url>
<loc>http://www.somesite.com/servi&#xE7;os/redesign</loc>
</url>
</root>
This is the closest I could get, using a function that replaces named entities with numbered ones.
I was also able to produce
<?xml version="1.0" ?>
<root>
<url>
<loc>http://www.somesite.com/servi&#xE7;os/redesign</loc>
</url>
</root>
but without the encoding declared.
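For comparison, here is a minimal sketch that builds the same tree twice, once without and once with an encoding passed to the DOMDocument constructor. It assumes the script file is saved as UTF-8, so the literal "serviços" is UTF-8 bytes. As far as I understand libxml2 (which DOMDocument wraps), non-ASCII characters are written as numeric references such as &#xE7; only when the document has no declared encoding; with "UTF-8" declared they are written as-is. The function name buildSitemap() is hypothetical.

```php
<?php
// Sketch: serialize one text node with and without a declared document encoding.
// buildSitemap() is a hypothetical helper; the URL is the one from the question.
function buildSitemap(?string $encoding): string
{
    // The second constructor argument sets the encoding used by saveXML().
    $doc = $encoding === null
        ? new DOMDocument('1.0')
        : new DOMDocument('1.0', $encoding);

    $root = $doc->appendChild($doc->createElement('root'));
    $url  = $root->appendChild($doc->createElement('url'));
    $loc  = $url->appendChild($doc->createElement('loc'));
    // Assumes this file is saved as UTF-8, so the literal below is UTF-8 bytes.
    $loc->appendChild($doc->createTextNode('http://www.somesite.com/serviços/redesign'));

    return $doc->saveXML();
}

echo buildSitemap(null);    // no encoding declared in the XML header
echo buildSitemap('UTF-8'); // header declares encoding="UTF-8"
```

If the constructor encoding is set and the output still shows &#xE7;, it may be worth checking whether the text actually reaching createTextNode() is valid UTF-8.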
The best solution (the way I think the code should be written) would be:
<?php
$myArray = array();
// do some stuff to populate the array with URL strings
$doc = new DOMDocument('1.0', 'UTF-8');
// here we modify some property. Maybe this is the answer I am looking for...
$urlset = $doc->createElement("urlset");
$urlset = $doc->appendChild($urlset);
foreach ($myArray as $address) {
    $url = $doc->createElement("url");
    $url = $urlset->appendChild($url);
    $loc = $doc->createElement("loc");
    $loc = $url->appendChild($loc);
    // wrap the URL string in a text node and attach that node to <loc>
    $valueContent = $doc->createTextNode($address);
    $valueContent = $loc->appendChild($valueContent);
}
echo $doc->saveXML();
?>
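The loop above can be exercised directly with sample data. The sketch below wraps the same DOM calls in a hypothetical buildUrlset() function and feeds it two made-up URLs, assuming they are UTF-8 strings. It appends the text node created from each $address to <loc>, and it also shows that a & in a query string comes out as &amp; without any manual entity handling.

```php
<?php
// Sketch: the question's sitemap loop as a function. buildUrlset() and the
// sample URLs are hypothetical; the DOM calls are the ones used in the question.
function buildUrlset(array $addresses): string
{
    $doc = new DOMDocument('1.0', 'UTF-8');
    $urlset = $doc->appendChild($doc->createElement('urlset'));

    foreach ($addresses as $address) {
        // appendChild() returns the appended node, so calls can be chained.
        $url = $urlset->appendChild($doc->createElement('url'));
        $loc = $url->appendChild($doc->createElement('loc'));
        // createTextNode() takes the raw string; the serializer escapes
        // markup characters such as & itself.
        $loc->appendChild($doc->createTextNode($address));
    }

    return $doc->saveXML();
}

echo buildUrlset([
    'http://www.somesite.com/serviços/redesign',
    'http://www.somesite.com/busca?q=a&page=2',
]);
```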
Notes:
- The server's response header declares the charset as UTF-8;
- The PHP script is saved as UTF-8;
- The URLs read in are UTF-8 strings;
- The script above passes the encoding to the DOMDocument constructor and does not use any conversion functions, like htmlentities, urlencode, utf8_encode...
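To double-check the note that the URLs are UTF-8 strings, a quick sketch like the following can flag any string that is not valid UTF-8 before it reaches DOMDocument. It relies on preg_match() returning false when the /u modifier is applied to invalid UTF-8, so it needs no mbstring extension. isValidUtf8() is a hypothetical helper, and the second sample deliberately contains the single ISO-8859-1 byte for ç.

```php
<?php
// Sketch: flag strings that are not valid UTF-8. isValidUtf8() is hypothetical.
// With the /u modifier, preg_match() returns false for invalid UTF-8 input,
// and 1 for valid input (the empty pattern always matches).
function isValidUtf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

$samples = [
    'http://www.somesite.com/serviços/redesign',    // UTF-8 literal (file saved as UTF-8)
    "http://www.somesite.com/servi\xE7os/redesign", // ISO-8859-1 byte 0xE7 for ç
];
foreach ($samples as $s) {
    echo (isValidUtf8($s) ? 'valid:   ' : 'invalid: '), $s, "\n";
}
```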
I've tried changing the values of the DOMDocument::$resolveExternals and DOMDocument::$substituteEntities properties. None of the combinations worked.
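My understanding from the PHP manual is that resolveExternals and substituteEntities are parser options: they affect how load() and loadXML() read a document, not how saveXML() writes nodes created with createTextNode(). This sketch (serializeWith() is a hypothetical helper) builds the same tree under all four combinations and counts the distinct outputs.

```php
<?php
// Sketch: set both parser-related properties, then serialize a tree built in code.
// serializeWith() is a hypothetical helper.
function serializeWith(bool $resolveExternals, bool $substituteEntities): string
{
    $doc = new DOMDocument('1.0', 'UTF-8');
    $doc->resolveExternals   = $resolveExternals;
    $doc->substituteEntities = $substituteEntities;

    $loc = $doc->appendChild($doc->createElement('loc'));
    $loc->appendChild($doc->createTextNode('http://www.somesite.com/serviços/redesign'));
    return $doc->saveXML();
}

$outputs = [];
foreach ([false, true] as $re) {
    foreach ([false, true] as $se) {
        $outputs[] = serializeWith($re, $se);
    }
}
// Number of distinct serializations across the four combinations.
echo count(array_unique($outputs)), "\n";
```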
And yes, I know I can do the whole process without specifying the character set in the DOMDocument constructor, dump the string content into a variable, and make a very simple substitution with string replace functions. This works. But I would like to know where I am slipping: how this can be done using native APIs and settings, or even whether it is possible.
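On the native-API side, one setting that may be worth trying is the DOMDocument::$encoding property, which is writable. The sketch below, assuming UTF-8 input, builds the tree without passing an encoding to the constructor and then sets the property before calling saveXML(). In my understanding the declaration then reads encoding="UTF-8" and ç is written as-is, with no string substitution afterwards.

```php
<?php
// Sketch: no encoding in the constructor, then set DOMDocument::$encoding
// before serializing. Assumes the URL string is valid UTF-8.
$doc = new DOMDocument('1.0');
$urlset = $doc->appendChild($doc->createElement('urlset'));
$url    = $urlset->appendChild($doc->createElement('url'));
$loc    = $url->appendChild($doc->createElement('loc'));
$loc->appendChild($doc->createTextNode('http://www.somesite.com/serviços/redesign'));

// Declared encoding that saveXML() uses for the header and the output.
$doc->encoding = 'UTF-8';
$xml = $doc->saveXML();
echo $xml;
```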
Thanks in advance.
© Stack Overflow or respective owner