PHP - DOM class - numbered entities and encodings problem

Posted by user343607 on Stack Overflow
Published on 2010-05-18T20:13:07Z Indexed on 2010/05/18 20:40 UTC

Hi guys,

I'm having some difficulty with the PHP DOM classes.

I am writing a sitemap script, and I need the output of $doc->saveXML() to look like this:

<?xml version="1.0" encoding="UTF-8"?>
<root>
    <url>
        <loc>http://www.somesite.com/servi&#xE7;os/redesign</loc>
    </url>
</root>

or

<?xml version="1.0" encoding="UTF-8"?>
<root>
    <url>
        <loc>http://www.somesite.com/servi&#231;os/redesign</loc>
    </url>
</root>

but I am getting:

<?xml version="1.0" encoding="UTF-8"?>
<root>
    <url>
        <loc>http://www.somesite.com/servi&amp;#xE7;os/redesign</loc>
    </url>
</root>

This is the closest I could get, using a function that replaces named entities with numbered ones.
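To make the symptom concrete, here is a minimal sketch that reproduces the double-escaped output. It assumes the address already contains the numeric entity as literal text by the time it reaches createTextNode() (for example after an entity-replacement helper has run); the hard-coded $address is only an illustration.

```php
<?php
// Assumption: the string already holds a numeric entity as literal characters,
// e.g. after a named-to-numbered entity replacement step.
$address = 'http://www.somesite.com/servi&#xE7;os/redesign';

$doc = new DOMDocument('1.0', 'UTF-8');
$loc = $doc->appendChild($doc->createElement('loc'));

// createTextNode() treats its argument as plain text, so the "&" is
// escaped to "&amp;" when the document is serialized.
$loc->appendChild($doc->createTextNode($address));

$xml = $doc->saveXML();
echo $xml;
```

The text node receives the characters "&#xE7;" rather than the character they stand for, so the serializer escapes the ampersand and the output contains servi&amp;#xE7;os.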

I was also able to reproduce

<?xml version="1.0" ?>
<root>
    <url>
        <loc>http://www.somesite.com/servi&amp;#xE7;os/redesign</loc>
    </url>
</root>

However, this output lacks the encoding declaration.

The best solution (the way I think the code should be written) would be:

<?php
$myArray = array();
// do some stuff to populate the array with URL strings

$doc = new DOMDocument('1.0', 'UTF-8');

// here we would modify some property; maybe this is the answer I am looking for...

$urlset = $doc->createElement("urlset");
$urlset = $doc->appendChild($urlset);

foreach ($myArray as $address) {
    $url = $doc->createElement("url");
    $url = $urlset->appendChild($url);

    $loc = $doc->createElement("loc");
    $loc = $url->appendChild($loc);

    // wrap the URL string in a text node and attach it to <loc>
    $valueContent = $doc->createTextNode($address);
    $valueContent = $loc->appendChild($valueContent);
}

echo $doc->saveXML();
?>

Notes:

  • Server response header contains charset as UTF-8;
  • PHP script is saved in UTF-8;
  • URLs read are UTF-8 strings;
  • The script above passes the encoding to the DOMDocument constructor and does not use any conversion functions, such as htmlentities, urlencode, utf8_encode...

I've tried changing the values of the DOMDocument::$resolveExternals and DOMDocument::$substituteEntities properties. No combination worked.

And yes, I know I can do the whole process without specifying the character set in the DOMDocument constructor, dump the string content into a variable, and make a very simple substitution with the string replace functions. This works. But I would like to know where I am slipping up, how this can be done using the native APIs and settings, or whether it is possible at all.
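For completeness, a rough sketch of that workaround. It assumes a single hard-coded UTF-8 address in place of the real URL list, and it assumes that the serializer falls back to numeric character references for non-ASCII text when no document encoding is set, which is the behaviour I observed.

```php
<?php
// Assumption: one hard-coded UTF-8 address stands in for the real URL list.
// "\xC3\xA7" is the UTF-8 byte sequence for "ç".
$address = "http://www.somesite.com/servi\xC3\xA7os/redesign";

// No encoding is passed to the constructor on purpose.
$doc = new DOMDocument('1.0');
$loc = $doc->appendChild($doc->createElement('loc'));
$loc->appendChild($doc->createTextNode($address));

$xml = $doc->saveXML();

// Patch the encoding into the XML declaration after serialization.
$xml = preg_replace(
    '/^<\?xml version="1\.0"\s*\?>/',
    '<?xml version="1.0" encoding="UTF-8"?>',
    $xml
);

echo $xml;
```

This gets the declaration and the numbered entity into the same output, but only by editing the serialized string, which is exactly the step I would like to avoid.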

Thanks in advance.

© Stack Overflow or respective owner
