Getting XML Numbered Entities with PHP 5 DOM
Posted
by user343607
on Stack Overflow
Published on 2010-05-18T02:31:51Z
Indexed on
2010/05/18
2:40 UTC
Hello guys,
I am new here and have a question that has been puzzling me all day.
I've written a PHP script that reads a website's source code through cURL, then uses the DOMDocument class to generate a sitemap file.
It is working like a charm in almost every aspect. The problem is with special characters.
For compatibility reasons, sitemap files need to have all special characters encoded as numbered entities, and I am not managing to do that.
For example, one of my entries (read automatically from the site's URLs and written to the sitemap file) is:
http://www.somesite.com/serviços/redesign/
In the source code it should look like:
http://www.somesite.com/servi&#231;os/redesign/
Just that. Unfortunately, I can't figure out how to do it.
The source file, the server headers, and everything else are encoded as UTF-8.
I'm using DOMDocument and related extensions to build the XML. (Basically, DOMDocument, $obj->createElement, $obj->appendChild).
htmlentities gives &ccedil; instead of &#231;. str_replace does not work either: it makes the character just vanish from the output.
I was using $obj->createElement("loc", $url); in my code, and I have just read in the PHP manual that I should use $document->createTextNode($page) instead to get entity encoding support.
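For reference, here is a minimal sketch of the difference between the named entity that htmlentities produces and a numbered one. It assumes the mbstring extension is available; the variable names are only illustrative and are not part of my script.

```php
<?php
// Illustrative input: the path segment from the URL in the question.
$text = 'serviços';

// htmlentities() emits named HTML entities where one exists; for "ç" that
// is a named entity, which a plain XML parser does not define.
$named = htmlentities($text, ENT_QUOTES, 'UTF-8');

// mb_encode_numericentity() emits decimal numeric references instead.
// Map format: first code point, last code point, offset, mask.
// This map covers every code point from U+0080 to U+10FFFF.
$map = array(0x80, 0x10FFFF, 0, 0x1FFFFF);
$numeric = mb_encode_numericentity($text, $map, 'UTF-8');

echo $named, PHP_EOL;
echo $numeric, PHP_EOL;
```

One caveat with this approach: the resulting string already contains a literal "&", so handing it to createTextNode would escape it a second time (as &amp;#231;). It seems best suited to cases where the XML is written out as a plain string.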
Well, it is not working either.
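One direction that might be worth trying, sketched below under some assumptions: declare the document's output encoding as US-ASCII, so that the serializer cannot write "ç" directly and falls back to a numeric character reference. The element names and namespace follow the sitemap format; the variable names are hypothetical, and whether the reference comes out in decimal or hexadecimal form is up to libxml.

```php
<?php
// Sample URL from the question, as a raw UTF-8 string.
$url = 'http://www.somesite.com/serviços/redesign/';
$ns  = 'http://www.sitemaps.org/schemas/sitemap/0.9';

// An ASCII output encoding forces non-ASCII characters to be written
// as numeric character references when the document is serialized.
$doc = new DOMDocument('1.0', 'US-ASCII');
$doc->formatOutput = true;

$urlset = $doc->createElementNS($ns, 'urlset');
$doc->appendChild($urlset);

$entry = $doc->createElementNS($ns, 'url');
$urlset->appendChild($entry);

$loc = $doc->createElementNS($ns, 'loc');
$entry->appendChild($loc);

// createTextNode() takes the raw string and lets the serializer
// escape "&", "<" and ">" on output, so no manual encoding is done here.
$loc->appendChild($doc->createTextNode($url));

$xml = $doc->saveXML();
echo $xml;
```

The point of this design is that the text is never pre-encoded by hand: the DOM keeps the real character, and only the serialization step decides how to represent it.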
Any idea how to get unstuck?
Thanks.
© Stack Overflow or respective owner