Getting XML Numbered Entities with PHP 5 DOM
- by user343607
Hello guys,
I am new here and have a question that has been tripping me up all day.
I've written a PHP script that reads a website's source code through cURL, then uses the DOMDocument class to generate a sitemap file.
It works like a charm in almost every respect. The problem is with special characters.
For compatibility reasons, sitemap files need to have all special characters encoded as numbered entities, and I have not managed to achieve that.
For example, one of my entries (automatically read from the site's URLs and written to the sitemap file) is:
http://www.somesite.com/serviços/redesign/
In the source code it should look like:
http://www.somesite.com/servi&#231;os/redesign/
Just this. But unfortunately, I really can't figure out how to do it.
The source code file, server headers, and everything else are encoded as UTF-8.
I'm using DOMDocument and related extensions to build the XML. (Basically, DOMDocument, $obj->createElement, $obj->appendChild).
htmlentities gives &ccedil; instead of &#231;
str_replace does not work. It makes the character just vanish in the output.
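To make the named-versus-numbered difference concrete, here is a minimal string-level sketch. It assumes the mbstring extension is loaded; the variable names and the sample word are only for illustration. Note that a string converted this way would have its "&" escaped again if it were passed through createTextNode, so this only shows the two entity styles side by side.

```php
<?php
// Hypothetical sample value: "serviços", with "ç" written as explicit UTF-8 bytes
// so the example does not depend on the encoding of this source file.
$word = "servi\xC3\xA7os";

// htmlentities() prefers named HTML entities, which plain XML does not define.
$named = htmlentities($word, ENT_QUOTES, 'UTF-8');

// Conversion map: every code point from U+0080 upward, no offset, full mask.
$convmap = array(0x80, 0x10FFFF, 0, 0x1FFFFF);

// mb_encode_numericentity() writes decimal numeric references instead.
$numeric = mb_encode_numericentity($word, $convmap, 'UTF-8');

echo $named, "\n";
echo $numeric, "\n";
```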
I was using $obj->createElement("loc", $url); in my code, and I just read in the PHP manual that I should use $document->createTextNode($page) in order to have entity encoding support.
Well, that is not working either.
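One direction that may be worth trying, sketched under assumptions: createTextNode only escapes XML-special characters such as "&" and "<", and the serializer writes "ç" literally whenever the output encoding can represent it. Declaring an ASCII output encoding on the document leaves the serializer no literal form for "ç", so it should fall back to a numeric character reference. The sample URL and variable names below are hypothetical, and the sitemap namespace is omitted for brevity.

```php
<?php
// Hypothetical sample URL; \xC3\xA7 is the UTF-8 byte sequence for "ç".
$url = "http://www.somesite.com/servi\xC3\xA7os/redesign/";

// Input strings stay UTF-8; only the declared output encoding is ASCII.
$doc = new DOMDocument('1.0', 'US-ASCII');

// Build <urlset><url><loc>...</loc></url></urlset>.
$urlset = $doc->createElement('urlset');
$doc->appendChild($urlset);

$entry = $doc->createElement('url');
$urlset->appendChild($entry);

$loc = $doc->createElement('loc');
// createTextNode() takes care of XML-special characters such as "&".
$loc->appendChild($doc->createTextNode($url));
$entry->appendChild($loc);

// Non-ASCII characters cannot be written literally in this encoding,
// so the serializer is expected to emit numeric character references.
$xml = $doc->saveXML();
echo $xml;
```

Whether consumers of the sitemap also expect such URLs to be percent-encoded is a separate question that this sketch does not address.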
Any idea how to get unstuck on this?
Thanks.