Getting XML Numbered Entities with PHP 5 DOM

Posted by user343607 on Stack Overflow
Published on 2010-05-18T02:31:51Z Indexed on 2010/05/18 2:40 UTC

Hello guys,

I am new here and have a question that has been tripping me up all day.

I've written a PHP script that reads a website's source code through cURL, then works with the DOMDocument class to generate a sitemap file.

It is working like a charm in almost every aspect. The problem is with special characters.

For compatibility reasons, sitemap files need to have all special characters encoded as numbered entities, and I am not managing to do that.

For example, one of my entries, automatically read from the site's URLs and written to the sitemap file, is:

http://www.somesite.com/serviços/redesign/

In the source code it should look like:

http://www.somesite.com/servi&#231;os/redesign/

Just this. But unfortunately, I really cannot figure out how to do it.

The source code file, server headers, etc. are all encoded as UTF-8.

I'm using DOMDocument and related extensions to build the XML. (Basically, DOMDocument, $obj->createElement, $obj->appendChild).

htmlentities gives &ccedil; instead of &#231;. str_replace does not work either: it makes the character just vanish from the output.
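
To make the difference concrete, here is a minimal sketch comparing the named entity that htmlentities produces with a decimal numbered entity. It assumes the mbstring extension is loaded; `to_numeric_entities` is a hypothetical helper name used only for illustration, not something from the original script.

```php
<?php
// Hypothetical helper (not from the original script): turn every non-ASCII
// character of a UTF-8 string into a decimal numeric entity.
// Assumes the mbstring extension is loaded.
function to_numeric_entities(string $text): string
{
    // Conversion map: code points 0x80..0x10FFFF, offset 0, mask 0x1FFFFF.
    $map = [0x80, 0x10FFFF, 0, 0x1FFFFF];
    return mb_encode_numericentity($text, $map, 'UTF-8');
}

$word = "servi\xC3\xA7os"; // "serviços" written as explicit UTF-8 bytes

echo htmlentities($word, ENT_QUOTES, 'UTF-8'), "\n"; // named entity: servi&ccedil;os
echo to_numeric_entities($word), "\n";               // numbered entity: servi&#231;os
```

Plain ASCII characters fall outside the map, so they pass through unchanged.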

I was using $obj->createElement("loc", $url); in my code, and just now I read in the PHP manual that I should use $document->createTextNode($page) in order to have entity encoding support.

Well, it is not working either.
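
One possible way around this, shown here only as a sketch under assumptions, is to let DOMDocument serialize the document as UTF-8 and then convert the non-ASCII characters of the resulting string to numbered entities. The function name `build_sitemap`, the `urlset`/`url` wrapper elements, and the use of mbstring are assumptions for illustration; the original script may be structured differently.

```php
<?php
// Hypothetical helper: serialize a list of URLs as a sitemap in which
// non-ASCII characters appear as decimal numeric entities.
// Assumes the dom and mbstring extensions are loaded.
function build_sitemap(array $urls): string
{
    $doc = new DOMDocument('1.0', 'UTF-8');
    $doc->formatOutput = true;

    $urlset = $doc->createElement('urlset');
    $urlset->setAttribute('xmlns', 'http://www.sitemaps.org/schemas/sitemap/0.9');
    $doc->appendChild($urlset);

    foreach ($urls as $url) {
        $loc = $doc->createElement('loc');
        // createTextNode escapes &, < and > but leaves UTF-8 characters as they are.
        $loc->appendChild($doc->createTextNode($url));

        $entry = $doc->createElement('url');
        $entry->appendChild($loc);
        $urlset->appendChild($entry);
    }

    $xml = $doc->saveXML();

    // The markup itself is pure ASCII, so only non-ASCII text is rewritten.
    return mb_encode_numericentity($xml, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');
}

echo build_sitemap(["http://www.somesite.com/servi\xC3\xA7os/redesign/"]);
```

The conversion runs after saveXML() because text passed through createTextNode has its ampersands escaped, so an entity inserted beforehand would be written out as literal text rather than as an entity.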

Any ideas on how to get unstuck?

Thanks.

© Stack Overflow or respective owner
