How can Perl's XML::Simple ignore HTML embedded in XML?
Posted by Miriam Raphael Roberts on Stack Overflow
Published on 2010-04-14T20:05:44Z
Indexed on 2010/04/16 4:23 UTC
I have an XML file that I am pulling from the web and parsing. One of the elements in the XML is a 'content' element that contains HTML. I am using XML::Simple's XMLin to parse the file like so:
$xml = eval { $data->XMLin($xmldata, forcearray => 1, suppressempty => '') };
When I use Data::Dumper to dump the hash, I find that XML::Simple is parsing the HTML into the hash tree:
'content' => {
  'div' => [
    {
      'xmlns' => 'http://www.w3.org/1999/xhtml',
      'p' => [
        {
          'a' => [
            {
              'href' => 'http://miamiherald.typepad.com/.a/6a00d83451b26169e20133ec6f4491970b-pi',
              'style' => 'FLOAT: left',
              'img' => [ etc.....
This is not what I want. I just want to grab the content inside this entry, without the HTML being parsed into the tree. How do I do this?
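A minimal sketch that reproduces the behaviour may help. It assumes a hypothetical feed fragment: the `<feed>`/`<entry>` layout, the `<title>` element and the example.com URL are stand-ins for illustration, not the real data.

```perl
use strict;
use warnings;
use XML::Simple;
use Data::Dumper;

# Hypothetical feed fragment standing in for the XML pulled from the web.
my $xmldata = <<'XML';
<feed>
  <entry>
    <title>Example</title>
    <content>
      <div xmlns="http://www.w3.org/1999/xhtml"><p><a href="http://example.com/">link</a></p></div>
    </content>
  </entry>
</feed>
XML

my $data = XML::Simple->new;

# Same options as in the question; eval traps parse errors.
my $xml = eval { $data->XMLin($xmldata, forcearray => 1, suppressempty => '') };
die "parse failed: $@" unless defined $xml;

# With forcearray => 1 every element is wrapped in an array reference.
my $content = $xml->{entry}[0]{content}[0];

# 'content' comes back as a nested hash, not as a string of markup.
print ref($content), "\n";    # prints "HASH"
print Dumper($content);
```

Because the embedded markup is well-formed XML, the parser has no way to tell it apart from the rest of the document, so it ends up in the tree like any other element.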