Parsing XHTML with inline tags

Posted by user290796 on Stack Overflow
Published on 2010-04-16T15:19:14Z Indexed on 2010/04/16 15:23 UTC

Hi,

I'm trying to parse an XHTML document using TBXML on the iPhone (although I would be happy to use either libxml2 or NSXMLParser if that would be easier). I need to extract the content of the body as a series of paragraphs while preserving the inline tags. For example:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
  <head>
    <title>Title</title>
    <link rel="stylesheet" href="css/style.css" type="text/css"/>
    <meta http-equiv="Content-Type" content="application/xhtml+xml; charset=utf-8"/>
  </head>
  <body>
    <div class="body">
      <div>
        <h3>Title</h3>
        <p>Paragraph with <em>inline</em> tags</p>
        <img src="image.png" />
      </div>
    </div>
  </body>
</html>

I need to extract the paragraph but keep the <em>inline</em> content in place within it. In all my testing so far, the <em> element has been extracted as a separate subelement, with nothing to tell me where it sits within the paragraph's text.

Can anyone suggest a way to do this?

Thanks.
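
To make the requirement concrete, here is a minimal sketch of the behaviour being asked for. It is written in Python with the standard library's xml.etree.ElementTree, not with TBXML, libxml2 or NSXMLParser, so it only illustrates the idea: a paragraph is mixed content, and the text before, inside and after each inline element has to be collected in document order. The helper names (local_name, inline_runs, inner_markup, extract_paragraphs) are hypothetical, attributes on inline elements are dropped, and the sample is a trimmed copy of the document above. With libxml2 the same walk should be possible over a node's children, where text appears as sibling text nodes in order; with NSXMLParser the character and element callbacks arrive in document order, so a similar accumulation ought to work there too.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Trimmed copy of the sample document from the question.
SAMPLE = """<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
  <body>
    <div class="body">
      <div>
        <h3>Title</h3>
        <p>Paragraph with <em>inline</em> tags</p>
        <img src="image.png" />
      </div>
    </div>
  </body>
</html>"""


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def inline_runs(elem):
    """Return (tag_or_None, text) pairs in document order.

    Plain text runs carry None; each direct child carries its tag name
    and its flattened text. Nested inline markup is flattened here.
    """
    runs = []
    if elem.text:
        runs.append((None, elem.text))
    for child in elem:
        runs.append((local_name(child.tag), "".join(child.itertext())))
        if child.tail:  # text that follows the child inside the parent
            runs.append((None, child.tail))
    return runs


def inner_markup(elem):
    """Rebuild the element's content as a string, keeping inline tags.

    Assumption for this sketch: attributes are not preserved.
    """
    parts = [escape(elem.text or "")]
    for child in elem:
        name = local_name(child.tag)
        parts.append("<%s>%s</%s>" % (name, inner_markup(child), name))
        parts.append(escape(child.tail or ""))
    return "".join(parts)


def extract_paragraphs(xml_text):
    """Return the inner markup of every XHTML <p> in document order."""
    root = ET.fromstring(xml_text)
    return [inner_markup(p) for p in root.iter("{%s}p" % XHTML_NS)]


if __name__ == "__main__":
    for paragraph in extract_paragraphs(SAMPLE):
        print(paragraph)
```

The key point is the text that follows an inline element (ElementTree calls it the tail): a parser API that only exposes an element's own text and its child elements loses that ordering, which matches the symptom described above.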

© Stack Overflow or respective owner