Parsing XHTML with inline tags
- by user290796
Hi,
I'm trying to parse an XHTML document using TBXML on the iPhone (although I would be happy to use libxml2 or NSXMLParser instead if either would be easier). I need to extract the content of the body as a series of paragraphs while preserving the inline tags. For example:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
  <head>
    <title>Title</title>
    <link rel="stylesheet" href="css/style.css" type="text/css"/>
    <meta http-equiv="Content-Type" content="application/xhtml+xml; charset=utf-8"/>
  </head>
  <body>
    <div class="body">
      <div>
        <h3>Title</h3>
        <p>Paragraph with <em>inline</em> tags</p>
        <img src="image.png" />
      </div>
    </div>
  </body>
</html>
I need to extract each paragraph while keeping the <em>inline</em> content in place within it. In all my testing so far, the <em> element has come out as a separate subelement, with no indication of where it sat within the paragraph's text.
Can anyone suggest a way to do this?
Thanks.
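For reference, the behaviour being asked for can be sketched with an event-driven (SAX-style) parser, which reports text and tags in document order, so the position of each inline tag is never lost. The sketch below uses Python's standard xml.sax module purely as an illustration and is not TBXML code. The names ParagraphCollector and extract_paragraphs are hypothetical, and it assumes <p> elements are never nested. The three callbacks correspond roughly to the NSXMLParser delegate methods parser:didStartElement:..., parser:foundCharacters: and parser:didEndElement:...

```python
import xml.sax
from xml.sax.saxutils import escape, quoteattr


class ParagraphCollector(xml.sax.ContentHandler):
    """Collects each <p> as a string with its inline markup kept in place.

    Hypothetical helper for illustration; assumes <p> elements do not nest.
    """

    def __init__(self):
        super().__init__()
        self.paragraphs = []  # finished paragraphs, in document order
        self._parts = None    # pieces of the paragraph being read, else None

    def startElement(self, name, attrs):
        if self._parts is None:
            if name == "p":
                self._parts = []  # entering a paragraph: start buffering
        else:
            # Inline tag inside a paragraph: write it back into the buffer
            # at the exact point where the parser reported it.
            attr_text = "".join(
                f" {key}={quoteattr(value)}" for key, value in attrs.items()
            )
            self._parts.append(f"<{name}{attr_text}>")

    def characters(self, content):
        # May be called several times per text run, so always append.
        if self._parts is not None:
            self._parts.append(escape(content))

    def endElement(self, name):
        if self._parts is None:
            return
        if name == "p":
            # Paragraph finished: join the buffered pieces and store them.
            self.paragraphs.append("".join(self._parts))
            self._parts = None
        else:
            self._parts.append(f"</{name}>")


def extract_paragraphs(xhtml: bytes) -> list:
    """Parse XHTML bytes and return the inner markup of every <p>."""
    handler = ParagraphCollector()
    xml.sax.parseString(xhtml, handler)
    return handler.paragraphs


if __name__ == "__main__":
    sample = (
        b"<body><div><h3>Title</h3>"
        b"<p>Paragraph with <em>inline</em> tags</p>"
        b"</div></body>"
    )
    for paragraph in extract_paragraphs(sample):
        print(paragraph)
```

The key design choice is to buffer text and tags in the order the parser delivers them instead of walking a finished element tree, which is where the position of the <em> element was getting lost.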