Parsing HTML "Visually"
Posted by Midhat on Stack Overflow
Published on 2010-06-01T21:06:24Z
Okay, I am at a loss how to name this question. I have some HTML files, probably written by Lord Lucifer himself, that I need to parse. They consist of many segments like this, among other HTML tags:
<p>HeadingNumber</p>
<p style="text-indent:number;margin-top:neg_num ">Heading Text</p>
<p>Body</p>
Notice that the heading number and text are in separate p tags, aligned on one horizontal line by CSS. The CSS may be whatever Lucifer fancies: a mixture of indents, paddings, margins and positions.
However, that line is a single object in my business model and should be kept as such. So how do I detect whether two p elements are visually on a single line and process them accordingly? I believe the HTML files are well formed, if that helps.
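One possible starting point, sketched in Python with only the standard library: treat a p element as sitting on the previous p's line when its inline style has a negative margin-top, as in the segment above. This is a heuristic and not a general answer. It assumes the layout lives in inline style attributes, the names `ParagraphCollector` and `group_visual_lines` and the sample values (`1.2`, `40px`, `-18px`) are made up for illustration, and it ignores positions, paddings and external stylesheets. Handling those would probably mean rendering the page in a real browser engine and comparing the elements' bounding boxes.

```python
import re
from html.parser import HTMLParser

# Matches a negative margin-top in an inline style, e.g. "margin-top:-18px".
NEGATIVE_MARGIN_TOP = re.compile(r"margin-top\s*:\s*-\s*\.?\d", re.IGNORECASE)


class ParagraphCollector(HTMLParser):
    """Collects [inline_style, text] for every <p> element in document order."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._in_p = False

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            # A missing or valueless style attribute is treated as empty.
            style = dict(attrs).get("style") or ""
            self.paragraphs.append([style, ""])
            self._in_p = True

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_p = False

    def handle_data(self, data):
        # Only text inside a <p> is kept; text between paragraphs is dropped.
        if self._in_p:
            self.paragraphs[-1][1] += data


def group_visual_lines(html):
    """Merge each <p> pulled up by a negative margin-top into the previous line."""
    collector = ParagraphCollector()
    collector.feed(html)
    collector.close()
    lines = []
    for style, text in collector.paragraphs:
        text = text.strip()
        if lines and NEGATIVE_MARGIN_TOP.search(style):
            lines[-1].append(text)   # same visual line as the previous <p>
        else:
            lines.append([text])     # starts a new visual line
    return lines


# Hypothetical sample modelled on the segment in the question.
sample = """
<p>1.2</p>
<p style="text-indent:40px;margin-top:-18px ">Heading Text</p>
<p>Body</p>
"""
print(group_visual_lines(sample))  # [['1.2', 'Heading Text'], ['Body']]
```

Each inner list is one visual line, so the heading number and heading text end up together as a single object while the body stays separate.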
© Stack Overflow or respective owner