Parsing HTML "Visually"
Posted by Midhat on Stack Overflow
Published on 2010-06-01T21:06:24Z
        
Okay, I am at a loss as to how to name this question. I have some HTML files, probably written by Lord Lucifer himself, that I need to parse. They consist of many segments like this, among other HTML tags:
<p>HeadingNumber</p>
<p style="text-indent:number;margin-top:neg_num ">Heading Text</p>
<p>Body</p>
Notice that the heading number and the heading text are in separate p tags, aligned on one horizontal line by CSS. The CSS may be whatever Lucifer fancies: a mixture of indents, paddings, margins and positions.
However, that line is a single object in my business model and should be kept as such. So how do I detect whether two p elements are visually on a single line, and process them accordingly? I believe the HTML files are well formed, if that helps.
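To make the problem concrete, here is a minimal sketch of one possible heuristic. It is not from the original post and not a general solution. It assumes the styling is inline, as in the sample above, and that a negative margin-top is what pulls the heading text up beside the heading number. The names `LineGrouper` and `group_visual_lines` are hypothetical. Files that use positions, paddings or external stylesheets would defeat it, and would more likely need a real rendering engine to compare the laid-out positions of the elements.

```python
from html.parser import HTMLParser
import re

# Assumption: a <p> whose inline style has a negative margin-top is drawn
# on the same visual line as the <p> before it.
NEG_MARGIN_TOP = re.compile(r"margin-top\s*:\s*-\s*\d", re.IGNORECASE)


class LineGrouper(HTMLParser):
    """Collect <p> texts, merging a <p> into the previous line when the heuristic matches."""

    def __init__(self):
        super().__init__()
        self.lines = []      # one list of text fragments per visual line
        self._in_p = False   # True while inside a <p> element
        self._buf = []       # text pieces of the current <p>
        self._merge = False  # whether the current <p> joins the previous line

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            style = dict(attrs).get("style") or ""
            self._in_p = True
            self._buf = []
            # Only merge if there is a previous line to merge into.
            self._merge = bool(self.lines) and bool(NEG_MARGIN_TOP.search(style))

    def handle_data(self, data):
        if self._in_p:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "p" and self._in_p:
            text = "".join(self._buf).strip()
            if self._merge:
                self.lines[-1].append(text)
            else:
                self.lines.append([text])
            self._in_p = False


def group_visual_lines(html):
    """Return a list of visual lines, each a list of <p> texts."""
    parser = LineGrouper()
    parser.feed(html)
    parser.close()
    return parser.lines
```

For a segment such as `<p>1.2</p><p style="text-indent:40px;margin-top:-18px">Heading Text</p><p>Body</p>` (made-up values standing in for the placeholders above), this groups `1.2` and `Heading Text` into one line and leaves `Body` on its own.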
© Stack Overflow or respective owner