Iteration through the HtmlDocument.All collection stops at the referenced stylesheet?

Posted by Jonas on Stack Overflow
Published on 2010-03-19T12:28:58Z Indexed on 2010/03/19 12:31 UTC

Since "bug in .NET" is often not the real cause of a problem, I wonder if I'm missing something here.

What I'm doing feels pretty simple. I'm iterating through the elements in an HtmlDocument called doc like this:

System.Diagnostics.Debug.WriteLine("*** " + doc.Url + " ***");
foreach (HtmlElement field in doc.All)
    System.Diagnostics.Debug.WriteLine(string.Format("Tag = {0}, ID = {1} ", field.TagName, field.Id));

The debug window output, however, was this:

Tag = !, ID =  
Tag = HTML, ID =  
Tag = HEAD, ID =  
Tag = TITLE, ID =  
Tag = LINK, ID =  

... when the actual HTML document looks like this:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
    <head>
        <title>Protocol</title>
        <link rel="Stylesheet" type="text/css" media="all" href="ProtocolStyle.css">
    </head>
    <body onselectstart="return false">
        <table>
            <!-- Misc. table elements and cell values -->
        </table>
    </body>
</html>

Commenting out the LINK tag solves the issue for me, and the document is then completely parsed. The ProtocolStyle.css file exists on disk and is loaded properly, in case that matters. Is this a bug in .NET 3.5 SP1, or what? For such a web-oriented framework, I find it hard to believe there would be such a major bug in it.

© Stack Overflow or respective owner
