Select all links from an HTML table using XPath (and HtmlAgilityPack)

Posted by Adam Asham on Stack Overflow
Published on 2010-03-20T22:11:18Z Indexed on 2010/03/20 22:21 UTC

I am trying to extract all links whose href attribute starts with http://, https:// or /. These links lie within a table (tbody > tr > td etc.) with a certain class. I thought I could specify just the a element without the whole path to it, but that does not seem to work. I get a NullReferenceException at the line that selects the links:

var table = doc.DocumentNode.SelectSingleNode("//table[@class='containerTable']");
if (table != null)
{
    foreach (HtmlNode item in table.SelectNodes("a[starts-with(@href, 'https://')]"))
    {
        // not working
    }
}

I don't know about any recommendations or best practices when it comes to XPath. Do I create overhead when I query the document two times?
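For illustration, here is a minimal sketch of one way the snippet above could be adjusted. It assumes the exception comes from two things: a bare "a" step only matches direct children of the table, and HtmlAgilityPack's SelectNodes returns null rather than an empty collection when nothing matches. The class name, the ExtractLinks helper, and the sample HTML are hypothetical and not part of the original question.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

class LinkExtractor
{
    // Hypothetical helper: returns the href values of matching links inside the table.
    static List<string> ExtractLinks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        var links = new List<string>();

        var table = doc.DocumentNode.SelectSingleNode("//table[@class='containerTable']");
        if (table == null)
            return links;

        // ".//a" searches every descendant of the table node;
        // a bare "a" would only look at the table's direct children.
        var anchors = table.SelectNodes(
            ".//a[starts-with(@href, 'http://') " +
            "or starts-with(@href, 'https://') " +
            "or starts-with(@href, '/')]");

        // SelectNodes yields null when there is no match, so guard before iterating.
        if (anchors == null)
            return links;

        foreach (HtmlNode a in anchors)
            links.Add(a.GetAttributeValue("href", ""));

        return links;
    }

    static void Main()
    {
        // Made-up sample markup, for demonstration only.
        const string html =
            "<table class='containerTable'><tr><td>" +
            "<a href='https://example.com'>x</a> " +
            "<a href='/local'>y</a> " +
            "<a href='mailto:a@b.c'>z</a>" +
            "</td></tr></table>";

        foreach (var href in ExtractLinks(html))
            Console.WriteLine(href);
    }
}
```

On the overhead question: the second query here is evaluated relative to the table node, so it should only walk that subtree. If a single query is preferred, the two steps can probably be combined into one expression of the form //table[@class='containerTable']//a[...], at the cost of no longer being able to tell a missing table apart from a table without matching links.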

© Stack Overflow or respective owner
