Select all links from a Html table using XPath (and HtmlAgilityPack)
- by Adam Asham
What I am trying to achieve is to extract all links with a href attribute that starts with http://, https:// or /. These links lie within a table (tbody tr td etc) with a certain class. I thought I could specify just the the a element without the whole path to it but it does not seem to work. I get a NullReferenceException at the line that selects the links:
var table = doc.DocumentNode.SelectSingleNode("//table[@class='containerTable']");
if (table != null)
{
foreach (HtmlNode item in table.SelectNodes("a[starts-with(@href, 'https://')]"))
{
//not working
I don't know about any recommendations or best practices when it comes to XPath. Do I create overhead when I query the document two times?