Select all links from an HTML table using XPath (and HtmlAgilityPack)
Posted by Adam Asham on Stack Overflow
Published on 2010-03-20T22:11:18Z
Indexed on 2010/03/20 22:21 UTC
I am trying to extract all links whose href attribute starts with http://, https:// or /. These links lie within a table (tbody > tr > td etc.) that has a certain class. I thought I could specify just the a element without the whole path to it, but that does not seem to work. I get a NullReferenceException at the line that selects the links:
var table = doc.DocumentNode.SelectSingleNode("//table[@class='containerTable']");
if (table != null)
{
    foreach (HtmlNode item in table.SelectNodes("a[starts-with(@href, 'https://')]"))
    {
        // not working
    }
}
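A likely explanation, offered as a sketch rather than a confirmed diagnosis: the relative path a[...] only matches a elements that are direct children of the table node, and HtmlAgilityPack's SelectNodes returns null (not an empty collection) when nothing matches, so the foreach throws. The version below searches descendants with .//a and guards against null. The class name LinkExtractor and the method ExtractLinks are hypothetical, introduced only for illustration.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class LinkExtractor
{
    // Hypothetical helper: returns the href values of matching links
    // inside the first table with class 'containerTable'.
    public static List<string> ExtractLinks(string html)
    {
        var links = new List<string>();
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var table = doc.DocumentNode.SelectSingleNode("//table[@class='containerTable']");
        if (table == null)
        {
            return links;
        }

        // ".//a" searches all descendants of the table, not just direct children.
        var anchors = table.SelectNodes(
            ".//a[starts-with(@href, 'http://') or " +
            "starts-with(@href, 'https://') or " +
            "starts-with(@href, '/')]");

        // SelectNodes returns null when there are no matches.
        if (anchors == null)
        {
            return links;
        }

        foreach (HtmlNode a in anchors)
        {
            links.Add(a.GetAttributeValue("href", string.Empty));
        }
        return links;
    }
}
```

Note that @class='containerTable' only matches when the class attribute is exactly that string; a table carrying several classes would need contains(@class, 'containerTable') or similar.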
I am not aware of any recommendations or best practices for XPath. Does querying the document twice create overhead?
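On the overhead question, one hedged observation: the second query runs relative to the table node, so it should only walk that subtree, and the cost is probably small for typical pages (this has not been measured here). If a single query is preferred, the two steps can be folded into one XPath expression, as in the sketch below. SingleQueryLinkExtractor and ExtractLinks are hypothetical names, and unlike SelectSingleNode this form matches links in every table with that class, not only the first.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class SingleQueryLinkExtractor
{
    // One XPath expression: table filter and link filter combined.
    private const string LinksXPath =
        "//table[@class='containerTable']" +
        "//a[starts-with(@href, 'http://') or " +
        "starts-with(@href, 'https://') or " +
        "starts-with(@href, '/')]";

    // Hypothetical helper: returns matching href values from the document.
    public static List<string> ExtractLinks(string html)
    {
        var links = new List<string>();
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Still null when nothing matches, so the guard is needed here too.
        var anchors = doc.DocumentNode.SelectNodes(LinksXPath);
        if (anchors == null)
        {
            return links;
        }

        foreach (HtmlNode a in anchors)
        {
            links.Add(a.GetAttributeValue("href", string.Empty));
        }
        return links;
    }
}
```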
© Stack Overflow or respective owner