I want to read the table shown in this link.

When I try to do this with HtmlAgilityPack, I get null:

var nodes = document.DocumentNode.SelectNodes("//table[contains(@class, 'table')]");

Can you tell me what the issue is? Am I doing it the wrong way?

  • It works correctly for me (using NuGet HtmlAgilityPack 1.6.5): nodes contains 1 table element. Are you sure you loaded the HTML into the document correctly? Can you provide the full source code? Commented Nov 20, 2017 at 13:53

2 Answers

There is nothing wrong with your XPath. I am going to assume the real problem is getting the data out of the table once you have it, so here is a complete example. It is also worth reading up on XPath.

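    using System;
    using System.Net;
    using System.Text;
    using HtmlAgilityPack;

    public static class Program
    {
        public static void Main(string[] args)
        {
            HtmlDocument doc = new HtmlDocument();
            try
            {
                HttpWebRequest request = (HttpWebRequest)WebRequest.Create("https://www.manualslib.com/brand/A.html");
                request.Method = "GET";
                request.ContentType = "text/html;charset=utf-8";

                using (var response = (HttpWebResponse)request.GetResponse())
                using (var stream = response.GetResponseStream())
                {
                    doc.Load(stream, Encoding.GetEncoding("utf-8"));
                }
            }
            catch (WebException ex)
            {
                Console.WriteLine(ex.Message);
                return; // nothing was loaded, so there is nothing to parse
            }

            // SelectSingleNode returns null when nothing matches
            HtmlNode tablebody = doc.DocumentNode.SelectSingleNode("//table[contains(@class, 'table')]/tbody");
            if (tablebody == null)
            {
                Console.WriteLine("Table body not found.");
                return;
            }

            // SelectNodes returns null (not an empty collection) when nothing matches
            HtmlNodeCollection rows = tablebody.SelectNodes("./tr");
            if (rows == null)
            {
                return;
            }

            foreach (HtmlNode tr in rows)
            {
                Console.WriteLine("\nTableRow: ");
                HtmlNodeCollection cells = tr.SelectNodes("./td");
                if (cells == null)
                {
                    continue; // e.g. a row that only has th cells
                }

                foreach (HtmlNode td in cells)
                {
                    if (td.GetAttributeValue("class", "null") == "col1")
                    {
                        // First column: plain text
                        Console.Write("\t " + td.InnerText);
                    }
                    else
                    {
                        // Other columns: the link inside div.catel
                        HtmlNode temp = td.SelectSingleNode(".//div[@class='catel']/a");
                        if (temp != null)
                        {
                            Console.Write("\t " + temp.GetAttributeValue("href", "no url"));
                        }
                    }
                }
            }
            Console.ReadKey();
        }
    }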

First we select the tbody node with the XPath below, but only inside a table whose class attribute contains 'table':

//table[contains(@class, 'table')]/tbody

Now we select all the nodes called tr (table row):

./tr

The dot means the search starts from the current context node, so this finds all the tr nodes directly under the tbody. Then, in each tr node, we find all the td nodes with:

./td

Now we want the data from each table cell. We know the first td has a class attribute equal to 'col1', so if the td has that class value, we take the text inside that td node.

If it doesn't have that class, we want the anchor tag inside a div whose class attribute has the value 'catel'.

Inside that anchor-tag we want to get the value of the href-attribute.
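
To try the steps above without hitting the network, here is a minimal sketch that runs the same kind of XPath expressions against an HTML string. The `TableReader` class, the `ReadRows` method, and the sample markup in the usage note are made up for illustration; only the 'col1' and 'catel' class names come from the page as described above. It assumes the HtmlAgilityPack NuGet package is referenced.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical helper, not part of HtmlAgilityPack.
public static class TableReader
{
    // Returns one (cell text, link href) pair per table row.
    // The href is an empty string when the row has no matching link.
    public static List<KeyValuePair<string, string>> ReadRows(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var result = new List<KeyValuePair<string, string>>();

        // SelectNodes returns null, not an empty collection, when nothing matches.
        HtmlNodeCollection rows = doc.DocumentNode.SelectNodes(
            "//table[contains(@class, 'table')]/tbody/tr");
        if (rows == null)
        {
            return result;
        }

        foreach (HtmlNode tr in rows)
        {
            // "./" keeps the search relative to the current row.
            HtmlNode nameCell = tr.SelectSingleNode("./td[@class='col1']");
            HtmlNode link = tr.SelectSingleNode(".//div[@class='catel']/a");

            string name = nameCell == null ? "" : nameCell.InnerText.Trim();
            string href = link == null ? "" : link.GetAttributeValue("href", "");
            result.Add(new KeyValuePair<string, string>(name, href));
        }
        return result;
    }
}
```

For example, calling `TableReader.ReadRows` on `<table class="table"><tbody><tr><td class="col1">Acme</td><td><div class="catel"><a href="/brand/acme/">Manuals</a></div></td></tr></tbody></table>` gives a single pair with key `Acme` and value `/brand/acme/`. Note that the expression only matches if the HTML you loaded really contains a tbody element; HtmlAgilityPack parses the markup as it was sent, so a tbody that a browser adds on its own will not be there.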

Use it this way:

document.DocumentNode.SelectNodes("//div[@class='col-sm-8']/table[contains(@class, 'table')]/tbody/tr")
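
This expression mixes two kinds of class test, and they behave differently: `@class='col-sm-8'` only matches when the whole attribute value is exactly that string, while `contains(@class, 'table')` matches any value that has 'table' somewhere in it. A small sketch to compare them on your own markup follows; the `ClassMatchDemo` class and its `Count` method are hypothetical names, and the HtmlAgilityPack NuGet package is assumed.

```csharp
using System;
using HtmlAgilityPack;

// Hypothetical helper, not part of HtmlAgilityPack.
public static class ClassMatchDemo
{
    // Counts the nodes an XPath expression selects in the given HTML.
    // SelectNodes returns null for "no match", which is reported as 0 here.
    public static int Count(string html, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);
        return nodes == null ? 0 : nodes.Count;
    }
}
```

With `<div class="col-sm-8 wide"><table class="table"></table></div>` as input, `//div[@class='col-sm-8']/table` selects nothing because of the extra `wide` class, while `//div[contains(@class, 'col-sm-8')]/table` selects the table. If the page you load has more than one class on that div, the exact test is a likely reason for a null result.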
