I'm trying to extract the text from this HTML tag:

<span id="example1">sometext</span>

And I have this code:

using System;
using System.Net;
using HtmlAgilityPack;

namespace GC_data_console
{
    class Program
    {
        public static void Main(string[] args)
        {           
            using (var client = new WebClient())
            {
                // Download the HTML
                string html =
                    client.DownloadString("https://www.requestedwebsite.com");
                HtmlDocument doc = new HtmlDocument();
                doc.LoadHtml(html);
                foreach(HtmlNode link in
                        doc.DocumentNode.SelectNodes("//span"))
                {
                    HtmlAttribute href = link.Attributes["id='example1'"];
                    if (href != null)
                    {
                        Console.WriteLine(href.Value.ToString());
                        Console.ReadLine();
                    }
                }
            }
        }
    }
}

But I still don't get the text sometext. If I use this line instead:

HtmlAttribute href = link.Attributes["id"];

I get all the id values. What am I doing wrong?

  • Can you share the actual URL you are trying to get the contents from? Also, you are trying to get the value of an HtmlAttribute, not of the element. What you need is link.InnerText. Commented Apr 9, 2017 at 14:16
  • Hello, for example from this webpage: geocaching.com/geocache/GC257YR_slivercup-studios-east. I am trying to get the text from this tag: <span id="ctl00_ContentBody_CacheName">SliverCup Studios East</span> Commented Apr 9, 2017 at 14:25
  • Got it. Did you try the approach I suggested? Also, did you debug and check whether you are getting the correct element? Commented Apr 9, 2017 at 14:28
  • I tried HtmlAttribute href = link.InnerText["id='ctl00_ContentBody_CacheName'"]; but it didn't work, and I get this error: Argument 1: Cannot implicitly convert type 'int' to 'string' (CS1503). Commented Apr 9, 2017 at 14:35

1 Answer

You first need to understand the difference between an HtmlNode and an HtmlAttribute. Your code reads an attribute, so it cannot return the text inside the element.

An HtmlNode represents an HTML tag such as span, div, p, a, and many others. An HtmlAttribute represents an attribute set on such a node: for example, href is used on a, while style, class, id, and name can appear on almost any HTML tag.

In the HTML below:

<span id="firstName" style="color:#232323">Some Firstname</span>

span is the HtmlNode, while id and style are its HtmlAttributes. You get the value Some Firstname from the HtmlNode.InnerText property.
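To make the distinction concrete, here is a minimal sketch that parses the span shown above from an inline string, so it needs no network access. The class name NodeVsAttributeDemo and the helper ReadSpan are made up for this illustration; the HtmlAgilityPack calls are LoadHtml, SelectSingleNode, Attributes[...], Name, Value, and InnerText.

```csharp
using System;
using HtmlAgilityPack;

public static class NodeVsAttributeDemo
{
    // Hypothetical helper: returns the tag name, the id attribute value,
    // and the inner text of the first <span> found in the given HTML.
    public static (string Name, string Id, string Text) ReadSpan(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // The element itself is an HtmlNode.
        HtmlNode span = doc.DocumentNode.SelectSingleNode("//span");

        // Each attribute on that element is an HtmlAttribute.
        HtmlAttribute id = span.Attributes["id"];

        return (span.Name, id.Value, span.InnerText);
    }

    public static void Main()
    {
        var result = ReadSpan(
            "<span id=\"firstName\" style=\"color:#232323\">Some Firstname</span>");

        Console.WriteLine(result.Name); // span
        Console.WriteLine(result.Id);   // firstName
        Console.WriteLine(result.Text); // Some Firstname
    }
}
```

The attribute only ever gives you firstName; the visible text comes from the node.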

Also, selecting HtmlNodes from an HtmlDocument is not entirely straightforward. You need to provide a proper XPath expression to select the node you want.

In your case, to get the text written in <span id="ctl00_ContentBody_CacheName">SliverCup Studios East</span>, which is part of the HTML of someurl.com, you can write the following code:

using (var client = new WebClient())
{
    string html = client.DownloadString("https://www.someurl.com");

    HtmlDocument doc = new HtmlDocument();
    doc.LoadHtml(html);

    // Select every span node, then keep only those with id="ctl00_ContentBody_CacheName".
    // Where() requires `using System.Linq;`. SelectNodes returns null if no span exists.
    var nodes = doc.DocumentNode.SelectNodes("//span")
        .Where(d => d.Attributes.Contains("id"))
        .Where(d => d.Attributes["id"].Value == "ctl00_ContentBody_CacheName");

    foreach (HtmlNode node in nodes)
    {
        Console.WriteLine(node.InnerText);
    }
}

The XPath //span selects every span element in the document at any depth, not only those directly under the document node. The two Where calls then narrow the result to the span whose id matches. A more specific XPath can do that filtering in the query itself.
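As a minimal sketch of filtering in the XPath rather than with Where, the predicate [@id='...'] can be added to the expression. The class XPathByIdDemo and the helper GetSpanText are made up for this illustration, and the inline HTML stands in for the downloaded page. It assumes the id contains no single quote, since the value is placed directly into the XPath string.

```csharp
using System;
using HtmlAgilityPack;

public static class XPathByIdDemo
{
    // Hypothetical helper: returns the text of the first <span> with the
    // given id, or null if no such span exists.
    // Assumption: id contains no single quote (it is embedded in the XPath).
    public static string GetSpanText(string html, string id)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // "//span" matches spans at any depth; the predicate filters by id.
        // SelectSingleNode returns null when nothing matches.
        HtmlNode node = doc.DocumentNode.SelectSingleNode($"//span[@id='{id}']");
        return node?.InnerText;
    }

    public static void Main()
    {
        // The span is nested inside other tags to show that depth does not matter.
        string html =
            "<div><p><span id=\"ctl00_ContentBody_CacheName\">SliverCup Studios East</span></p></div>";

        Console.WriteLine(GetSpanText(html, "ctl00_ContentBody_CacheName"));
        Console.WriteLine(GetSpanText(html, "missing") ?? "(not found)");
    }
}
```

Checking for null matters here because a page layout can change, and dereferencing a missing node would throw a NullReferenceException.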

This should help you resolve your issue.

1 Comment

Thank you! This resolved my problem, and thanks for the explanation as well. It has been a long time since I created anything in HTML. Next I need to somehow log in through the WebClient so that I can fetch data that is only offered to logged-in users, but I will do that later.
