
I want to get specific data from HTML. I'm using C# and HtmlAgilityPack.

Here's the HTML sample:

<p class="heading"><span>Greeting!</span>

<p class='verse'>Hi!<br>
Hello!</p><p class='verse'>Hello!<br>
Hi!</p>

<p class="writers"><strong>WE</strong><br/>

I want to get the text of the two <p class='verse'> paragraphs.

Here's my code in C#:

StringBuilder pureText = new StringBuilder();
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(Lyrics);

var s = doc.DocumentNode.Descendants("p");

try
{
    foreach (HtmlNode childNode in s)
    {
        pureText.Append(childNode.InnerText);
    }
}
catch
{ }

UPDATE:

StringBuilder pureText = new StringBuilder();
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(URL); // LoadHtml parses the string it is given as HTML markup; it does not download a URL

var s = doc.DocumentNode.SelectNodes("//p[@class='verse']"); // error

try
{
    foreach (HtmlNode childNode in s)
    {
        pureText.Append(childNode.InnerText);
    }
}
catch
{ }

ERROR:

'HtmlAgilityPack.HtmlNode' does not contain a definition for 'SelectNodes' and no extension method 'SelectNodes' accepting a first argument of type 'HtmlAgilityPack.HtmlNode' could be found (are you missing a using directive or an assembly reference?)

ANSWER:


You can use an XPath query to select all <p> elements with class='verse', like this:

var s = doc.DocumentNode.SelectNodes("//p[@class='verse']");

Then loop over the result with the same foreach you already have.
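Here is a fuller sketch of the XPath approach, assuming the HtmlAgilityPack NuGet package is referenced and the markup is already in a string. The names XPathVerseReader and GetVerses are hypothetical. One detail worth knowing: by default SelectNodes returns null rather than an empty collection when nothing matches, so an explicit null check is cleaner than an empty catch block. This only compiles against HtmlAgilityPack builds that expose SelectNodes (per the comments below, not the WinRT one).

```csharp
using System.Collections.Generic;
using HtmlAgilityPack; // assumes the HtmlAgilityPack NuGet package is referenced

public static class XPathVerseReader
{
    // Returns the InnerText of every <p class='verse'> element in the markup.
    public static List<string> GetVerses(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // parses the markup passed in as a string

        var verses = new List<string>();

        // SelectNodes returns null (not an empty collection) by default
        // when nothing matches, so check before iterating.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//p[@class='verse']");
        if (nodes == null)
        {
            return verses;
        }

        foreach (HtmlNode node in nodes)
        {
            verses.Add(node.InnerText);
        }
        return verses;
    }
}
```

For the sample HTML in the question, GetVerses should return two strings, one per verse paragraph, and an empty list for markup with no matching paragraphs.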

UPDATE I:

I don't know why the code above throws an error for you; I tested it on my PC and it works fine. Anyway, if a workaround is acceptable, the same query can be done without XPath:

var s = doc.DocumentNode.Descendants("p").Where(o => o.Attributes["class"] != null && o.Attributes["class"].Value == "verse");

This version is longer because we have to check whether a node has a class attribute at all before reading its value. Otherwise we'd get a NullReferenceException for any <p> without a class attribute.
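The same filter can be written a bit more compactly with GetAttributeValue, which returns the supplied default when the attribute is missing, so the explicit null check goes away. This is a sketch assuming the HtmlAgilityPack NuGet package; LinqVerseReader and HasClass are hypothetical names, and splitting the class attribute on whitespace (so that class='verse chorus' also matches) is my own addition, not something the answer above does.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // assumes the HtmlAgilityPack NuGet package is referenced

public static class LinqVerseReader
{
    // True if the node's class attribute contains className as a whole token.
    // GetAttributeValue returns the default ("") when the attribute is missing,
    // so there is no need to null-check Attributes["class"].
    private static bool HasClass(HtmlNode node, string className)
    {
        string classes = node.GetAttributeValue("class", "");
        return classes
            .Split(new[] { ' ', '\t', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries)
            .Contains(className);
    }

    // Returns the InnerText of every <p> whose class list includes "verse".
    // Uses only Descendants and LINQ, no XPath.
    public static List<string> GetVerses(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        return doc.DocumentNode
            .Descendants("p")
            .Where(p => HasClass(p, "verse"))
            .Select(p => p.InnerText)
            .ToList();
    }
}
```

Matching whole class tokens is the only behavioral difference from the query above: a plain == "verse" comparison skips a paragraph whose class attribute is 'verse chorus', while this version includes it (and still skips class='verses').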


COMMENTS:

It gives an error: 'HtmlAgilityPack.HtmlNode' does not contain a definition for 'SelectNodes' and no extension method 'SelectNodes' accepting a first argument of type 'HtmlAgilityPack.HtmlNode' could be found (are you missing a using directive or an assembly reference?)
The argument of SelectNodes should be a string, as in my answer, not an HtmlNode. How did you apply this solution? Please post the code that triggers the error if you can't figure out how to fix it.
Are you working on a WinRT application? If so, this post may be related to the error you got: WinRT doesn't support XPath.
Yes, I'm working on WinRT, so SelectNodes doesn't work. How do I get the data from the HTML?
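Since XPath isn't available on WinRT, here is a sketch that uses only Descendants and ChildNodes to pull each verse out as separate lines, splitting at the <br> tags and decoding entities such as &amp; with HtmlEntity.DeEntitize. It assumes the HtmlAgilityPack NuGet package, and assumes (I haven't checked) that the WinRT build exposes these members. VerseLines and GetVerseLines are hypothetical names, and the sketch deliberately reads only each paragraph's direct text children, so text nested inside elements like <em> would be skipped.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // assumes the HtmlAgilityPack NuGet package is referenced

public static class VerseLines
{
    // Returns one list of lines per <p class='verse'>, without using XPath.
    public static List<List<string>> GetVerseLines(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var result = new List<List<string>>();
        var versePs = doc.DocumentNode
            .Descendants("p")
            .Where(n => n.GetAttributeValue("class", "") == "verse");

        foreach (HtmlNode p in versePs)
        {
            var lines = new List<string>();
            foreach (HtmlNode child in p.ChildNodes)
            {
                // Only direct text nodes; <br> and any other elements are skipped,
                // so each run of text between <br> tags becomes one line.
                if (child.NodeType != HtmlNodeType.Text)
                {
                    continue;
                }
                string text = HtmlEntity.DeEntitize(child.InnerText).Trim();
                if (text.Length > 0)
                {
                    lines.Add(text);
                }
            }
            result.Add(lines);
        }
        return result;
    }
}
```

If you still want a single block of text, string.Join(Environment.NewLine, ...) over each verse's lines gives you that back with consistent line breaks.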