How can I extract links from a string with HTML content using HtmlAgilityPack?

Question

for (int i = 0; i < numberoflinks; i++)
{
    // The page HTML is already downloaded into a string here...
    string downloadString = client.DownloadString(mainlink+i+".html");
    // ...but HtmlWeb.Load takes a URL and fetches the page itself
    var document = new HtmlWeb().Load(url);
    var urls = document.DocumentNode.Descendants("img")
                        .Select(e => e.GetAttributeValue("src", null))
                        .Where(s => !String.IsNullOrEmpty(s));
}

The problem is that HtmlWeb().Load requires an HTML URL, but I want to load the string downloadString, which already contains the HTML content.

Update:

I tried this now:

for (int i = 0; i < numberoflinks; i++)
{
    string downloadString = client.DownloadString(mainlink+i+".html");
    HtmlAgilityPack.HtmlDocument document = new HtmlAgilityPack.HtmlDocument();
    document.Load(downloadString);
    var urls = document.DocumentNode.Descendants("img")
                       .Select(e => e.GetAttributeValue("src", null))
                       .Where(s => !String.IsNullOrEmpty(s));
}

But I'm getting an exception on the line:

document.Load(downloadString);

Illegal characters in path

What I'm trying to do is download all .JPG images from each link without first saving the page to the hard disk: download the page content into a string, extract all image links ending in .JPG from that HTML, and then download the JPGs.
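Once the src values have been extracted, the ".JPG only" step described above still needs two things: a case-insensitive extension check, and resolving relative paths (such as "img/a.JPG") against the page URL so they can be downloaded. A minimal sketch using only System.Uri, assuming the src strings are already available; the class name JpgFilter and method ToJpgUris are hypothetical, chosen for illustration:

```csharp
using System;
using System.Collections.Generic;

public static class JpgFilter // hypothetical helper, not part of HtmlAgilityPack
{
    // Keeps only sources whose path ends in ".jpg" (any case) and resolves
    // relative values against the page they came from.
    public static List<Uri> ToJpgUris(Uri pageUri, IEnumerable<string> sources)
    {
        var result = new List<Uri>();
        foreach (string src in sources)
        {
            // Combines the page URL with a relative src; absolute srcs pass through.
            if (!Uri.TryCreate(pageUri, src, out Uri resolved))
                continue; // skip values that cannot form a valid URI

            // Check AbsolutePath so a query string like "?w=100" does not hide the extension.
            if (resolved.AbsolutePath.EndsWith(".jpg", StringComparison.OrdinalIgnoreCase))
                result.Add(resolved);
        }
        return result;
    }
}
```

Each resulting Uri can then be passed to client.DownloadData (or HttpClient) to fetch the image bytes, so nothing but the images themselves needs to touch the disk.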

Community · Accepted Answer · 2020-06-20 09:12:55Z


You should be able to process a string of HTML using the LoadHtml() method of HtmlDocument.

From the source code:

/// <summary>
/// Loads the HTML document from the specified string.
/// </summary>
/// <param name="html">String containing the HTML document to load. May not be null.</param>
public void LoadHtml(string html)

The Load method expects a file path, which is the reason for the "Illegal characters in path" message.
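Applied to the loop in the question, the fix is to call LoadHtml on the downloaded string instead of Load. A minimal sketch, assuming the HtmlAgilityPack NuGet package is referenced; the wrapper class ImageLinkExtractor and its method name are hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

public static class ImageLinkExtractor // hypothetical wrapper for illustration
{
    // Parses an HTML string (not a file path) and returns every non-empty
    // <img src="..."> value in document order.
    public static List<string> GetImageSources(string html)
    {
        var document = new HtmlDocument();
        document.LoadHtml(html); // LoadHtml takes markup; Load would treat it as a path

        return document.DocumentNode.Descendants("img")
            .Select(e => e.GetAttributeValue("src", null)) // null when src is missing
            .Where(s => !string.IsNullOrEmpty(s))
            .ToList();
    }
}
```

Inside the loop this becomes: string downloadString = client.DownloadString(mainlink + i + ".html"); var urls = ImageLinkExtractor.GetImageSources(downloadString);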

edited Jun 20, 2020 at 9:12

CommunityBot


answered Oct 4, 2015 at 22:07

David Tansey

