
I have a mini-project that requires downloading the HTML documents of multiple websites using C#, and it needs to perform as fast as possible. In my scenario I might need to switch IPs using proxies based on certain conditions. I want to take advantage of C# asynchronous tasks to execute as many requests as possible so that the whole process is fast and efficient.

Here's the code I have so far.

using System;
using System.Collections.Generic;
using System.Net;
using System.Threading.Tasks;

public class HTMLDownloader
{
    public static List<string> URL_LIST = new List<string>();
    public static List<string> HTML_DOCUMENTS = new List<string>();

    public static void Main()
    {
        for (var i = 0; i < URL_LIST.Count; i++)
        {
            var html = Task.Run(() => Run(URL_LIST[i]));
            // Blocks until this download finishes before the next one starts
            HTML_DOCUMENTS.Add(html.Result);
        }
    }

    public static async Task<string> Run(string url)
    {
        var client = new WebClient();
        //Handle proxy credentials: client.Proxy = new WebProxy();
        string html = "";
        try
        {
            html = await client.DownloadStringTaskAsync(new Uri(url));
            //if (condition == true)
            //{
            //    return html;
            //}
            //else
            //{
            //    Switch IP and try again
            //}
        }
        catch (Exception e)
        {
            // Swallowing exceptions for now; a failed URL just returns an empty string
        }

        return html;
    }
}
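For context, the commented-out proxy branch would set something like the sketch below. UseNextProxy is a hypothetical helper, and the host, port, and credentials are placeholder values rather than part of my actual setup:

//Hypothetical helper: the host, port, and credentials are placeholders
private static void UseNextProxy(WebClient client)
{
    client.Proxy = new WebProxy("proxy.example.com", 8080)
    {
        //Subsequent requests on this WebClient go through the proxy
        Credentials = new NetworkCredential("username", "password")
    };
}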

The problem here is that I'm not really taking advantage of sending multiple web requests, because each request has to finish before the next one can begin. Is there a better approach to this? For example, send 10 web requests at a time and then send a new request when one of those requests is finished.

Thanks

  • Are you sure that your question is ASP.NET related? Commented Apr 17, 2020 at 10:11

3 Answers


I want to take advantage of C# asynchronous tasks to execute as many requests as possible so that the whole process is fast and efficient.

You can use Task.WhenAll to get asynchronous concurrency.
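A minimal unthrottled sketch, reusing the Run method from the question, might look like this:

public static async Task Main()
{
  // Start every download concurrently and wait for all of them to finish
  string[] results = await Task.WhenAll(URL_LIST.Select(Run));
  HTML_DOCUMENTS.AddRange(results);
}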

For example, send 10 web requests at a time and then send a new request when one of those requests is finished.

To throttle asynchronous concurrency, use SemaphoreSlim:

public static async Task Main()
{
  using var limit = new SemaphoreSlim(10); // 10 at a time
  var tasks = URL_LIST.Select(Process).ToList();
  var results = await Task.WhenAll(tasks);
  HTML_DOCUMENTS.AddRange(results);

  async Task<string> Process(string url)
  {
    await limit.WaitAsync();
    try { return await Run(url); }
    finally { limit.Release(); }
  }
}
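Here the SemaphoreSlim caps the number of downloads in flight at 10, while Task.WhenAll still waits for all of them; as soon as one Process call releases the semaphore, the next waiting call proceeds, which matches the requirement of sending a new request whenever one finishes.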



One way is to use Task.WhenAll.

Creates a task that will complete when all of the supplied tasks have completed.

The premise is: Select the URLs into a list of tasks, await that list with Task.WhenAll, then Select the results.

public static async Task Main()
{
   // ToList() materializes the query so each Run call is started exactly once
   var tasks = URL_LIST.Select(Run).ToList();
   await Task.WhenAll(tasks);
   var results = tasks.Select(x => x.Result);
}

Note: the result of awaiting WhenAll is the collection of results as well.
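As a sketch of that note (still reusing Run from the question), the results can be taken directly from the awaited WhenAll call, making the trailing Select optional:

public static async Task Main()
{
   // WhenAll returns the downloaded documents in the same order as the input tasks
   string[] results = await Task.WhenAll(URL_LIST.Select(Run));
}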

Comment: Good, but the Select at the end is an unnecessary step.

First, change your Main to be async.

Then you can use LINQ's Select to create the tasks and Task.WhenAll to await them all concurrently.

public static async Task Main()
{
    var tasks = URL_LIST.Select(Run);

    string[] documents = await Task.WhenAll(tasks);

    HTML_DOCUMENTS.AddRange(documents);
}

Once all the tasks are complete, Task.WhenAll unwraps their results into an array.

