I have more than 5,000 pages that I want to download using WebClient. I want this done as fast as possible, so I am using multithreading (a BlockingCollection in my case). However, the program always crashes after a while with a System.Net.WebException. If I add a delay such as Thread.Sleep(3000), the downloads slow down and the error simply appears a little later.

It usually takes about 2-3 seconds to download one page.

Normally I would suspect my BlockingCollection, but it works fine with other tasks, so I am fairly sure something is wrong with my WebClient requests. There might be some kind of overlap between the separate WebClient instances, but that is just a guess.

        Multithreading multiThread = new Multithreading(5); 
        for(int pageNumber = 1; pageNumber <= 5181; pageNumber++)
        {
            multiThread.EnqueueTask(new Action(() => //add task ("scrape the trader") to the multithread queue
            {
                using (WebClient client = new WebClient())
                {
                    client.DownloadFile("http://example.com/page=" + pageNumber.ToString(), @"C:\mypages\page " + pageNumber.ToString() + ".html");
                } 
            }));
            //I put the Thread.Sleep(123) delay here
        }

If I add a smaller delay (Thread.Sleep(100), for example), it no longer crashes. However, each task then downloads page number *whatever pageNumber's value is at that moment*, instead of downloading the pages in order as it usually does.
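The out-of-order symptom is consistent with how C# lambdas capture variables. A `for` loop has a single `pageNumber` variable for the entire loop, and every queued lambda captures that variable, not its value at the moment the task was enqueued. When a worker eventually runs an action, it reads whatever `pageNumber` holds at that point, which is 5182 once the loop has finished. The sketch below (the class and method names are hypothetical) reproduces this without any network access and shows the usual fix: copy the loop variable into a local declared inside the loop body. One possible side effect to check: if several actions see the same value, they also write to the same file path at the same time, and that may be enough to make `DownloadFile` fail.

```csharp
using System;
using System.Collections.Generic;

static class ClosureCaptureDemo
{
    // Mirrors the question's loop: each lambda captures the loop variable itself,
    // and a C# 'for' loop has only one such variable for all iterations.
    public static List<int> CaptureLoopVariable(int pageCount)
    {
        var queued = new List<Func<int>>();
        for (int pageNumber = 1; pageNumber <= pageCount; pageNumber++)
        {
            queued.Add(() => pageNumber);
        }
        // Running the lambdas after the loop plays the role of a busy worker.
        return RunAll(queued);
    }

    // The fix: copy into a local declared inside the loop body. Each iteration
    // gets a fresh 'page', so each lambda keeps its own value.
    public static List<int> CaptureLocalCopy(int pageCount)
    {
        var queued = new List<Func<int>>();
        for (int pageNumber = 1; pageNumber <= pageCount; pageNumber++)
        {
            int page = pageNumber;
            queued.Add(() => page);
        }
        return RunAll(queued);
    }

    // Invokes every queued lambda in order and collects what each one saw.
    static List<int> RunAll(List<Func<int>> queued)
    {
        var seen = new List<int>();
        foreach (Func<int> action in queued)
            seen.Add(action());
        return seen;
    }
}
```

In the question's loop, this would mean adding `int page = pageNumber;` just before `multiThread.EnqueueTask(...)` and using `page` in both the URL and the file name.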

Here is my BlockingCollection wrapper (I think I got this code from Stack Overflow):

    using System;
    using System.Collections.Concurrent;
    using System.Threading.Tasks;

    class Multithreading : IDisposable
    {
        BlockingCollection<Action> _taskQ = new BlockingCollection<Action>();

        public Multithreading(int workerCount)
        {
            // Create and start a separate Task for each consumer:
            for (int i = 0; i < workerCount; i++)
                Task.Factory.StartNew(Consume);
        }

        // Tells the consumers that no more work will be added.
        // Note: this does not wait for the queued actions to finish.
        public void Dispose() { _taskQ.CompleteAdding(); }

        public void EnqueueTask(Action action) { _taskQ.Add(action); }

        void Consume()
        {
            // This sequence that we're enumerating will block when no elements
            // are available and will end when CompleteAdding is called.
            foreach (Action action in _taskQ.GetConsumingEnumerable())
                action();     // Perform task.
        }
    }

I also tried putting everything into an endless while loop and handling the error with try...catch, but the error is apparently not raised immediately. It only appears after a while (I am not sure exactly when).
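One likely reason the try...catch did not help is that each queued action runs on one of the worker tasks inside `Consume`. An exception thrown by `DownloadFile` therefore never reaches a try/catch placed around the enqueuing loop. Instead, it ends that worker's `foreach` and is stored on the worker's `Task`, which the original code discards, so nothing observes it. The remaining workers keep going, which can make the failure look delayed. If the project targets .NET Framework 4.0 specifically, an unobserved task exception is also rethrown when the task is garbage-collected, and that happens at an unpredictable time. The original `Dispose` also calls only `CompleteAdding` and does not wait for the queued work to finish.

Below is a sketch of a hypothetical `SafeWorkQueue` that addresses both points: it catches exceptions per action, and its `Dispose` waits for the workers. The class name and the `Failures` property are illustrative and do not come from the original.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical variant of the question's Multithreading class.
class SafeWorkQueue : IDisposable
{
    readonly BlockingCollection<Action> _taskQ = new BlockingCollection<Action>();
    readonly Task[] _workers;

    // Exceptions thrown by queued actions, recorded instead of ending a worker.
    public ConcurrentQueue<Exception> Failures { get; } = new ConcurrentQueue<Exception>();

    public SafeWorkQueue(int workerCount)
    {
        // LongRunning hints that each worker should get its own thread,
        // since it blocks on the queue for the whole run.
        _workers = Enumerable.Range(0, workerCount)
            .Select(_ => Task.Factory.StartNew(Consume, TaskCreationOptions.LongRunning))
            .ToArray();
    }

    public void EnqueueTask(Action action) { _taskQ.Add(action); }

    // Stop accepting work, then block until every queued action has run.
    public void Dispose()
    {
        _taskQ.CompleteAdding();
        Task.WaitAll(_workers);
    }

    void Consume()
    {
        foreach (Action action in _taskQ.GetConsumingEnumerable())
        {
            try
            {
                action();
            }
            catch (Exception ex)
            {
                // Record the failure and keep consuming. Without this catch the
                // exception would end this worker's loop and sit on its Task.
                Failures.Enqueue(ex);
            }
        }
    }
}
```

After `Dispose` returns, `Failures` holds one entry per failed page. For a `WebException`, its `InnerException` is the first thing to inspect.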

Here is the whole exception:

An exception of type 'System.Net.WebException' occurred in System.dll but was not handled in user code

Additional information: An exception occurred during a WebClient request.
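As far as I know, "An exception occurred during a WebClient request." is the generic message `WebClient` uses when it wraps a failure that is not itself a `WebException`, for example an `IOException` raised while writing the target file. The useful detail is then in `InnerException`. A small helper that prints the whole chain could look like the following; `ExceptionReport.Describe` is hypothetical and not part of any library.

```csharp
using System;
using System.Net;
using System.Text;

static class ExceptionReport
{
    // Builds one line per exception in the InnerException chain, outermost first.
    public static string Describe(Exception ex)
    {
        var sb = new StringBuilder();
        for (Exception e = ex; e != null; e = e.InnerException)
        {
            sb.Append(e.GetType().FullName).Append(": ").Append(e.Message);
            // WebException.Status indicates which stage failed (e.g. Timeout, ProtocolError).
            if (e is WebException web)
                sb.Append(" [Status=").Append(web.Status).Append(']');
            sb.AppendLine();
        }
        return sb.ToString();
    }
}
```

Calling `ex.ToString()` is a simpler alternative. It includes the inner exceptions and their stack traces, which is also what the first comment asks for.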
  • Show us the stack trace. (Commented May 12, 2014 at 12:21)
  • If you want to download files asynchronously, why not use the client.DownloadFileAsync operation? (Commented May 12, 2014 at 12:32)
  • What's in the exception? (Commented May 12, 2014 at 12:33)
  • @Stefan, I have added the whole exception at the bottom of my post. (Commented May 12, 2014 at 12:38)
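Following the `DownloadFileAsync` suggestion, one option is to drop the dedicated worker threads and limit concurrency with a `SemaphoreSlim` instead. The sketch below assumes .NET 4.5 or later, for `WebClient.DownloadFileTaskAsync`, the `Task`-returning counterpart of `DownloadFileAsync`. It reuses the URL pattern and folder from the question, and the helper names are hypothetical. Each page number arrives as a lambda parameter, so every download gets its own value and the loop-variable capture problem does not arise.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Threading;
using System.Threading.Tasks;

static class ThrottledDownloads
{
    // Runs work(item) for every item, with at most maxConcurrency calls in flight.
    public static async Task ForEachAsync<T>(IEnumerable<T> items, int maxConcurrency, Func<T, Task> work)
    {
        using (var gate = new SemaphoreSlim(maxConcurrency))
        {
            // Start one task per item; each waits for a free slot first.
            List<Task> tasks = items.Select(async item =>
            {
                await gate.WaitAsync();
                try
                {
                    await work(item);
                }
                finally
                {
                    gate.Release(); // free the slot even if work(item) threw
                }
            }).ToList();

            // Completes when every item is done; rethrows the first failure, if any.
            await Task.WhenAll(tasks);
        }
    }

    // Hypothetical use with the question's URL pattern and folder.
    public static Task DownloadPagesAsync(int lastPage, int maxConcurrency)
    {
        return ForEachAsync(Enumerable.Range(1, lastPage), maxConcurrency, async page =>
        {
            using (var client = new WebClient())
            {
                await client.DownloadFileTaskAsync(
                    "http://example.com/page=" + page,
                    @"C:\mypages\page " + page + ".html");
            }
        });
    }
}
```

For example, `ThrottledDownloads.DownloadPagesAsync(5181, 5)` keeps at most five downloads running at once, which matches the five workers in the question.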

1 Answer

The WebClient class is not guaranteed to be thread safe. From MSDN:

Any instance members are not guaranteed to be thread safe

Update

Use one HttpWebRequest for each request that you make. If you make many requests to different web sites, it doesn't matter whether you use WebClient or HttpWebRequest.
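Here is a minimal sketch of the "one request object per download" advice. It assumes the classic `System.Net` API; in .NET 6 and later, `WebRequest` and `HttpWebRequest` are marked obsolete in favor of `HttpClient`. For `http://` and `https://` URLs, `WebRequest.Create` returns an `HttpWebRequest`. Because every call creates its own request and response, concurrent calls share no request state, although they should still write to different files. The class and method names are hypothetical.

```csharp
using System;
using System.IO;
using System.Net;

static class PageDownloader
{
    // Creates a fresh request for every download. For http(s) URLs this is an
    // HttpWebRequest; for file:// URLs it is a FileWebRequest (handy offline).
    public static void DownloadToFile(string url, string path)
    {
        WebRequest request = WebRequest.Create(url);
        using (WebResponse response = request.GetResponse())
        using (Stream body = response.GetResponseStream())
        using (FileStream file = File.Create(path))
        {
            // Stream the response body straight into the target file.
            body.CopyTo(file);
        }
    }
}
```

Inside a queued action, this would replace the `WebClient` block, e.g. `PageDownloader.DownloadToFile("http://example.com/page=" + page, @"C:\mypages\page " + page + ".html");`, with `page` being a per-iteration copy of the loop variable.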

If you make many requests to the same web site, it is still not as inefficient as it seems, because HttpWebRequest reuses connections under the hood. Microsoft uses something called service points, which you can access through the HttpWebRequest.ServicePoint property. From the property's documentation you can reach the ServicePoint class documentation, where you can fine-tune the number of connections per web site and other settings.
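On the connection-count tuning: on .NET Framework outside ASP.NET, `ServicePointManager.DefaultConnectionLimit` defaults to 2 connections per host. With five workers hitting the same site, some requests therefore wait for a free connection. The sketch below raises the limit both globally and for one host's `ServicePoint`. The helper name is hypothetical, and the value 5 is only an assumption chosen to match the worker count in the question.

```csharp
using System;
using System.Net;

static class ConnectionLimits
{
    // Lets 'workerCount' concurrent requests to 'site' use separate connections.
    public static ServicePoint ConfigureFor(Uri site, int workerCount)
    {
        // Affects ServicePoints created after this call (i.e. hosts not yet contacted).
        ServicePointManager.DefaultConnectionLimit =
            Math.Max(ServicePointManager.DefaultConnectionLimit, workerCount);

        // Also adjust this host's ServicePoint directly, in case it already exists.
        ServicePoint servicePoint = ServicePointManager.FindServicePoint(site);
        servicePoint.ConnectionLimit = workerCount;
        return servicePoint;
    }
}
```

Call it once, before the first request to that host, e.g. `ConnectionLimits.ConfigureFor(new Uri("http://example.com/"), 5);`.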
