Multi-thread C# queue in .Net 4

Question

I'm developing a simple crawler for web pages. I've searched and found a lot of solutions for implementing multi-threaded crawlers. What is the best way to create a thread-safe queue to contain unique URLs?

EDIT: Is there a better solution in .Net 4.5?

possible duplicate of Classes in .Net 4.5 for writing a multi-thread C# crawler
– Henk Holterman, Commented Apr 10, 2012 at 10:49
OK! So I go there and post a question, few people vote for closing because it's not in ONE area. I come here and post it in ONE area, now you say it's a duplicate! I think whatever I do, some people want to try to close questions. It's easier than answering, right?!
– Alireza Noori, Commented Apr 10, 2012 at 10:53
You should consider deleting your old question that covers multiple areas. That way, this one won't be closed as a duplicate of the other question :)
– K Mehta, Commented Apr 10, 2012 at 11:01
Oh, I'm sorry. I didn't know I can delete a question :D Thanks. Funny, I've spent so much time on SO :D
– Alireza Noori, Commented Apr 10, 2012 at 11:03

Aliostad · Accepted Answer · 2012-04-10 22:02:55Z

2

Use the Task Parallel Library with the default scheduler, which runs tasks on the ThreadPool.

OK, this is a minimal implementation which keeps up to 50 URLs in flight at a time:

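    // Requires: using System; using System.Threading; using System.Threading.Tasks;
    public static void WebCrawl(Func<string> getNextUrlToCrawl, // returns a URL or null if no more URLs
        Action<string> crawlUrl, // action to crawl the URL
        int pauseInMilli // if all threads engaged, waits for n milliseconds
        )
    {
        const int maxQueueLength = 50;
        string currentUrl = null;
        int queueLength = 0;

        while ((currentUrl = getNextUrlToCrawl()) != null)
        {
            string temp = currentUrl; // per-iteration copy captured by the closure

            // Wait for a free slot here, so the current URL is not dropped
            // when the limit has been reached.
            while (Interlocked.CompareExchange(ref queueLength, 0, 0) >= maxQueueLength)
            {
                Thread.Sleep(pauseInMilli);
            }

            // Count the task before it starts; incrementing inside the task
            // would let the loop start more than maxQueueLength tasks.
            Interlocked.Increment(ref queueLength);
            Task.Factory.StartNew(() => crawlUrl(temp))
                .ContinueWith((t) =>
                {
                    if (t.IsFaulted)
                        Console.WriteLine(t.Exception.ToString());
                    else
                        Console.WriteLine("Successfully done!");
                    Interlocked.Decrement(ref queueLength);
                });
        }
    }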

Dummy usage:

    static void Main(string[] args)
    {
        Random r = new Random();
        int i = 0;
        WebCrawl(() => (i = r.Next()) % 100 == 0 ? null : ("Some URL: " + i.ToString()),
            (url) => Console.WriteLine(url),
            500);

        Console.Read();

    }

edited Apr 10, 2012 at 22:02

answered Apr 10, 2012 at 10:45

Aliostad


11 Comments

Alireza Noori Over a year ago

What about the new .Net 4.5? Is there a better solution in .Net 4.5? And could you please post a sample?

Aliostad Over a year ago

@AlirezaNoori 4.5 is not officially out yet, so how does that help you? I am not aware of any new classes that can help, although the async and await keywords will.

Alireza Noori Over a year ago

I'm developing this app for my research. So it's not a problem. I have used async coding in Windows 8 but do you think using async is better than multithreading?

flytzen Over a year ago

@AlirezaNoori For a web crawler, you should really use async rather than multithreading, as the majority of your time will be spent waiting for web pages. However, async code (certainly prior to 4.5) can be a bit complex to write, so whether the additional complexity is worth it depends on a lot of factors, including whether monopolising a lot of threads is a problem. It's a complex question and worth researching thoroughly.

Alireza Noori Over a year ago

I can easily use .Net 4.5 so I guess I can use async. Thanks. I'm going to give it a try.
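
For readers following the async suggestion in the comments above, here is a minimal sketch of what it might look like on .NET 4.5. It is not code from any of the posters. The class name `AsyncCrawlSketch`, the `fetchAsync` delegate and the `maxConcurrency` parameter are assumptions made for illustration; in a real crawler `fetchAsync` could be something like `url => httpClient.GetStringAsync(url)`.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class AsyncCrawlSketch
{
    // Fetches every URL with at most 'maxConcurrency' requests in flight.
    // 'fetchAsync' is a hypothetical caller-supplied download function.
    public static async Task<Dictionary<string, string>> CrawlAsync(
        IEnumerable<string> urls,
        Func<string, Task<string>> fetchAsync,
        int maxConcurrency)
    {
        var results = new Dictionary<string, string>();
        using (var throttle = new SemaphoreSlim(maxConcurrency))
        {
            var tasks = new List<Task<KeyValuePair<string, string>>>();
            foreach (string url in urls)
                tasks.Add(FetchOneAsync(url, fetchAsync, throttle));

            // WhenAll completes once every fetch has finished.
            foreach (var pair in await Task.WhenAll(tasks))
                results[pair.Key] = pair.Value;
        }
        return results;
    }

    private static async Task<KeyValuePair<string, string>> FetchOneAsync(
        string url, Func<string, Task<string>> fetchAsync, SemaphoreSlim throttle)
    {
        await throttle.WaitAsync(); // waits for a slot without blocking a thread
        try
        {
            string body = await fetchAsync(url);
            return new KeyValuePair<string, string>(url, body);
        }
        finally
        {
            throttle.Release();
        }
    }
}
```

The semaphore plays the role that a fixed number of worker threads would play in a multi-threaded design, but a request that is waiting on the network does not hold a thread.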


Ohad Schneider · Accepted Answer · 2012-04-10 11:21:53Z

2

ConcurrentQueue is indeed the framework's thread-safe queue implementation. But since you're likely to use it in a producer-consumer scenario, the class you're really after may be the infinitely useful BlockingCollection.

answered Apr 10, 2012 at 11:21

Ohad Schneider


2 Comments

Alireza Noori Over a year ago

Could you please post a very quick sample? Thanks

Ohad Schneider Over a year ago

Go to the link I gave for BlockingCollection. At the bottom you'll find a simple usage example.
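
Since the linked example is not reproduced here, the following is a rough sketch of the producer-consumer pattern with `BlockingCollection<T>`, assuming .NET 4. The class `BlockingCrawlSketch`, the method `CrawlAll`, its parameters and the capacity of 100 are hypothetical names and values chosen for illustration, not part of the framework or of the answer above.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class BlockingCrawlSketch
{
    // Feeds 'seedUrls' to 'consumerCount' workers and returns the URLs processed.
    public static List<string> CrawlAll(
        IEnumerable<string> seedUrls, int consumerCount, Action<string> crawlUrl)
    {
        var processed = new ConcurrentBag<string>();

        // Bounded FIFO collection: Add blocks while 100 items are already waiting.
        using (var queue = new BlockingCollection<string>(new ConcurrentQueue<string>(), 100))
        {
            var consumers = new Task[consumerCount];
            for (int i = 0; i < consumerCount; i++)
            {
                consumers[i] = Task.Factory.StartNew(() =>
                {
                    // Blocks while the queue is empty; the loop ends once
                    // CompleteAdding has been called and the queue is drained.
                    foreach (string url in queue.GetConsumingEnumerable())
                    {
                        crawlUrl(url);
                        processed.Add(url);
                    }
                }, TaskCreationOptions.LongRunning);
            }

            // Producer side: here a single thread adds all the URLs.
            foreach (string url in seedUrls)
                queue.Add(url);

            queue.CompleteAdding(); // signal that no more URLs are coming
            Task.WaitAll(consumers);
        }

        return new List<string>(processed);
    }
}
```

A call such as `BlockingCrawlSketch.CrawlAll(urls, 4, url => Console.WriteLine(url))` would process the list with four workers. Note that this sketch does not filter duplicate URLs.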

Simon Cowen · Accepted Answer · 2012-04-10 10:51:41Z

1

Would System.Collections.Concurrent.ConcurrentQueue<T> fit the bill?

answered Apr 10, 2012 at 10:51

Simon Cowen


1 Comment

Alireza Noori Over a year ago

Thanks. Is there a better solution in .Net 4.5? And could you please post a simple sample?

flytzen · Accepted Answer · 2012-04-10 10:52:21Z

1

I'd use System.Collections.Concurrent.ConcurrentQueue.

You can safely queue and dequeue from multiple threads.

answered Apr 10, 2012 at 10:52

flytzen
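
`ConcurrentQueue<T>` does not reject duplicates by itself, and the question asks for unique URLs. One possible way to get that, sketched here under assumptions, is to pair the queue with a `ConcurrentDictionary` used as a set. `UniqueUrlQueue` and its members are hypothetical names, not a framework class, and URLs are compared as exact strings with no normalisation.

```csharp
using System.Collections.Concurrent;

// Hypothetical wrapper: a thread-safe queue that accepts each URL at most once.
public sealed class UniqueUrlQueue
{
    private readonly ConcurrentQueue<string> _queue = new ConcurrentQueue<string>();

    // Used as a concurrent set; the byte value is ignored.
    private readonly ConcurrentDictionary<string, byte> _seen =
        new ConcurrentDictionary<string, byte>();

    // Returns true if the URL was new and has been queued.
    public bool TryEnqueue(string url)
    {
        if (!_seen.TryAdd(url, 0))
            return false; // this URL was queued before, possibly already crawled

        _queue.Enqueue(url);
        return true;
    }

    // Returns false when the queue is currently empty.
    public bool TryDequeue(out string url)
    {
        return _queue.TryDequeue(out url);
    }

    public int Count
    {
        get { return _queue.Count; }
    }
}
```

Because `TryAdd` is atomic, two threads that discover the same link at the same moment cannot both enqueue it. The set of seen URLs is never trimmed, so it grows for as long as the crawl runs.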



Martin James · Accepted Answer · 2012-04-10 11:13:50Z

1

Look at System.Collections.Concurrent.ConcurrentQueue. If consumers need to wait for items to arrive, you could use System.Collections.Concurrent.BlockingCollection.

answered Apr 10, 2012 at 11:13

Martin James
