
I'm currently working on a .NET Core console application and I came across a scenario where I need to process large amounts of data. To optimize this, I'm thinking about using multithreading.

Here's a simplified version of my current setup:

public async Task ProcessDataAsync(List<Data> dataList)
{
    foreach (var data in dataList)
    {
        // business logic...
    }
}

I've read about the System.Threading.Tasks namespace and the Parallel.ForEach method, but I'm not sure how to implement it in my scenario. Could anyone provide an example of how I could modify my ProcessDataAsync method to use multithreading?

Also, are there best practices or potential issues I should be aware of when implementing multithreading?

2 Answers


First of all: it depends. Multithreading can offer great performance boosts, but it also costs development time and CPU time to get everything working smoothly. Very often, adding multithreading is just an attempt to compensate for inefficient code.

Note that just using async/await is NOT multithreading; it only "unblocks" the current thread so it can do something else while waiting: https://learn.microsoft.com/en-us/dotnet/csharp/asynchronous-programming/task-asynchronous-programming-model
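To make that difference concrete, here is a minimal sketch (not from the question's code) contrasting the two: the `await` frees the calling thread while the delay runs, whereas `Parallel.ForEach` actually fans work out over worker threads:

```csharp
using System;
using System.Threading.Tasks;

class Demo
{
    static async Task Main()
    {
        // async/await: the calling thread is released while waiting,
        // but nothing here runs in parallel.
        await Task.Delay(100);

        // Parallel.ForEach: items are processed on multiple threads
        // at the same time.
        Parallel.ForEach(new[] { 1, 2, 3, 4 }, n =>
            Console.WriteLine($"item {n} on thread {Environment.CurrentManagedThreadId}"));
    }
}
```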

As a personal flavour, I'm using (real) threads for background tasks. Whenever it comes down to adding multithreading to a workflow that is overall sequential but has parallel processing opportunities, Parallel.ForEach() is quite easy to use and can speed up classic for/foreach loops.

In general, Parallel.ForEach() will process items one by one (or, say, 18 at a time) depending on the configured degree of parallelism. (I always use CPU count - 2):

List<Something> somethingList; // populated elsewhere
int availableCores = Environment.ProcessorCount - 2;

Parallel.ForEach(somethingList, new ParallelOptions { MaxDegreeOfParallelism = availableCores }, entry =>
{
    // entry is now a single "Something" item
});

Most of the time it is more convenient to split an existing list into chunks of equal size and have 18 parallel executions, each processing one chunk.

Here is a little extension method (for lists) to achieve that:

public enum CollectionChunkStyle
{
    MAX_LENGTH_PER_CHUNK,
    FIXED_COUNT_OF_CHUNKS
}

public static List<List<T>> Chunk<T>(this List<T> list, int size, CollectionChunkStyle style)
{
    List<List<T>> result = new List<List<T>>();

    if (style == CollectionChunkStyle.MAX_LENGTH_PER_CHUNK)
    {
        List<T> chunk = new List<T>();
        result.Add(chunk);

        int c = 0;
        foreach (T t in list)
        {
            if (c++ == size)
            {
                chunk = new List<T>();
                result.Add(chunk);
                c = 1;
            }

            chunk.Add(t);
        }
    }
    else if (style == CollectionChunkStyle.FIXED_COUNT_OF_CHUNKS)
    {
        // round-robin distribution over (at most) 'size' lists
        for (int i = 0; i < Math.Min(size, list.Count); i++)
        {
            result.Add(new List<T>());
        }

        int c = 0;
        foreach (T t in list)
        {
            result[c++ % size].Add(t);
        }
    }

    return result;
}

It can either be used with FIXED_COUNT_OF_CHUNKS - so, say "Split this into 18 lists!" or with MAX_LENGTH_PER_CHUNK - saying "Create many chunks, 100 items per chunk please".
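For illustration, assuming the extension method above, the two styles split a list of ten numbers like this (a sketch, not part of the original answer):

```csharp
var numbers = Enumerable.Range(1, 10).ToList();

// "Create many chunks, 3 items per chunk please":
List<List<int>> byLength = numbers.Chunk(3, CollectionChunkStyle.MAX_LENGTH_PER_CHUNK);
// -> [1,2,3] [4,5,6] [7,8,9] [10]

// "Split this into 3 lists!" (round-robin distribution):
List<List<int>> byCount = numbers.Chunk(3, CollectionChunkStyle.FIXED_COUNT_OF_CHUNKS);
// -> [1,4,7,10] [2,5,8] [3,6,9]
```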

The earlier example then becomes as easy as:

List<Something> somethingList; // populated elsewhere
int availableCores = Environment.ProcessorCount - 2;
List<List<Something>> chunks = somethingList.Chunk(availableCores, CollectionChunkStyle.FIXED_COUNT_OF_CHUNKS);

Parallel.ForEach(chunks, new ParallelOptions { MaxDegreeOfParallelism = availableCores }, chunk =>
{
    // chunk is now one of 'availableCores' lists, each holding an equal share of somethingList's elements
});

2 Comments

The Chunk already exists as a LINQ operator (starting from .NET 6). Also the Parallel.ForEach already supports partitioning without using the Chunk (example).
@TheodorZoulias Oh, nice to know - well, using that since years, never checked if there are native alternatives - it's sitting in UseThisDLLEverywhere.dll :P
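For reference, the built-in LINQ operator mentioned in the comment (available since .NET 6) covers the "max length per chunk" case; this is a minimal sketch of its use:

```csharp
using System;
using System.Linq;

int[] numbers = Enumerable.Range(1, 10).ToArray();

// Enumerable.Chunk yields arrays of up to 3 items; the last chunk may be shorter.
foreach (int[] chunk in numbers.Chunk(3))
    Console.WriteLine(string.Join(",", chunk));
// 1,2,3
// 4,5,6
// 7,8,9
// 10
```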

First, I would move your data processing into a method that handles a single item:

public async Task ProcessDataAsync(Data data)
{
    // business logic...
    await Task.CompletedTask; // placeholder so the async method compiles without real awaits
}

That will give you the flexibility to run it how you want.

If you want to run a single task off the main thread:

await Task.Run(() => ProcessDataAsync(data));

If you want to process all items, use Parallel.ForEach or Parallel.ForEachAsync:

await Parallel.ForEachAsync(dataList, (data, cancellationToken) => new ValueTask(ProcessDataAsync(data)));
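If you also want to cap concurrency, as the other answer suggests, Parallel.ForEachAsync has an overload taking ParallelOptions; a sketch:

```csharp
var options = new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount - 2 };

await Parallel.ForEachAsync(dataList, options, async (data, cancellationToken) =>
{
    await ProcessDataAsync(data);
});
```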

