I spent the last few days creating a parallel version of some code (college work), but I've hit a dead end (at least for me): the parallel version runs nearly twice as slow as the sequential one, and I have no clue why. Here is the code:

Variables.GetMatrix();
int ThreadNumber = Environment.ProcessorCount/2;
int SS = Variables.PopSize / ThreadNumber;
//GeneticAlgorithm GA = new GeneticAlgorithm();
Stopwatch stopwatch = new Stopwatch(), st = new Stopwatch(), st1 = new Stopwatch();
List<Thread> ThreadList = new List<Thread>();
//List<Task> TaskList = new List<Task>();
GeneticAlgorithm[] SubPop = new GeneticAlgorithm[ThreadNumber];
Thread t;
//Task t;
ThreadVariables Instance = new ThreadVariables();

stopwatch.Start();
st.Start();
PopSettings();
InitialPopulation();
st.Stop();

//Lots of attributions...
int SPos = 0, EPos = SS;

for (int i = 0; i < ThreadNumber; i++)
{
    int temp = i, StartPos = SPos, EndPos = EPos;
    t = new Thread(() =>
    {
        SubPop[temp] = new GeneticAlgorithm(Population, NumSeq, SeqSize, MaxOffset, PopFit, Child, Instance, StartPos, EndPos);
        SubPop[temp].RunGA();
        SubPop[temp].ShowPopulation();
    });
    t.Start();
    ThreadList.Add(t);
    SPos = EPos;
    EPos += SS;
}

foreach (Thread a in ThreadList)
    a.Join();

double BestFit = SubPop[0].BestSol;
string BestAlign = SubPop[0].TV.Debug;

for (int i = 1; i < ThreadNumber; i++)
{
    if (BestFit < SubPop[i].BestSol)
    {
        BestFit = SubPop[i].BestSol;
        BestAlign = SubPop[i].TV.Debug;
        Variables.ResSave = SubPop[i].TV.ResSave;
        Variables.NumSeq = SubPop[i].TV.NumSeq;
    }
}

Basically, the code creates an array of objects, instantiates and runs the algorithm in each position of the array, and collects the best value from the object array at the end. This type of algorithm works on a three-dimensional data array, and in the parallel version I assign each thread to process one range of the array, avoiding concurrent access to the data. Still, I'm getting the slow timing... Any ideas?

I'm using a Core i5, which has four logical cores (two physical + two hyperthreaded), but any number of threads greater than one makes the code run slower.

What I can explain of the code I'm running in parallel is:

The second method called in the code I posted makes about 10,000 iterations, and each iteration calls one function. That function may or may not call others (spread across two different objects per thread) and performs lots of calculations; it depends on a bunch of factors particular to the algorithm. And all these methods for one thread work in an area of the data array that isn't accessed by the other threads.
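For what it's worth, the SPos/EPos bookkeeping in the loop above can be isolated into a small helper, which makes the partitioning easy to verify on its own. A minimal sketch (the class and method names here are mine, not from the original code):

```csharp
using System;
using System.Collections.Generic;

static class RangeSplitter
{
    // Splits popSize items into threadCount contiguous [start, end) ranges,
    // mirroring the SPos/EPos bookkeeping in the loop above.
    // Note: like the original, any remainder of popSize / threadCount is dropped.
    public static List<(int Start, int End)> Split(int popSize, int threadCount)
    {
        int chunk = popSize / threadCount;          // SS in the original
        var ranges = new List<(int Start, int End)>();
        int start = 0;                              // SPos
        for (int i = 0; i < threadCount; i++)
        {
            ranges.Add((start, start + chunk));     // EPos = start + chunk
            start += chunk;
        }
        return ranges;
    }

    static void Main()
    {
        foreach (var r in Split(100, 4))
            Console.WriteLine($"{r.Start}..{r.End}");
        // Prints 0..25, 25..50, 50..75, 75..100
    }
}
```

Checking the ranges this way confirms that the threads really do get disjoint slices of the population.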

  • How does a Parallel.ForEach compare? Commented May 14, 2014 at 17:46
  • If it's a fast algorithm, threading overhead could easily nullify any benefit of parallelizing. Commented May 14, 2014 at 17:46
  • Check out this documentation for PLINQ; it also has information on what might cause parallel operations to be slower. Commented May 14, 2014 at 17:47
  • In a very quick view of your code... a.Join() may be the cause of your problems. Commented May 14, 2014 at 17:48
  • You're not showing us the actual code that you're trying to run in parallel - you're showing us the code that sets up the parallelism and then aggregates the results. If there's a bottleneck, it's highly likely to be in the code that you're running in parallel. Commented May 14, 2014 at 17:54

3 Answers

With System.Linq there is a lot that can be made simpler:

int ThreadNumber = Environment.ProcessorCount/2;
int SS = Variables.PopSize / ThreadNumber;
int numberOfTotalIterations = ThreadNumber; // One algorithm per contiguous range of size SS, matching the original loop.

var doneAlgorithms = Enumerable.Range(0, numberOfTotalIterations)
                               .AsParallel() // Makes the whole thing run in parallel.
                               .WithDegreeOfParallelism(ThreadNumber) // We don't need this line if you want the system to manage the degree of parallelism.
                               .Select(index => _runAlgorithmAndReturn(index, SS))
                               .ToArray(); // This is redundant if you only need doneAlgorithms to determine the best one.
                                           // If not, keep it to prevent multiple enumerations.

// The question keeps the maximum BestSol, so we sort algorithms by BestSol descending and take the first one to determine the "best".
// OrderByDescending causes a full enumeration, hence the note above about the ToArray() statement.
GeneticAlgorithm best = doneAlgorithms.OrderByDescending(algo => algo.BestSol).First();

BestFit = best.BestSol;
BestAlign = best.TV.Debug;
Variables.ResSave = best.TV.ResSave;
Variables.NumSeq = best.TV.NumSeq;

And declare a method to make it a bit more readable:

/// <summary>
/// Runs a single algorithm and returns it
/// </summary>
private GeneticAlgorithm _runAlgorithmAndReturn(int index, int SS)
{
    int startPos = index * SS;
    int endPos = startPos + SS;
    var algo = new GeneticAlgorithm(Population, NumSeq, SeqSize, MaxOffset, PopFit, Child, Instance, startPos, endPos);
    algo.RunGA();
    algo.ShowPopulation();
    return algo;
}
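To see the shape of this pattern in isolation, here is a self-contained toy version; the `Work` function is just a stand-in for `RunGA`, and all names here are mine:

```csharp
using System;
using System.Linq;

class PlinqDemo
{
    // Stand-in for the per-range work; returns a score per index.
    static double Work(int index) => Math.Sqrt(index) * 2.0;

    static void Main()
    {
        var results = Enumerable.Range(0, 8)
                                .AsParallel()
                                .WithDegreeOfParallelism(4)
                                .Select(i => new { Index = i, Score = Work(i) })
                                .ToArray();

        // Pick the item with the highest score, like the BestSol scan in the question.
        var best = results.OrderByDescending(r => r.Score).First();
        Console.WriteLine(best.Index); // 7, since sqrt grows with the index
    }
}
```

The point is that the thread management, joining, and result collection all collapse into the query; only the per-item work and the final selection remain yours to write.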

4 Comments

Does the use of LINQ provide a speedup by itself?
As mentioned in the comments on your question, PLINQ uses the thread pool, so the overhead of manually creating threads goes away. If this doesn't gain much performance, you might have too many lock() statements in your algorithm, which block the performance-critical parts from parallelization.
Also you might want to specify the execution mode: msdn.microsoft.com/en-us/library/dd642145%28v=vs.110%29.aspx - that would be done the same way as WithDegreeOfParallelism.
Here is a simple demonstration of "only locking when needed", if you aren't that familiar with locking: dotnetfiddle.net/RPgQdp

There is a big overhead in creating threads.

Instead of creating new threads, use the ThreadPool, as shown below:

Variables.GetMatrix();
int ThreadNumber = Environment.ProcessorCount / 2;
int SS = Variables.PopSize / ThreadNumber;
//GeneticAlgorithm GA = new GeneticAlgorithm();
Stopwatch stopwatch = new Stopwatch(), st = new Stopwatch(), st1 = new Stopwatch();
List<WaitHandle> WaitList = new List<WaitHandle>();
//List<Task> TaskList = new List<Task>();
GeneticAlgorithm[] SubPop = new GeneticAlgorithm[ThreadNumber];
//Task t;
ThreadVariables Instance = new ThreadVariables();

stopwatch.Start();
st.Start();
PopSettings();
InitialPopulation();
st.Stop();
//lots of attributions...
int SPos = 0, EPos = SS;

for (int i = 0; i < ThreadNumber; i++)
{
    int temp = i, StartPos = SPos, EndPos = EPos;
    ManualResetEvent wg = new ManualResetEvent(false);
    WaitList.Add(wg);
    ThreadPool.QueueUserWorkItem((unused) =>
    {
        SubPop[temp] = new GeneticAlgorithm(Population, NumSeq, SeqSize, MaxOffset, PopFit, Child, Instance, StartPos, EndPos);
        SubPop[temp].RunGA();
        SubPop[temp].ShowPopulation();
        wg.Set();
    });

    SPos = EPos;
    EPos += SS;
}

WaitHandle.WaitAll(WaitList.ToArray());

double BestFit = SubPop[0].BestSol;
string BestAlign = SubPop[0].TV.Debug;

for (int i = 1; i < ThreadNumber; i++)
{
    if (BestFit < SubPop[i].BestSol)
    {
        BestFit = SubPop[i].BestSol;
        BestAlign = SubPop[i].TV.Debug;
        Variables.ResSave = SubPop[i].TV.ResSave;
        Variables.NumSeq = SubPop[i].TV.NumSeq;
    }
}

Note that instead of using Join to wait for the threads to finish, I'm using WaitHandles.
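As a sketch under the same idea, Task.Run also draws on the thread pool but replaces the ManualResetEvent bookkeeping with Task.WaitAll. This is a toy, self-contained version with a stand-in workload; the names are mine:

```csharp
using System;
using System.Threading.Tasks;

class TaskDemo
{
    static void Main()
    {
        int threadNumber = 4, chunk = 25;
        var results = new int[threadNumber];
        var tasks = new Task[threadNumber];

        for (int i = 0; i < threadNumber; i++)
        {
            int temp = i;                 // Capture the loop index, as in the original.
            int start = temp * chunk, end = start + chunk;
            tasks[temp] = Task.Run(() =>
            {
                // Stand-in for RunGA(): sum the indices in this thread's range.
                int sum = 0;
                for (int j = start; j < end; j++) sum += j;
                results[temp] = sum;
            });
        }

        Task.WaitAll(tasks);              // Replaces the list of ManualResetEvents.
        Console.WriteLine(results[0]);    // 0+1+...+24 = 300
    }
}
```

Each Task carries its own completion state, so there is no per-item event object to allocate, set, and collect.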

6 Comments

It depends. Each application has one ThreadPool, and long-running operations will block it; if you have other things to run on it, that will be a problem. But if you have long-running operations, the overhead of creating a thread will not be a problem. Do you know how many threads you create and the average execution time of each operation?
Well, I'm creating a total of 4 threads, and the execution time depends on the array entry processed by the algorithm. It can take anywhere from 20 s to 5-6 min depending on the entry.
Then the thread overhead is not your problem. You should check whether you're using lock and your application is spending most of its time waiting. You can also check the processor usage. A good algorithm can keep the processor at 100% most of the time.
Yeah, CPU usage is at 100% most of the time. And no, I don't use lock in any part of the code.
You can check the memory usage; maybe your algorithm uses too much memory, and the threaded version uses 4x more and is doing a lot of swapping.

You're creating the threads yourself, so there's some extreme overhead there. Parallelise as the comments suggested. Also make sure the time a single work unit takes is long enough; a single thread/work unit should be alive for at least ~20 ms.

Pretty basic things, really. I'd suggest you read up on how multi-threading in .NET works.

I see you don't create too many threads. But the optimal thread count can't be determined just from the processor count. The built-in Parallel class has advanced algorithms to reduce the overall time.

Partitioning and threading are pretty complex things that require a lot of knowledge to get right, so unless you REALLY know what you're doing, rely on the Parallel class to handle it for you.
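As a sketch of this, Partitioner.Create plus Parallel.ForEach reproduces the question's contiguous-range split without any manual index bookkeeping (toy workload; the names are mine):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

class PartitionDemo
{
    static void Main()
    {
        int popSize = 100;
        long total = 0;

        // Partitioner.Create hands each worker a contiguous [from, to) range,
        // similar to the manual SPos/EPos split in the question.
        Parallel.ForEach(Partitioner.Create(0, popSize), range =>
        {
            long local = 0;                      // Thread-local work, no sharing.
            for (int i = range.Item1; i < range.Item2; i++)
                local += i;
            Interlocked.Add(ref total, local);   // One synchronized merge per range.
        });

        Console.WriteLine(total); // 0+1+...+99 = 4950
    }
}
```

The library decides how many ranges to create and how to schedule them, which is exactly the partitioning knowledge this answer says is hard to get right by hand.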

1 Comment

Yeah, a single thread works for more than 20 ms. And I just don't use the parallel methods because I've been reading that, overall, they aren't the best choice for long-running operations.
