OpenMP recursive parallelism (QuickSort)

Question

I've been trying to parallelize Quick Sort using OpenMP, but it seems that I've done something wrong on that part since the more threads used the slower it goes!

I know there is always overhead included, but increasing number of threads in a giant list should make it faster and not slower (my case).

Here is the code enjoy!

#include <omp.h>

double start_time, end_time;

#include <stdio.h>
#define MAXSIZE 10000  /* maximum array size */
#define MAXWORKERS 8   /* maximum number of workers */

int numWorkers;
int size; 
int doge[MAXSIZE];
void breakdown(int, int);

/* read command line, initialize, and create threads */
int main(int argc, char *argv[]) {
srand(time(NULL));
int i;

/* read command line args if any */
size = (argc > 1)? atoi(argv[1]) : MAXSIZE;
numWorkers = (argc > 2)? atoi(argv[2]) : MAXWORKERS;
if (size > MAXSIZE) size = MAXSIZE;
if (numWorkers > MAXWORKERS) numWorkers = MAXWORKERS;

for(i = 0;i<size;i++){
    doge[i] = 1+rand()%99;
}

omp_set_num_threads(numWorkers);

start_time = omp_get_wtime();
#pragma omp parallel
{
    #pragma omp single nowait
    {
        breakdown(0, size);
    }
}

end_time = omp_get_wtime();

for(i = 0;i<size;i++){
    printf("%d ", doge[i]);
}
printf("it took %g seconds\n", end_time - start_time);
  }

  void breakdown(int from, int to){

if(to-from < 2){
    return;
}
int left, right, temp;
int i_pivot  = from + rand()%(to-from);
int pivot = doge[i_pivot];

left = from;
right = to;
while (left <= right){
    if (doge[left] > pivot){
        /* swap left element with right element */
        temp = doge[left];
        doge[left] = doge[right];
        doge[right] = temp;
        if (right == i_pivot)
            i_pivot = left;
        right--;
    }
    else
        left++;
}
/* place the pivot in its place (i.e. swap with right element) */
temp = doge[right];
doge[right] = pivot;
doge[i_pivot] = temp;

#pragma omp task
{
    breakdown(from, right - 1);
}
#pragma omp task
{
    breakdown(right + 1, to);
}
//implicit DOGE
  }

I believe I've done the parallalization wrong in short.. these lines:

#pragma omp parallel
{
    #pragma omp single nowait
    {
        breakdown(0, size);
    }
}

AND

#pragma omp task
{
    breakdown(from, right - 1);
}
#pragma omp task
{
    breakdown(right + 1, to);
}

Any help would be doge

Increasing the number of threads only improves performance if all the threads can run at once (multiple processor cores, no data overlap, etc.) or if the process involves wait states (I/O, for example) and work can be done on another thread during the wait. It is NOT automatically an improvement on all, or even on most, machines and tasks. — keshlam
– keshlam, Commented Feb 20, 2014 at 22:01
Yes I do agree with that, but in this case I'm running on a 4 core (hyperthreaded x2) machine with lists that are quite big.. so I should be able to see some kind of improvement as quick sort is parallised. — David Ekermann
– David Ekermann, Commented Feb 21, 2014 at 0:13
Only if the threads are truly independent, do not cause extra cache flushes, and do run in different cores (which depends on details of the implementation). Basically, "should" is too strong a statement here unless you know details of implementation of the hardware and the language. — keshlam
– keshlam, Commented Feb 21, 2014 at 0:16

user2849936 · Accepted Answer · 2014-03-23 22:50:35Z

1

Did you try a larger array? 10000 is nothing, that should be sorted instantly, u need millions of numbers to make it run for atleast few seconds.

answered Mar 23, 2014 at 22:50

user2849936

739 bronze badges

Sign up to request clarification or add additional context in comments.

2 Comments

David Ekermann Over a year ago

Yes I tried a lot of sizes above 10000 (I changed MAXSIZE).

user2849936 Over a year ago

I see you are using MAXWORKERS 8 - but do you stop creating tasks at some point? Ideally you should create only 8 tasks on the lowest level (one for each 1/8 of array) and then do it sequentially in those tasks, without creating more tasks. By creating too many tasks the performance drops severely.

Collectives™ on Stack Overflow

OpenMP recursive parallelism (QuickSort)

1 Answer 1

2 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

2 Comments

Your Answer

Sign up or log in

Post as a guest

Related