
I have the following piece of C code:

double findIntraClustSimFullCoverage(cluster * pCluster)
{
    double sum = 0;
    register int i = 0, j = 0;
    double perElemSimilarity = 0;

    for (i = 0; i < 10000; i++)
    {
        perElemSimilarity = 0;

        for (j = 0; j < 10000; j++)
        {

            perElemSimilarity += arr[i][j];

        }
        perElemSimilarity /= pCluster->size;
        sum += perElemSimilarity;
    }
    return (sum / pCluster->size);
}

NOTE: arr is a matrix of size 10000 X 10000

This is a portion of a GA (genetic algorithm) code, so this nested for loop runs many times, and it dominates the performance of the code, i.e. it takes a very long time to produce results. I profiled the code with valgrind / kcachegrind, which indicated that 70 % of the process execution time was spent in this nested for loop. The register variables i and j do not actually seem to be kept in registers (profiling with and without the register keyword showed no difference).

I simply cannot find a way to optimize this nested for loop (it is very simple and straightforward). Please help me optimize this portion of the code.

  • The register keyword is pretty much ignored by all modern compilers, so don't expect to see a difference. Commented Jan 21, 2012 at 8:40
  • How often is arr updated? Maybe it's better to recalculate the value dynamically on each update? Commented Jan 21, 2012 at 8:43
  • Yes, arr is a 2D array of doubles. The array arr is not updated at all. I write the values of all the array subscripts only once, and from then on I only read it. Commented Jan 21, 2012 at 9:37
  • Good on you for profiling your code before optimising, it seems to be too common to do it the other way around unfortunately... Commented Jan 21, 2012 at 10:40
  • If arr is never changed, then just calculate this function once and save the sum; after that, just return sum/pCluster->size. Commented Dec 21, 2013 at 4:27

5 Answers


I'm assuming that you change the arr matrix frequently; otherwise you could just compute the sum once (see Luchian's answer) and remember it.

You can use a similar approach when you modify the matrix. Instead of completely re-computing the sum after the matrix has (likely) been changed, you can store a 'sum' value somewhere, and have every piece of code that updates the matrix update the stored sum appropriately. For instance, assuming you start with an array of all zeros:

double arr[10000][10000] = {{0}};  /* initialize it to all zeros */
double sum = 0;

// you want to set arr[27][53] to 82853
sum -= arr[27][53];
arr[27][53] = 82853;
sum += arr[27][53];

// later, you want to set arr[27][53] to 473
sum -= arr[27][53];
arr[27][53] = 473;
sum += arr[27][53];

You might want to completely recalculate the sum from time to time to avoid accumulation of floating-point rounding errors.
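A minimal sketch of this bookkeeping (the helper name `set_elem` and the recompute threshold are illustrative choices, and a small matrix stands in for the 10000 × 10000 one so the example is self-contained):

```c
#include <stddef.h>

#define N 4                     /* small stand-in for the 10000 x 10000 matrix */

static double arr[N][N];        /* statics start zero-initialized */
static double sum = 0.0;        /* running total of all elements */
static size_t updates = 0;      /* updates since the last full recompute */

/* Update one element and keep the running sum in step. */
void set_elem(size_t i, size_t j, double v)
{
    sum -= arr[i][j];           /* remove the old contribution */
    arr[i][j] = v;
    sum += v;                   /* add the new one */

    /* Every so often, recompute from scratch to shed rounding drift. */
    if (++updates == 1000000) {
        updates = 0;
        sum = 0.0;
        for (size_t r = 0; r < N; r++)
            for (size_t c = 0; c < N; c++)
                sum += arr[r][c];
    }
}
```

With this in place, reading the total is O(1): every write has already paid the cost of keeping `sum` current.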




If you're sure that you have no option for algorithmic optimization, you'll have to rely on very low level optimizations to speed up your code. These are very platform/compiler specific so your mileage may vary.

It is probable that, at some point, the bottleneck of the operation is pulling the values of arr from memory, so make sure that your data is laid out in a linear, cache-friendly way. That is to say, (char *)&arr[i][j+1] - (char *)&arr[i][j] == sizeof(double).
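To illustrate: for a plain 2D C array this property holds by construction, and iterating with j as the innermost index walks memory linearly. A small sketch (`row_major_sum` is an illustrative name, and a small matrix stands in for the real one):

```c
#include <assert.h>
#include <stddef.h>

#define ROWS 8
#define COLS 8

static double arr[ROWS][COLS];

/* Walk the matrix in row-major order: consecutive j values touch
 * consecutive addresses, so each cache line fetched is used fully. */
double row_major_sum(void)
{
    /* A 2D C array is one contiguous block; adjacent columns are
     * exactly sizeof(double) apart. */
    assert((char *)&arr[0][1] - (char *)&arr[0][0] == sizeof(double));

    double total = 0.0;
    for (size_t i = 0; i < ROWS; i++)
        for (size_t j = 0; j < COLS; j++)
            total += arr[i][j];
    return total;
}
```

If the matrix were instead an array of separately allocated row pointers, or if the loops were swapped (i innermost), each access could land on a different cache line and the same arithmetic would run much slower.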

You may also try to unroll your inner loop, in case your compiler does not already do it. Your code :

    for (j = 0; j < 10000; j++)
    {
        perElemSimilarity += arr[i][j];
    }

Would for example become :

    for (j = 0; j < 10000; j+=10)
    {
        perElemSimilarity += arr[i][j+0];
        perElemSimilarity += arr[i][j+1];
        perElemSimilarity += arr[i][j+2];
        perElemSimilarity += arr[i][j+3];
        perElemSimilarity += arr[i][j+4];
        perElemSimilarity += arr[i][j+5];
        perElemSimilarity += arr[i][j+6];
        perElemSimilarity += arr[i][j+7];
        perElemSimilarity += arr[i][j+8];
        perElemSimilarity += arr[i][j+9];
    }

These are the basic ideas; it is difficult to say more without knowing your platform and compiler, or looking at the generated assembly code.

You might want to take a look at this presentation for more complete examples of optimization opportunities.

If you need even more performance, you could take a look at SIMD intrinsics for your platform, or try to use, say, OpenMP to distribute your computation across multiple threads.
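As a taste of what the SIMD route looks like, here is a sketch using x86 SSE2 intrinsics that sums one row two doubles at a time (this assumes an x86 target and an even row length; `simd_row_sum` is an illustrative name, not part of the original code):

```c
#include <emmintrin.h>   /* SSE2 intrinsics */
#include <stddef.h>

/* Sum one row of n doubles, two lanes at a time (n assumed even). */
double simd_row_sum(const double *row, size_t n)
{
    __m128d acc = _mm_setzero_pd();          /* two parallel accumulators */
    for (size_t j = 0; j < n; j += 2)
        acc = _mm_add_pd(acc, _mm_loadu_pd(&row[j]));

    double lanes[2];
    _mm_storeu_pd(lanes, acc);               /* spill both lanes */
    return lanes[0] + lanes[1];              /* horizontal add */
}
```

Note that the two lanes accumulate in a different order than a scalar loop, so the result can differ in the last bits; for this kind of similarity metric that is usually acceptable.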


Another step would be to try OpenMP, with something along the following lines (untested):

#pragma omp parallel for private(j, perElemSimilarity) reduction(+:sum)
for (i = 0; i < 10000; i++)
{
    perElemSimilarity = 0;
    /* INSERT INNER LOOP HERE */
    perElemSimilarity /= pCluster->size;
    sum += perElemSimilarity;
}
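For reference, a self-contained version of this sketch with the inner loop from the question filled in (a small matrix stands in for the real one, and the function name is an illustrative stand-in; build with -fopenmp):

```c
#include <stddef.h>

#define N 8   /* small stand-in for the 10000 x 10000 matrix */

static double arr[N][N];

/* Parallel row sums: each thread keeps private copies of j and
 * perElemSimilarity, and the partial sums are combined with a
 * reduction.  Without -fopenmp the pragma is simply ignored and the
 * code runs serially with the same result. */
double intra_clust_sim_parallel(int cluster_size)
{
    double sum = 0;
    int i, j;
    double perElemSimilarity;

    #pragma omp parallel for private(j, perElemSimilarity) reduction(+:sum)
    for (i = 0; i < N; i++)
    {
        perElemSimilarity = 0;
        for (j = 0; j < N; j++)
            perElemSimilarity += arr[i][j];
        perElemSimilarity /= cluster_size;
        sum += perElemSimilarity;
    }
    return sum / cluster_size;
}
```

Note that j and perElemSimilarity must both be private: if either were shared, the threads would trample each other's partial results.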

But note that even if you brought this portion of code down to 0 % of your execution time (which is impossible), your GA algorithm would still take hours to run. Now that this portion takes 'only' 22 % of the running time, your performance bottleneck is elsewhere.

4 Comments

Loop unrolling is probably something to leave to the compiler's optimizer; it's a recognizable construct and easily optimized.
@Rotoglup, I did the unrolling as you suggested. It reduces the running time of the nested for loop from 70 % to 22 %. Thanks, first off, for your suggestion. But since it still takes 22 % of the execution time, my GA code takes hours to converge. Could you/anyone else help me optimize this portion of code further?
@Annamalai What is your compiler ? OS ? CPU ? Did you turn on compile-time optimization flags for your compiler ?
I am using Ubuntu Linux and Mac OS Lion in parallel, with the gcc compiler: gcc 4.2.1 on Mac OS and gcc 4.6.1 on Ubuntu. I have turned on the -O2 compilation flag.

I might be wrong here, but isn't the following equivalent:

for (i = 0; i < 10000; i++)
{
    for (j = 0; j < 10000; j++)
    {
        sum += arr[i][j];
    }
}
return (sum / ( pCluster->size * pCluster->size ) );

5 Comments

Summing so many elements into one variable can be unsafe due to overflow, but it depends on the values stored in arr.
@psur ah, you're right. But maybe the values can also be negative or really small. Maybe the OP can give us some more details... But other than that, it's the same, right?
Summing even two elements can be unsafe for overflow. It's a question of order and magnitude, which can only be determined from the scale of the input data, which I don't think is given.
Yes, Luchian Grigore, you are right. The arr contains decimal values ranging from 0 to 1.4. Seemingly, there is no chance of overflow here (as the values are mostly close to zero).
I checked the change suggested by Luchian Grigore; still, the time spent in the nested for loop is 70.63 %. Adding some more information here: the for loop over j runs for 21.59 % of the time, and the line sum += arr[i][j] runs for 46.94 % of the time.
  1. The register keyword is an optimizer hint; if the optimizer doesn't think a register is well spent there, it won't use one.
  2. Is the matrix well packed, i.e. is it a contiguous block of memory?
  3. Is 'j' the minor index (i.e. are you going from one element to the next in memory), or are you jumping from one element to one 10,000 elements away?
  4. Is arr fairly static? Is this called more than once on the same arr? The result of the inner loop depends only on the row it traverses, so calculating it lazily and storing it for future reference would make a big difference.

2 Comments

2. Yes, the array is a contiguous block of memory. 3. j increments in a discontinuous fashion. 4. I am not able to understand your fourth point; could you kindly elaborate on it?
Don't answer the question with more questions.

The way this problem is stated, there isn't much you can do. You are processing 10,000 x 10,000 double input values, that's 800 MB. Whatever you do is limited by the time it takes to read 800 MB of data.

On the other hand, are you also writing 10,000 x 10,000 values each time this is called? If not, you could, for example, store the sum for each row and keep a boolean per row indicating that its sum needs to be recalculated, set each time you change an element of that row. Or you could even update a row's sum each time one of its elements is changed.
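A sketch of that row-cache idea (the names `set_cell` / `total_sum` are illustrative, and a small matrix stands in for the real one):

```c
#include <stdbool.h>
#include <stddef.h>

#define N 8   /* small stand-in for the 10000-row matrix */

static double arr[N][N];
static double row_sum[N];    /* cached sum of each row */
static bool   row_dirty[N];  /* true => cache entry must be rebuilt */

/* All writes to the matrix go through here, marking the row stale. */
void set_cell(size_t i, size_t j, double v)
{
    arr[i][j] = v;
    row_dirty[i] = true;
}

/* Total of the whole matrix, recomputing only the rows that changed. */
double total_sum(void)
{
    double total = 0.0;
    for (size_t i = 0; i < N; i++) {
        if (row_dirty[i]) {
            row_sum[i] = 0.0;
            for (size_t j = 0; j < N; j++)
                row_sum[i] += arr[i][j];
            row_dirty[i] = false;
        }
        total += row_sum[i];
    }
    return total;
}
```

If only a handful of rows change between calls, each call touches roughly one row's worth of data instead of all 800 MB.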

