
I wrote a program that multiplies a vector by a matrix. The matrix has periodically repeated cells, so I use a temporary variable to sum vector elements before multiplication. The period is the same for adjacent rows. I create a separate temp variable for each thread. sizeof(InnerVector) == 400, and I don't want to allocate memory for it on every iteration (= 600 times).

The code looks something like this:

int tempsSize = omp_get_max_threads();
InnerVector* temps = new InnerVector[tempsSize];

for(int k = 0; k < tempsSize; k++)
    InnerVector_init(temps[k]);

for(int jmin = 1, jmax = 2; jmax < matrixSize/2; jmin *= 2, jmax *= 2)
{
    int period = getPeriod(jmax);

    #pragma omp parallel
    {
        int threadNum = omp_get_thread_num();
        // printf("\n threadNum = %i", threadNum);

        #pragma omp for
        for(int j = jmin; j < jmax; j++)
        {
            InnerVector_reset(temps[threadNum]);   
            for(int i = jmin; i < jmax; i++)
            {
                InnerMatrix cell = getCell(i, j);
                if(temps[threadNum].IsZero)
                    for(int k = j; k < matrixSize; k += period)
                        InnerVector_add(temps[threadNum], temps[threadNum], v[k]);
                InnerVector_add_mul(v_res[i], cell, temps[threadNum]);
            }
        }
    }
}

The code looks correct, but I get wrong results. In fact, I get different results on different runs; sometimes the result is correct.

When I compile in debug mode, the result is always correct. When I uncomment the line with "printf", the result is always correct.

P.S. I use Visual Studio 2010.

  • What is v[]? Is it thread-safe? Commented May 1, 2011 at 4:18
  • InnerVector* v; // array of type "InnerVector" which I need to multiply. Commented May 1, 2011 at 15:28

1 Answer


I suspect there might be a data race in
InnerVector_add_mul(v_res[i], cell, temps[threadNum]);

Since v_res appears to be the result vector, and i runs from jmin to jmax inside every iteration of the parallelized j loop, several threads can write to v_res[i] for the same value of i at the same time, with unpredictable results.
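One possible way to remove the race while still parallelizing over j is to let each thread accumulate into its own private copy of the result and merge the copies one thread at a time. The following is only a minimal sketch under stated assumptions: plain doubles stand in for InnerVector and InnerMatrix, getCell is a hypothetical stand-in for the function in the question, multiplyBlock is an illustrative name, and the periodic summation is reduced to a single element to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the question's getCell(i, j); plain doubles
// replace InnerMatrix and InnerVector to keep the sketch short.
static double getCell(int i, int j) { return 1.0 + (i * 31 + j * 17) % 7; }

// Computes res[i] += sum over j of getCell(i, j) * v[j], for i, j in [jmin, jmax).
// The parallel loop still runs over j, as in the question.
void multiplyBlock(const std::vector<double>& v, std::vector<double>& res,
                   int jmin, int jmax)
{
    assert(jmin >= 0 && jmin <= jmax);
    assert(static_cast<std::size_t>(jmax) <= v.size());
    assert(v.size() == res.size());

    #pragma omp parallel
    {
        // Private per-thread accumulator: no other thread can touch it.
        std::vector<double> local(res.size(), 0.0);

        #pragma omp for
        for (int j = jmin; j < jmax; j++)
        {
            double temp = v[j];  // computed once per j, reused for every i
            for (int i = jmin; i < jmax; i++)
                local[i] += getCell(i, j) * temp;
        }

        // Merge one thread at a time, so writes to res never overlap.
        #pragma omp critical
        for (int i = jmin; i < jmax; i++)
            res[i] += local[i];
    }
}
```

In the real program the per-thread accumulators could be allocated once up front, the same way temps is, instead of inside the parallel region. The merge step costs one extra pass over the result per thread.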


2 Comments

To get it correct, I would have to swap the "for j" and "for i" loops. But then I would lose the "calculate temp once for several i" optimization...
@Shamil: If this answer is correct, please accept it by clicking the check-mark under the vote counter.
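Swapping the loops does not necessarily mean giving up the "one temp per j" optimization. A possible alternative, sketched here under assumptions, is to split the work into two phases: first compute every temp in parallel over j, then parallelize over i so that each v_res[i] is written by exactly one thread. Plain doubles again stand in for InnerVector and InnerMatrix, getCell is a hypothetical stand-in, and multiplyBlockTwoPhase is an illustrative name, not code from the question.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the question's getCell(i, j).
static double getCell(int i, int j) { return 1.0 + (i * 31 + j * 17) % 7; }

void multiplyBlockTwoPhase(const std::vector<double>& v, std::vector<double>& res,
                           int jmin, int jmax, int period)
{
    assert(period > 0 && jmin >= 0 && jmin <= jmax);
    assert(static_cast<std::size_t>(jmax) <= v.size());
    assert(v.size() == res.size());

    const int matrixSize = static_cast<int>(v.size());
    std::vector<double> temps(jmax - jmin, 0.0);

    // Phase 1: each temp is computed exactly once; every thread writes
    // only to its own temps[j - jmin] slots.
    #pragma omp parallel for
    for (int j = jmin; j < jmax; j++)
    {
        double sum = 0.0;
        for (int k = j; k < matrixSize; k += period)
            sum += v[k];
        temps[j - jmin] = sum;
    }

    // Phase 2: parallel over i, so res[i] has a single writer.
    #pragma omp parallel for
    for (int i = jmin; i < jmax; i++)
    {
        double acc = 0.0;
        for (int j = jmin; j < jmax; j++)
            acc += getCell(i, j) * temps[j - jmin];
        res[i] += acc;
    }
}
```

The trade-off is memory: this keeps jmax - jmin temps alive at once rather than one per thread, which may or may not be acceptable with 400-byte InnerVector elements.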
