I have the following code, which is basically forward substitution with a lower triangular matrix:
// Outer loop: one iteration per matrix row.
for (int i = 0; i < matrix.get_rowptr()->size() - 1; ++i)
{
    double sum = 0.0;
    // Inner loop: stored entries of row i that lie left of the diagonal.
    #pragma omp parallel for reduction(+:sum)
    for (int j = matrix.get_rowptr()->operator[](i); j < matrix.get_diagonal_index()->operator[](i); ++j)
    {
        sum += matrix.get_value()->operator[](j) * result[matrix.get_columnindex()->operator[](j)];
    }
    // Row i depends on the entries of result computed in earlier rows.
    result[i] = vector1[i] - sum;
}
The outer loop goes over the rows and the inner loop over the columns. On average, the inner loop performs at least 100 operations.
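To make the data layout concrete, here is a minimal self-contained sketch of the same loop without OpenMP. It assumes the matrix is stored in CSR form and that the getters above return pointers to plain arrays; the struct `CsrLower`, its member names, and the function `forward_substitute` are hypothetical stand-ins, not the real class. As in the code above, there is no division by the diagonal entry, so a unit diagonal is assumed.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the matrix class (assumed CSR layout).
struct CsrLower {
    std::vector<int> rowptr;   // start of each row in col/val, size n + 1
    std::vector<int> diag;     // position of the diagonal entry of each row
    std::vector<int> col;      // column index of each stored entry
    std::vector<double> val;   // value of each stored entry
};

// Serial forward substitution: solves L * x = b, assuming a unit diagonal.
std::vector<double> forward_substitute(const CsrLower& m,
                                       const std::vector<double>& b)
{
    if (m.rowptr.empty()) {
        return {};
    }
    const std::size_t n = m.rowptr.size() - 1;
    std::vector<double> x(n, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        double sum = 0.0;
        // Only entries strictly left of the diagonal contribute.
        for (int j = m.rowptr[i]; j < m.diag[i]; ++j) {
            sum += m.val[j] * x[m.col[j]];
        }
        // x[i] needs x[0..i-1], so the rows cannot simply run in parallel.
        x[i] = b[i] - sum;
    }
    return x;
}
```

For the 3x3 matrix with rows (1), (2, 1), (3, 4, 1) and right-hand side (1, 4, 14), this sketch returns (1, 2, 3).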
I tried to use OpenMP to parallelize the inner loop by simply adding
#pragma omp parallel for
but the wall time increased. Is there a good way to parallelize this piece of code?
Thanks in advance. Best regards.
How are get_rowptr and get_columnindex defined?