GPU Coder cannot parallelize loop

Question

Jeffrey, 22 February 2025

Direct link to this question:

https://jp.mathworks.com/matlabcentral/answers/2174383-gpu-coder-cannot-parallelize-loop

Moved: Walter Roberson, 25 February 2025

I have a for-loop that I am trying to parallelize with GPU Coder. It looks like this:

% n_out is of type uint64
% input_array is of type single array
function out = my_func(n_out, input_array) %# codegen
    coder.gpu.kernelfun;
    out = zeros(1, n_out, 'single');
    for i = 1:n_out % loop I want to parallelize
        temp = 0.0;
        %%
        % code that changes temp depending on input_array(i). There are no reads from or writes to
        % variable 'out' here
        %%
        out(i) = temp; % GPU Coder says this is a loop carried dependency?
    end
end

When I run GPU Coder, it does not create a kernel and the build report states:

"Unable to parallelize loop because of loop carried dependencies. Check the use of variable 'out' in function 'my_func'".

1) Why is the assignment

out(i) = temp;

a "loop carried dependency"?

2) How do I remove such a "loop carried dependency"?

EDIT: removed syntax error in for loop index declaration

2 Comments

Walter Roberson, 22 February 2025

I would be curious about what would happen if you wrote into a temporary array, and eventually copied the temporary array to the output variable?

I also wonder whether there are cases where out(i) is not assigned, leading to a dependency on the initialization by zeros().

Chao Luo, 24 February 2025

Hi Jeffrey,

Thanks for posting the question. There is a syntax error at line 4,

for i:n_out

I guess you mean

for i = 1:n_out

After fixing it, I can see the loop get parallelized when n_out is a double scalar.

Answer 1

Jeffrey, 24 February 2025

Direct link to this answer:

https://jp.mathworks.com/matlabcentral/answers/2174383-gpu-coder-cannot-parallelize-loop#answer_1560602

Moved: Walter Roberson, 25 February 2025

The syntax error was a copying issue of mine. I've edited the question to reflect my code.

Also, thanks! It was the "n_out is a double scalar" part that did it. I was passing 'n_out' as 'uint64'. Forcing it to 'double' caused GPU Coder to parallelize the loop.

Do you know why parallelization cares about whether 'n_out' is double or integer?

1 Comment

Chao Luo, 25 February 2025

Moved: Walter Roberson, 25 February 2025

It is a limitation of the analysis. When a uint64 is used as an array index, it is cast to int32, and that cast prevents the analysis from parallelizing the loop. double is a special case: because it is the default type commonly used as an array index, it is automatically replaced with int32, so no cast is needed. So, if you have to use an integer type, you can use int32 as the index type.
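As a rough illustration of that suggestion, the sketch below converts the uint64 count to int32 once, before the loop, so that the loop index is already int32. The function name my_func_int32 and the loop body are hypothetical stand-ins for the code elided in the question, and whether GPU Coder then creates a kernel still has to be checked in the build report.

```matlab
% Sketch only: my_func_int32 is a hypothetical variant of my_func.
function out = my_func_int32(n_out, input_array) %#codegen
    coder.gpu.kernelfun;
    % Convert the uint64 count once, outside the loop, so the loop index
    % is int32. Values above intmax('int32') would saturate here.
    n = int32(n_out);
    out = zeros(1, n, 'single');
    for i = 1:n
        temp = single(0);
        % Placeholder for the elided per-element work on input_array(i).
        temp = temp + 2 * input_array(i);
        out(i) = temp;
    end
end
```

Called as my_func_int32(uint64(3), single([1 2 3])), the placeholder body doubles each input element. Casting n_out to double instead of int32 would follow the same pattern, since that is the case already reported to work in this thread.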
