When using GPU Coder, how should I minimize how often data is transferred between the CPU and GPU?

I am using GPU Coder and am concerned about CPU/GPU data transfer affecting performance. Suppose I have two MATLAB functions with the 'coder.gpu.kernelfun' pragma at the top of each, and I do something with the data between calling them:
A = half(data);
B = kernelfun1(A); % output is B
% do something with B here
C = kernelfun2(B); % input is B
Does the data remain on the GPU the whole time as a half-precision float, or does it get copied to the CPU during the "do something with B" part?

Accepted Answer

MathWorks Support Team on 25 Jan 2025
Edited: MathWorks Support Team on 31 Jan 2025
GPU Coder tries to minimize copies between the CPU and GPU. Whether a copy is generated depends entirely on the data access patterns in your code.
To access the relevant documentation, execute the following command in the MATLAB R2020b command window:
>> web(fullfile(docroot, 'gpucoder/ug/gpu-memory-allocation-and-minimization.html'))
If you generate code for kernelfun1 and kernelfun2 separately (i.e., you call 'codegen' twice) and then combine the generated MEX functions in MATLAB, for example kernelfun1(b) .* kernelfun2(c), a transfer to the CPU occurs to perform the multiplication whenever kernelfun1 or kernelfun2 returns a 'half' value. This is a current limitation of MATLAB: 'gpuArray' does not support the half data type, so the result cannot stay on the GPU between the two calls. However, you can perform the multiplication in a wrapper function, e.g.:
function a = kernelfun3(b,c)
    coder.gpu.kernelfun;
    a = kernelfun1(b) .* kernelfun2(c);
end
If you call 'codegen' only on kernelfun3, GPU Coder generates code in which the multiplication is performed on the GPU.
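As a minimal sketch of the wrapper approach, the script below generates a single GPU MEX function from kernelfun3. It assumes that kernelfun1.m, kernelfun2.m, and kernelfun3.m are on the MATLAB path, that your release and GPU support 'half' inputs for GPU code generation, and that the inputs are 1-by-1024 row vectors (the size is hypothetical; use the size of your own data).

```matlab
% Hypothetical example inputs; the size 1-by-1024 is an assumption.
n = 1024;
b = half(rand(1, n));
c = half(rand(1, n));

% Configuration object for a GPU Coder MEX target.
cfg = coder.gpuConfig('mex');

% Generate code for the wrapper only, so that kernelfun1, kernelfun2,
% and the multiplication are all compiled into one entry point.
codegen -config cfg -args {b, c} -report kernelfun3

% Call the generated MEX function (default name: <function>_mex).
a = kernelfun3_mex(b, c);
```

The '-report' option produces a code generation report, which is one way to look at the generated CUDA code afterwards.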
The limitation above does not apply if kernelfun1 and kernelfun2 return 'single' or another data type supported by 'gpuArray'. In that case, the following multiplication is performed on the GPU:
kernelfun1(b) .* kernelfun2(c)
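The following sketch shows what the separate-MEX workflow could look like with 'single' data. It assumes a release in which GPU Coder MEX functions accept 'gpuArray' inputs, that Parallel Computing Toolbox is installed, and that kernelfun1.m and kernelfun2.m are on the path; the 1-by-1024 size is hypothetical.

```matlab
% Hypothetical example inputs, created directly on the GPU.
n = 1024;
b = gpuArray(single(rand(1, n)));
c = gpuArray(single(rand(1, n)));

cfg = coder.gpuConfig('mex');

% Generate one MEX function per kernel function, declaring the
% input as a gpuArray by passing a gpuArray example value.
codegen -config cfg -args {b} kernelfun1
codegen -config cfg -args {c} kernelfun2

% Combine the two results in MATLAB.
a = kernelfun1_mex(b) .* kernelfun2_mex(c);

% Check where the result lives instead of assuming it.
disp(class(a));
```

If 'class(a)' reports 'gpuArray', the element-wise multiplication ran on the GPU without the intermediate results being gathered to the CPU.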
GPU Coder tries to fuse kernels as much as possible, so with the example above, you may find that the generated code contains a single GPU kernel instead of three separate ones for kernelfun1, kernelfun2, and kernelfun3. The effectiveness of this optimization depends on program structure and dataflow, and we have seen cases where the fusion does not happen. We recommend running code generation on your design and examining the generated code to see whether the optimization was applied. If it was not, you can try altering your design to achieve the desired result.
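One rough way to examine the generated code is to count kernel launches and memory copies in the generated CUDA source. The sketch below assumes the default output folder and file name for a MEX build of kernelfun3 (codegen/mex/kernelfun3/kernelfun3.cu); adjust the path if your build writes elsewhere. Plain text counts are only an indication, so read the source itself before drawing conclusions.

```matlab
% Helper: number of times the text pattern pat occurs in txt.
countText = @(txt, pat) numel(strfind(txt, pat));

% Assumed default location of the generated CUDA source.
cuFile = fullfile('codegen', 'mex', 'kernelfun3', 'kernelfun3.cu');

if isfile(cuFile)
    src = fileread(cuFile);
    % Each kernel launch in CUDA source uses the <<< ... >>> syntax.
    fprintf('Kernel launches: %d\n', countText(src, '<<<'));
    % cudaMemcpy calls mark copies, including CPU/GPU transfers.
    fprintf('cudaMemcpy calls: %d\n', countText(src, 'cudaMemcpy'));
else
    fprintf('Generated file not found: %s\n', cuFile);
end
```

A single launch site would suggest that the kernels were fused, and the cudaMemcpy calls show where data moves between host and device.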
Please use the link below to search for the required information in the current release:

Release

R2019a
