I use CuPy to call CUDA kernels, but I don't know how to debug the CUDA code. Here is my wrapper file:
wrapper.py
import math
from pathlib import Path

import cupy as cp
import numpy as np

with open(Path(__file__).parents[1] / 'cuda' / 'lcx_projector_kernels.cu', 'r', encoding='utf-8') as f:
    lines = f.read()

compute_systemG_kernel = cp.RawKernel(lines, 'compute_systemG_kernel')


def compute_systemG_lm(xstart,
                       xend,
                       img_origin,
                       voxsize,
                       sysG,
                       nLORs,
                       img_dim,
                       tofbin_width,
                       sigma_tof,
                       tofcenter_offset,
                       nsigmas,
                       tofbin,
                       threadsperblock):
    compute_systemG_kernel(
        (math.ceil(nLORs / threadsperblock), ), (threadsperblock, ),
        (xstart.ravel(), xend.ravel(),
         cp.asarray(img_origin), cp.asarray(voxsize), sysG,
         np.int64(nLORs), cp.asarray(img_dim),
         np.float32(tofbin_width), cp.asarray(sigma_tof).ravel(),
         cp.asarray(tofcenter_offset).ravel(), np.float32(nsigmas),
         tofbin)
    )
There are many guides online for debugging CUDA code that is built into a library file, but CuPy does not use a library file; it passes the parameters directly to the CUDA kernels. Does anybody know how to debug CUDA code in my situation?
I have tried the following configuration, but breakpoints only stop in the Python code, not in the CUDA code.
launch.json
{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "train.py",
            "type": "debugpy",
            "request": "launch",
            "program": "/home/fanghaodu/code/LMPDnet/train.py",
            "console": "integratedTerminal",
            "justMyCode": false,
            "env": {
                "PYTHONPATH": "${workspaceFolder}"
            }
        },
        {
            "name": "CUDA GDB",
            "type": "cuda-gdb",
            "request": "launch",
            "program": "/home/fanghaodu/.conda/envs/cu128/bin/python", // which python
            "args": ["${file}"],
            "debuggerPath": "/opt/apps/cuda-12.8/bin/cuda-gdb", // which cuda-gdb
        },
    ],
    "compounds": [
        {
            "name": "Python and CUDA",
            "configurations": ["train.py", "CUDA GDB"]
        }
    ],
}
I've heard that cuda-gdb can debug CUDA files, but that it requires building a library file first and using pybind11. I would like to avoid such a cumbersome approach. Is there a more convenient way?
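One direction that might avoid a separate library build, sketched here under assumptions rather than as a verified solution: `cp.RawKernel` accepts an `options` tuple that is forwarded to the runtime compiler, and `-G` is the NVRTC flag for device debug information. The helper names `debug_options` and `build_debug_kernel` are hypothetical, and whether cuda-gdb breakpoints actually bind to a kernel compiled this way has not been confirmed in this setup.

```python
import os


def debug_options(enable=True):
    """Return extra compiler options for a debug build of the kernel."""
    # '-G' (--device-debug) asks NVRTC to emit device debug information.
    return ('-G',) if enable else ()


def build_debug_kernel(source, name, enable=True):
    """Hypothetical helper: compile `source` as a RawKernel with debug info."""
    # Assumption: keeping the generated CUDA source in CuPy's cache directory
    # helps the debugger find the file. Set before importing cupy to be safe.
    os.environ.setdefault('CUPY_CACHE_SAVE_CUDA_SOURCE', '1')
    import cupy as cp  # imported lazily so this module loads without a GPU
    return cp.RawKernel(source, name, options=debug_options(enable))
```

In wrapper.py this would replace the `cp.RawKernel(lines, 'compute_systemG_kernel')` line with `build_debug_kernel(lines, 'compute_systemG_kernel')`. The script would then be run under cuda-gdb with a breakpoint set on the kernel name.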