Hi,
I’ve encountered some problems with an OpenMP offloading program that uses shared memory.
According to the OMP111 optimization remark, the OpenMP optimizer can replace globalized variables with shared memory.
I compile the code sample below with the following command:

```
clang++ -O3 -fopenmp -fopenmp-targets=nvptx64 --offload-arch=sm_89 test.c -o test
```
```c
#pragma omp target data map(to: A[0:N][0:M], B[0:N][0:M]) map(tofrom: C[0:N][0:N])
{
  #pragma omp target teams distribute
  for (int i = 0; i < n; i += TILE_SIZE)
    for (int j = 0; j <= n - 1; j += TILE_SIZE)
      for (int k = 0; k <= m - 1; k += TILE_SIZE) {
        // use of shared memory
        float sA1[TILE_SIZE][TILE_SIZE];
        float sA2[TILE_SIZE][TILE_SIZE];
        float sB1[TILE_SIZE][TILE_SIZE];
        float sB2[TILE_SIZE][TILE_SIZE];
        // other memory copy and computation
        ...
      }
}
```
With TILE_SIZE set to 32 the program works fine, but with TILE_SIZE set to 64 it crashes, so I suspect a problem with the shared memory optimization.
According to this, I also tried increasing the shared memory size at run time:

```
env LIBOMPTARGET_SHARED_MEMORY_SIZE=256000 ./test
```

but the environment variable seems to have no effect, and the program still crashes.
I’m using LLVM 19.1.6 with the following settings:
```
cmake -G Ninja -DCMAKE_BUILD_TYPE=RelWithDebInfo \
  -DCMAKE_C_COMPILER="clang" \
  -DCMAKE_CXX_COMPILER="clang++" \
  -DLLVM_ENABLE_PROJECTS="clang" \
  -DLLVM_ENABLE_RUNTIMES="openmp;offload" \
  -DLLVM_TARGETS_TO_BUILD="X86;NVPTX" \
  -DLLVM_PARALLEL_COMPILE_JOBS=16 \
  -DLLVM_BUILD_LLVM_DYLIB=ON \
  -DLLVM_LINK_LLVM_DYLIB=ON \
  -DOMPTARGET_DEBUG=ON \
  -DOPENMP_ENABLE_LIBOMP_PROFILING=ON \
  ../llvm
```
Is this a shared memory allocation problem or could it be caused by something else?