In my C OpenCL code I use clSetKernelArg to allocate 'variable size' __local memory for my kernels, since a __local array declared inside an OpenCL kernel cannot take its size at run time. See my example:
clSetKernelArg(clKernel, ArgCounter++, sizeof(cl_mem), (void *)&d_B);
...
clSetKernelArg(clKernel, ArgCounter++, sizeof(float)*block_size*block_size, NULL);
...
kernel = "
__kernel void matrixMul(__global float* C,
...
__local float* A_temp,
...
)
{..."
My question: how do I do the same in pyopencl?
I looked through the examples that come with pyopencl, but the only thing I could find was an approach that uses string templates to bake the size into the kernel source, which, as far as I understand it, seems like overkill. See the example:
kernel = """
__kernel void matrixMul(__global float* C,...){
...
__local float A_temp[ %(mem_size)d ];
...
}
"""
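To make clear what I mean by the template approach, here is a minimal sketch of how I understand it: the array size is substituted into the kernel source with ordinary Python string formatting before the source is compiled. The names KERNEL_TEMPLATE, render_kernel and block_size are my own placeholders, and the kernel body is a stripped-down stand-in rather than my real matrixMul. The sketch only builds the source string; handing it to pyopencl is left out.

```python
# Hypothetical names for illustration: KERNEL_TEMPLATE, render_kernel, block_size.
# The kernel body is a toy stand-in, not the real matrixMul.
KERNEL_TEMPLATE = """
__kernel void matrixMul(__global float* C)
{
    // The size is part of the source text, so it is fixed at compile time.
    __local float A_temp[%(mem_size)d];
    A_temp[get_local_id(0)] = 0.0f;
    C[get_global_id(0)] = A_temp[get_local_id(0)];
}
"""


def render_kernel(block_size):
    """Return the kernel source with the __local array size filled in."""
    # mem_size is an element count here, not a byte count as in clSetKernelArg.
    return KERNEL_TEMPLATE % {"mem_size": block_size * block_size}


source = render_kernel(16)
print(source)
```

The drawback I see is that every new block size produces a different source string, so the kernel would have to be rebuilt each time, whereas the clSetKernelArg route in C lets one compiled kernel serve any size.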
What do you recommend?