
I just want to pass a device function as an argument to a host function; the host function can then launch kernels that use this device-side function.

I tried the usual C++ way (pass by pointer/reference), and the CUDA debugger told me the kernel could not launch.

Update:

What I want to do is:

__host__ void hostfunction(int a, int (*DeviceFunction)(int))
{
    /* ...do something... */
    somekernel<<<blocks, threads>>>(in, DeviceFunction);
}

And launch the host with:

hostfunction(x, &SomeDeviceFunctionTemplate<int>);
  • Your question is not entirely clear, at least to me. If you could post an example of what you tried, where the kernel did not launch, that might help. Commented Aug 28, 2013 at 13:57
  • Since __host__ functions can't take the addresses of __device__ functions, you basically need to write a short __global__ function that takes the address of the __device__ function of interest and stores it to memory. Your __host__ function can then read that function pointer from memory and pass it to somekernel. Commented Aug 29, 2013 at 1:08

2 Answers


This example might be of interest:

$ cat t237.cu
#include <stdio.h>


__device__ int f1(){ printf("dev f1\n"); return 0;}
__device__ int f2(){ printf("dev f2\n"); return 0;}
__device__ int f3(){ printf("dev f3\n"); return 0;}

typedef int (*fptr_t)();

__device__ fptr_t fptrf1 = f1;
__device__ fptr_t fptrf2 = f2;
__device__ fptr_t fptrf3 = f3;


__global__ void mykernel(fptr_t fptr){

  fptr();
  printf("executed\n");
}

int main(){

  fptr_t hf1, hf2, hf3;
  cudaMemcpyFromSymbol(&hf1, fptrf1, sizeof(fptr_t));
  cudaMemcpyFromSymbol(&hf2, fptrf2, sizeof(fptr_t));
  cudaMemcpyFromSymbol(&hf3, fptrf3, sizeof(fptr_t));
  mykernel<<<1,1>>>(hf1);
  cudaDeviceSynchronize();
  mykernel<<<1,1>>>(hf2);
  cudaDeviceSynchronize();
  mykernel<<<1,1>>>(hf3);
  cudaDeviceSynchronize();
  return 0;
}
$ nvcc -arch=sm_20 -O3 -o t237 t237.cu
$ ./t237
dev f1
executed
dev f2
executed
dev f3
executed
$

I think this is roughly along the lines of what Jared was suggesting. As he mentioned, taking the address of a __device__ function in host code, like this:

&SomeDeviceFunctionTemplate<int>

will not be possible, assuming SomeDeviceFunctionTemplate refers to a __device__ function.


3 Comments

  • Can I adapt this somehow for templated code instead of the code you have in main()? Or is there no way around explicitly defining the device-side pointer for each and every function I plan to use?
  • Some examples linked here use templating.
  • So, this example replaces the need for a global pointer variable with the need for a global setup kernel, which copies the device function's pointer to a location of the launcher's choosing. That doesn't help much... reading on. There's the question about kernel addresses, which are usable. But how would I leverage that to get at the address of arbitrary device functions?
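On the template question above, one possible adaptation is to take the address inside a setup kernel, where the template instantiation is legal device code. This is a sketch under that assumption; the names tsetup and d_tfptr are made up for illustration:

```cuda
#include <stdio.h>

template <typename T>
__device__ T SomeDeviceFunctionTemplate(T x) { return x + 1; }

typedef int (*fptr_t)(int);
__device__ fptr_t d_tfptr;

// The template is instantiated here, in device code, where its
// address can legally be taken.
__global__ void tsetup() { d_tfptr = SomeDeviceFunctionTemplate<int>; }

__global__ void use(fptr_t fptr) { printf("result: %d\n", fptr(41)); }

int main()
{
    tsetup<<<1, 1>>>();
    cudaDeviceSynchronize();

    fptr_t h_fptr;
    cudaMemcpyFromSymbol(&h_fptr, d_tfptr, sizeof(fptr_t));

    use<<<1, 1>>>(h_fptr);
    cudaDeviceSynchronize();
    return 0;
}
```

Note that each instantiation still needs its own store into a device-side pointer (or a templated setup kernel per type), which matches the commenter's observation that this does not reach arbitrary device functions.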

It'd be helpful if you could post an example of what you are trying to do, but one thing to check is that you are compiling and running on Fermi (sm_20) or later, since older GPUs did not support non-inlined function calls.

Check the compute capability of your device (needs 2.0 or later) and check your nvcc command line (needs -arch=sm_20 or later, or the -gencode equivalent).
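To check the compute capability programmatically, a small query program along these lines should work (a sketch using the standard cudaGetDeviceProperties runtime call):

```cuda
#include <stdio.h>

int main()
{
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        // Device function pointers require compute capability 2.0 or later.
        printf("Device %d: %s, compute capability %d.%d\n",
               i, prop.name, prop.major, prop.minor);
    }
    return 0;
}
```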

