I am trying to create an array of pointers on the host, where each pointer points to an array of 4 ints. When I try to copy one of these arrays to the device, the copy appears to fail and the device cannot access the contents of the array the pointer refers to. How do I copy the array referenced by one element of an array of pointers from host to device?

__global__ void kernel(int* D)
{
    int tid = threadIdx.x + blockIdx.x * blockDim.x;
    while (tid < 4)
    {
        printf("Device = %d\n", D[tid]);
        tid += blockDim.x * gridDim.x;
    }
}

int main(void)
{
    cudaProfilerStart();

    int* H[2];
    int* D[2]; 
    int test1[4] = { 1, 2, 3, 4 };
    int test2[4] = { 10, 20, 30, 40 };

    H[0] = test1;
    H[1] = test2;

    HANDLE_ERROR(cudaMalloc((void**)&D[0], 4 * sizeof(int)));
    HANDLE_ERROR(cudaMemcpy(D[0], H[0], 4 * sizeof(int), cudaMemcpyHostToDevice));
    kernel <<<1, 4 >>>(D[0]);

    cudaProfilerStop();

    return 0;
}
    There is nothing wrong with the code you have posted and it works as expected for me. Are you certain you have a functional CUDA installation? What runtime errors does cuda-memcheck report if you run your code with that? Commented Mar 29, 2017 at 15:44

1 Answer

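As talonmies pointed out, there is nothing wrong with the code. However, you will not see the output of the printf calls in your kernel: a kernel launch is asynchronous, so the host returns from the launch immediately and your process exits before the device-side printf buffer is flushed. Adding a synchronization call such as cudaDeviceSynchronize() after the launch solves the problem here. In real code an explicit synchronization is often unnecessary, because a later blocking call, such as a cudaMemcpy that copies the results back to the host, already waits for the kernel to finish.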

#include <stdio.h>
#include <cuda_runtime.h>

__global__ void kernel(int* D)
{
        int tid = threadIdx.x + blockIdx.x * blockDim.x;
        while (tid < 4)
        {
                printf("Device = %d\n", D[tid]);
                tid += blockDim.x * gridDim.x;
        }
}

int main(void)
{
        // cudaProfilerStart();

        int* H[2];
        int* D[2];
        int test1[4] = { 1, 2, 3, 4 };
        int test2[4] = { 10, 20, 30, 40 };

        H[0] = test1;
        H[1] = test2;

        cudaMalloc((void**)&D[0], 4 * sizeof(int));
        cudaMemcpy(D[0], H[0], 4 * sizeof(int), cudaMemcpyHostToDevice);
        kernel <<<1, 1 >>>(D[0]);

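        // Kernel launches are asynchronous: block until the kernel has finished
        // so that the device-side printf output is flushed before the process exits.
        cudaError_t cudaerr1 = cudaDeviceSynchronize();
        if (cudaerr1 != cudaSuccess)
                printf("kernel execution failed with error \"%s\".\n",
                        cudaGetErrorString(cudaerr1));

        cudaFree(D[0]);

        //cudaProfilerStop();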

        return 0;
}
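
The code above copies only the array behind H[0]. If the goal is for a kernel to follow the pointers itself, that is, to receive the whole array of pointers, one possible approach is sketched below. It assumes the same 2 x 4 layout as in the question. The names CHECK, kernel2d, D_table, ROWS and COLS are made up for this illustration and do not come from the original post. The idea is to allocate and fill each row on the device, keep the resulting device pointers in a host array, and then copy that array of device pointers into a second device allocation.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define ROWS 2
#define COLS 4

// Hypothetical helper (not from the original post): abort on any CUDA error.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Each thread prints one element, reached through the device-side pointer table.
__global__ void kernel2d(int** D)
{
    int tid = threadIdx.x + blockIdx.x * blockDim.x;
    if (tid < ROWS * COLS) {
        int row = tid / COLS;
        int col = tid % COLS;
        printf("Device[%d][%d] = %d\n", row, col, D[row][col]);
    }
}

int main(void)
{
    int test1[COLS] = { 1, 2, 3, 4 };
    int test2[COLS] = { 10, 20, 30, 40 };
    int* H[ROWS] = { test1, test2 };

    int*  D[ROWS];   // host array that holds device pointers
    int** D_table;   // device array that holds the same device pointers

    // Step 1: allocate each row on the device and copy its contents.
    for (int i = 0; i < ROWS; ++i) {
        CHECK(cudaMalloc((void**)&D[i], COLS * sizeof(int)));
        CHECK(cudaMemcpy(D[i], H[i], COLS * sizeof(int), cudaMemcpyHostToDevice));
    }

    // Step 2: copy the array of device pointers itself to the device.
    CHECK(cudaMalloc((void**)&D_table, ROWS * sizeof(int*)));
    CHECK(cudaMemcpy(D_table, D, ROWS * sizeof(int*), cudaMemcpyHostToDevice));

    kernel2d<<<1, ROWS * COLS>>>(D_table);
    CHECK(cudaGetLastError());        // reports launch configuration errors
    CHECK(cudaDeviceSynchronize());   // waits for the kernel and flushes printf

    for (int i = 0; i < ROWS; ++i)
        CHECK(cudaFree(D[i]));
    CHECK(cudaFree(D_table));
    return 0;
}
```

Copying H directly to the device would not work, because H holds host addresses that the device cannot dereference. For larger tables, a single flat allocation of ROWS * COLS ints indexed as row * COLS + col is usually simpler than a pointer table and needs only one copy.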

2 Comments

Would this also cause the cuda profiler to not show a timeline?
Yes, the nvprof profiler gave me this error: "Warning: Found 1 invalid records in the result. Warning: This can happen if device ran out of memory or if a device kernel was stopped due to an assertion."
