cudamemcpy array of pointers where each pointer points to an array

Question

I am attempting to create an array of pointers on the host. Each pointer in the array points to an array of size 4. When I try to copy a pointer to the device, the copy fails and the device cannot access the contents of the array that the pointer points to. How do I copy the array that one of these pointers points to from the host to the device?

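#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>
#include <cuda_profiler_api.h>

// HANDLE_ERROR is not defined in the post; this minimal stand-in is assumed
#define HANDLE_ERROR(call)                                          \
    do {                                                            \
        cudaError_t err_ = (call);                                  \
        if (err_ != cudaSuccess) {                                  \
            printf("%s in %s at line %d\n",                         \
                   cudaGetErrorString(err_), __FILE__, __LINE__);   \
            exit(EXIT_FAILURE);                                     \
        }                                                           \
    } while (0)

__global__ void kernel(int* D)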
{
    int tid = threadIdx.x + blockIdx.x * blockDim.x;
    while (tid < 4)
    {
        printf("Device = %d\n", D[tid]);
        tid += blockDim.x * gridDim.x;
    }
}

int main(void)
{
    cudaProfilerStart();

    int* H[2];
    int* D[2]; 
    int test1[4] = { 1, 2, 3, 4 };
    int test2[4] = { 10, 20, 30, 40 };

    H[0] = test1;
    H[1] = test2;

    HANDLE_ERROR(cudaMalloc((void**)&D[0], 4 * sizeof(int)));
    HANDLE_ERROR(cudaMemcpy(D[0], H[0], 4 * sizeof(int), cudaMemcpyHostToDevice));
    kernel <<<1, 4 >>>(D[0]);

    cudaProfilerStop();

    return 0;
}

There is nothing wrong with the code you have posted and it works as expected for me. Are you certain you have a functional CUDA installation? What runtime errors does cuda-memcheck report if you run your code with that? – talonmies, commented Mar 29, 2017 at 15:44

Nirvedh Meshram · Accepted Answer · 2017-03-29 16:40:46Z


As talonmies pointed out, there is nothing wrong with the code. However, you will not see the prints from your kernel: the kernel launch is asynchronous, and your process exits before the kernel has finished and its printf output has been flushed to the host. A synchronization call such as cudaDeviceSynchronize() solves the problem here. In real code an explicit call may not be needed, because a later blocking call (for example a device-to-host cudaMemcpy) also waits for the kernel to finish.

#include <stdio.h>
#include <cuda_runtime.h>

__global__ void kernel(int* D)
{
        int tid = threadIdx.x + blockIdx.x * blockDim.x;
        while (tid < 4)
        {
                printf("Device = %d\n", D[tid]);
                tid += blockDim.x * gridDim.x;
        }
}

int main(void)
{
        // cudaProfilerStart();

        int* H[2];
        int* D[2];
        int test1[4] = { 1, 2, 3, 4 };
        int test2[4] = { 10, 20, 30, 40 };

        H[0] = test1;
        H[1] = test2;

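        cudaMalloc((void**)&D[0], 4 * sizeof(int));
        cudaMemcpy(D[0], H[0], 4 * sizeof(int), cudaMemcpyHostToDevice);
        // One thread is enough: the grid-stride loop walks all 4 elements
        kernel <<<1, 1 >>>(D[0]);

        // The launch is asynchronous; wait for the kernel so that its
        // printf output is flushed before main() returns
        cudaError_t cudaerr1 = cudaDeviceSynchronize();
        if (cudaerr1 != cudaSuccess)
                printf("kernel launch failed with error \"%s\".\n",
                        cudaGetErrorString(cudaerr1));

        //cudaProfilerStop();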

        return 0;
}
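The question's title asks about copying the whole array of pointers, not just one element of it. Because D[0] and D[1] are device addresses stored in host memory, a kernel that receives the whole table needs a "deep copy": allocate and fill each inner array, then copy the array of device pointers itself to the device. Below is a minimal sketch, assuming every inner array has 4 elements; the rowSums kernel, the deepCopyRowSums function and the CHECK macro are hypothetical names, not from the original post. It also illustrates the remark that explicit synchronization is often unnecessary: the blocking device-to-host cudaMemcpy on the default stream waits for the preceding kernel.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a CUDA runtime call fails
#define CHECK(call)                                                 \
    do {                                                            \
        cudaError_t err_ = (call);                                  \
        if (err_ != cudaSuccess) {                                  \
            fprintf(stderr, "%s failed: %s\n", #call,               \
                    cudaGetErrorString(err_));                      \
            exit(EXIT_FAILURE);                                     \
        }                                                           \
    } while (0)

constexpr int ROWS = 2;   // number of host pointers (H[0], H[1])
constexpr int COLS = 4;   // assumed length of every pointed-to array

// One thread per row: sum each row through the device-side pointer table
__global__ void rowSums(int** rows, int* sums)
{
    int r = threadIdx.x + blockIdx.x * blockDim.x;
    if (r < ROWS) {
        int s = 0;
        for (int c = 0; c < COLS; ++c)
            s += rows[r][c];
        sums[r] = s;
    }
}

// Deep-copies the host arrays, runs the kernel and copies the sums back
void deepCopyRowSums(int* const host[ROWS], int sums[ROWS])
{
    int* devRows[ROWS];   // host-side array holding *device* addresses
    for (int r = 0; r < ROWS; ++r) {
        CHECK(cudaMalloc(&devRows[r], COLS * sizeof(int)));
        CHECK(cudaMemcpy(devRows[r], host[r], COLS * sizeof(int),
                         cudaMemcpyHostToDevice));
    }

    // Copy the table of device pointers itself to the device
    int** devTable;
    CHECK(cudaMalloc(&devTable, ROWS * sizeof(int*)));
    CHECK(cudaMemcpy(devTable, devRows, ROWS * sizeof(int*),
                     cudaMemcpyHostToDevice));

    int* devSums;
    CHECK(cudaMalloc(&devSums, ROWS * sizeof(int)));

    rowSums<<<1, ROWS>>>(devTable, devSums);
    CHECK(cudaGetLastError());   // reports errors from the launch itself

    // cudaMemcpy on the default stream waits for the kernel to finish,
    // so no explicit cudaDeviceSynchronize() is needed here
    CHECK(cudaMemcpy(sums, devSums, ROWS * sizeof(int),
                     cudaMemcpyDeviceToHost));

    for (int r = 0; r < ROWS; ++r)
        CHECK(cudaFree(devRows[r]));
    CHECK(cudaFree(devTable));
    CHECK(cudaFree(devSums));
}

int main()
{
    int test1[COLS] = { 1, 2, 3, 4 };
    int test2[COLS] = { 10, 20, 30, 40 };
    int* H[ROWS] = { test1, test2 };

    int sums[ROWS];
    deepCopyRowSums(H, sums);
    for (int r = 0; r < ROWS; ++r)
        printf("row %d sum = %d\n", r, sums[r]);   // row 0: 10, row 1: 100
    return 0;
}
```

The host keeps its own copy of the device pointers (devRows) so that each inner allocation can be freed later; the device only ever dereferences devTable.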



2 Comments

user1431515 Over a year ago

Would this also cause the cuda profiler to not show a timeline?

Nirvedh Meshram Over a year ago

Yes, the nvprof profiler gave me this error "Warning: Found 1 invalid records in the result. Warning: This can happen if device ran out of memory or if a device kernel was stopped due to an assertion. "
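In the question's code, cudaProfilerStop() is called immediately after the asynchronous launch, so the kernel may still be running when profiling stops, which is consistent with the incomplete-record warning reported above. Below is a minimal sketch of the reordering, assuming the goal is for the profiled range to cover the whole kernel; runProfiled is a hypothetical helper name. It waits with cudaDeviceSynchronize() before calling cudaProfilerStop().

```cuda
#include <cstdio>
#include <cuda_profiler_api.h>
#include <cuda_runtime.h>

// Same grid-stride kernel as in the question
__global__ void kernel(int* D)
{
    int tid = threadIdx.x + blockIdx.x * blockDim.x;
    while (tid < 4) {
        printf("Device = %d\n", D[tid]);
        tid += blockDim.x * gridDim.x;
    }
}

// Hypothetical helper: runs the kernel inside a profiling window
// and returns the first error encountered (or cudaSuccess)
cudaError_t runProfiled(const int* host)
{
    int* d = nullptr;
    cudaError_t err = cudaMalloc(&d, 4 * sizeof(int));
    if (err != cudaSuccess)
        return err;

    cudaProfilerStart();
    err = cudaMemcpy(d, host, 4 * sizeof(int), cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        kernel<<<1, 4>>>(d);
        err = cudaGetLastError();              // launch errors
        if (err == cudaSuccess)
            err = cudaDeviceSynchronize();     // wait before stopping
    }
    cudaProfilerStop();   // the kernel has finished by this point

    cudaFree(d);
    return err;
}

int main()
{
    int test1[4] = { 1, 2, 3, 4 };
    cudaError_t err = runProfiled(test1);
    if (err != cudaSuccess)
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
    return err == cudaSuccess ? 0 : 1;
}
```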
