How to access device 2D array global variable from host

Question

I want to print the d_t global 2D array variable using printf inside the main function, but I get a compile warning:

a __device__ variable "d_t" cannot be directly read in a host function

How can I copy the global 2D array from the device to the host and then print the first column of each row?

#include <cstdio>

__device__ double *d_t;

__device__ size_t d_gridPitch;


__global__ void kernelFunc()
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    double* rowt = (double*)((char *)d_t + i * d_gridPitch);
    rowt[0] = rowt[0] + 40000;

}


int main()
{
    int size = 16;
    size_t d_pitchLoc;
    double *d_tLoc;

    cudaMallocPitch((void**)&d_tLoc, &d_pitchLoc, size * sizeof(double), size);
    cudaMemset2D(d_tLoc, d_pitchLoc, 0, size * sizeof(double), size);

    cudaMemcpyToSymbol(d_gridPitch, &d_pitchLoc, sizeof(d_pitchLoc)); // d_gridPitch is a size_t, not an int
    cudaMemcpyToSymbol(d_t, & d_tLoc, sizeof(d_tLoc));

    kernelFunc<<<1,size>>>();

    for(int i=0; i< size; i++){
        double* rowt = (double*)((char *)d_t + i * d_gridPitch);
        printf("%.0f, ",rowt[0]);
    }

    cudaDeviceReset();

    return 0;
}

The cudaMemcpy2D function is used to copy to or from a pitched allocation (i.e. created with cudaMallocPitch). Here is the API documentation for cudaMemcpy2D. If you search on this CUDA tag you will find many questions and answers that demonstrate proper usage, such as this one. Use proper CUDA error checking.
– Robert Crovella, Commented Mar 26, 2016 at 0:30

talonmies · Accepted Answer · 2016-03-26 08:25:02Z

As indicated in comments, the cudaMemcpy2D API is designed for exactly this task. You must allocate or statically define a host memory buffer or container to act as storage for the data from the device, and then provide the pitch of that host buffer to the cudaMemcpy2D call. The API handles the pitch conversion without any further intervention on the caller side.

Replace the print loop with something like this:

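// Host buffer with a tight pitch of size * sizeof(double) bytes per row
double* h_t = new double[size * size];
cudaMemcpy2D(h_t, size * sizeof(double), d_tLoc, d_pitchLoc, 
        size * sizeof(double), size, cudaMemcpyDeviceToHost);
// Print the first column (j = 0) of each row; needs #include <iostream>
for(int i=0, j=0; i< size; i++){
    std::cout << h_t[i * size + j] << std::endl; 
}
delete[] h_t;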

[Note: I'm using iostream here for the printing. CUDA uses a C++ compiler for host code, and you should prefer iostream over cstdio because it is less error-prone and provides better diagnostics on most platforms.]

You can see that the API call form is very similar to the cudaMemset2D call that I provided for you in your last question.
