CUDA fixed-size global array

Question

I think an array can be allocated on the GPU without cudaMalloc when its length is known, e.g. __device__ int device_array[100];. But when I run the following code, garbage values are printed. I looked through a popular CUDA book, and all of its examples use cudaMalloc. Can a fixed-size array be used like this, or must it be allocated with cudaMalloc?

#include <iostream>

__device__ int device_array[100];

__global__ void kernel() {

    device_array[blockIdx.x] = blockIdx.x;
}

void call_kernel( int *host_array ) {

    kernel<<<100,1>>>();

    cudaMemcpy( host_array, device_array, 100 * sizeof( int ), cudaMemcpyDeviceToHost );
}

int main() {

    int host_array[100];

    call_kernel( host_array );

    for ( int i = 0; i < 100; i++ )
        std::cout << host_array[i] << std::endl;
}

Your code has no error checking. It is probable that the cudaMemcpy call is failing, but you just don't know it because you are not checking the return status. Once you confirm that an error is occurring at runtime, the source of the problem will become apparent.
– talonmies, Commented Mar 28, 2013 at 19:28
More clues. How to do error checking is nicely discussed here.
– Robert Crovella, Commented Mar 28, 2013 at 19:32

alrikai · Accepted Answer · 2013-03-28 19:41:50Z

As Robert alluded to in his comment, you have to use cudaMemcpyFromSymbol when accessing a __device__ symbol from the host, because in host code the name device_array is a symbol handle rather than a device pointer that cudaMemcpy can use. Your cudaMemcpy call in its present form should therefore return an error along the lines of "invalid argument". To see this, change your cudaMemcpy line to:

cudaError_t cuda_status = cudaMemcpy(...); 
std::cout << cudaGetErrorString(cuda_status) << std::endl;

Anyway, to get the right answer, change your cudaMemcpy line to:

cudaMemcpyFromSymbol( host_array, device_array, 100 * sizeof( int ), 0, cudaMemcpyDeviceToHost);

The signature for cudaMemcpyFromSymbol is:

cudaError_t cudaMemcpyFromSymbol ( void* dst, const void* symbol, size_t count, size_t offset = 0, cudaMemcpyKind kind = cudaMemcpyDeviceToHost )

The offset defaults to 0 and the copy direction defaults to cudaMemcpyDeviceToHost, so both arguments are optional in your case. The main takeaway is to always check the return values of your CUDA calls, as they generally point you in the right direction.
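
For reference, here is a minimal sketch of the whole program with both changes applied: the copy goes through cudaMemcpyFromSymbol, and every runtime call is checked. The CUDA_CHECK macro and the constant N are hypothetical names introduced only for this illustration; they are not part of the CUDA runtime or of the original code.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro (not part of the CUDA API): print the error
// string and exit if a runtime call does not return cudaSuccess.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,   \
                         cudaGetErrorString(err_));                   \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while (0)

const int N = 100;  // illustrative constant for the array length

// Statically sized device array; no cudaMalloc is involved.
__device__ int device_array[N];

// One block per element, as in the question.
__global__ void kernel() {
    device_array[blockIdx.x] = blockIdx.x;
}

// Launches the kernel, then copies the symbol's contents into host_array.
void call_kernel(int *host_array) {
    kernel<<<N, 1>>>();
    CUDA_CHECK(cudaGetLastError());       // reports launch errors
    CUDA_CHECK(cudaDeviceSynchronize());  // reports errors raised while the kernel ran
    // offset and kind are left at their defaults (0, cudaMemcpyDeviceToHost)
    CUDA_CHECK(cudaMemcpyFromSymbol(host_array, device_array, N * sizeof(int)));
}

int main() {
    int host_array[N];
    call_kernel(host_array);
    for (int i = 0; i < N; i++)
        std::printf("%d\n", host_array[i]);
    return 0;
}
```

If the cudaMemcpyFromSymbol line is swapped back to the original cudaMemcpy call, the macro should report the failure instead of letting the loop print uninitialized host memory.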


alrikai Over a year ago

If you want to copy from the host to the device in the same manner, use the function cudaMemcpyToSymbol. I'd advise taking a look at the available CUDA runtime API functions over at docs.nvidia.com/cuda/cuda-runtime-api/…
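
A rough sketch of that host-to-device direction, assuming the same statically sized array as in the question: the host data is written to the symbol with cudaMemcpyToSymbol, modified by a kernel, and read back with cudaMemcpyFromSymbol. The kernel double_elements and the helper round_trip are hypothetical names used only for this example.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

const int N = 100;  // illustrative constant for the array length

__device__ int device_array[N];

// Doubles every element in place; one block per element, as in the question.
__global__ void double_elements() {
    device_array[blockIdx.x] *= 2;
}

// Hypothetical helper: copies `in` to the symbol, runs the kernel, and copies
// the result back into `out`. Returns the first error encountered.
cudaError_t round_trip(const int *in, int *out) {
    // offset and kind are left at their defaults (0, cudaMemcpyHostToDevice)
    cudaError_t err = cudaMemcpyToSymbol(device_array, in, N * sizeof(int));
    if (err != cudaSuccess) return err;

    double_elements<<<N, 1>>>();
    err = cudaGetLastError();  // reports launch errors
    if (err != cudaSuccess) return err;

    // This blocking copy waits for the kernel to finish and also reports
    // any error raised while the kernel ran.
    return cudaMemcpyFromSymbol(out, device_array, N * sizeof(int));
}

int main() {
    int in[N], out[N];
    for (int i = 0; i < N; i++) in[i] = i;

    cudaError_t err = round_trip(in, out);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    std::printf("out[3] = %d\n", out[3]);
    return 0;
}
```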
