
I think an array can be allocated on the GPU, e.g. __device__ int device_array[100];, without using cudaMalloc, since the length is known at compile time. But when I run the following code, some irrelevant numbers are displayed. I examined a popular CUDA book, and all of its examples use cudaMalloc. Can a fixed-size array be used like this, or must it be allocated with cudaMalloc?

#include <iostream>
using namespace std;

__device__ int device_array[100];

__global__ void kernel() {

    device_array[blockIdx.x] = blockIdx.x;
}

void call_kernel( int *host_array ) {

    kernel<<<100,1>>>();

    cudaMemcpy( host_array, device_array, 100 * sizeof( int ), cudaMemcpyDeviceToHost );
}

int main() {

    int host_array[100];

    call_kernel( host_array );

    for ( int i = 0; i < 100; i++ )
        cout << host_array[i] << endl;
}
  • Your code has no error checking. It is probable that the cudaMemcpy call is failing, but you just don't know it because you are not checking the return status. Once you confirm that an error is occurring at runtime, the source of the problem will become apparent. Commented Mar 28, 2013 at 19:28
  • More clues. How to do error checking is nicely discussed here. Commented Mar 28, 2013 at 19:32
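The error-checking advice in these comments can be sketched as a small wrapper macro. This is a common pattern rather than code from the linked answer, and the macro name CUDA_CHECK is my own:

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Wrap every CUDA runtime call so a failure prints the file, line,
// and error string instead of passing silently.
#define CUDA_CHECK(call)                                          \
    do {                                                          \
        cudaError_t err = (call);                                 \
        if (err != cudaSuccess) {                                 \
            fprintf(stderr, "CUDA error at %s:%d: %s\n",          \
                    __FILE__, __LINE__, cudaGetErrorString(err)); \
            exit(EXIT_FAILURE);                                   \
        }                                                         \
    } while (0)
```

Wrapping the copy in the question as CUDA_CHECK(cudaMemcpy(...)) would have surfaced the failure at runtime instead of leaving host_array uninitialized.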

1 Answer


As Robert alluded to in his comment, you have to use cudaMemcpyFromSymbol when accessing a __device__ symbol on the host. Thus your cudaMemcpy call in its present form should be giving an error along the lines of "invalid argument". If you want to see this, try changing your cudaMemcpy line to:

cudaError_t cuda_status = cudaMemcpy(...); 
std::cout << cudaGetErrorString(cuda_status) << std::endl;

Anyway, if you want to get the right answer, you should change your cudaMemcpy line to:

cudaMemcpyFromSymbol( host_array, device_array, 100 * sizeof( int ), 0, cudaMemcpyDeviceToHost);

The signature for cudaMemcpyFromSymbol is:

cudaError_t cudaMemcpyFromSymbol ( void* dst, const void* symbol, size_t count, size_t offset = 0, cudaMemcpyKind kind = cudaMemcpyDeviceToHost )

The offset defaults to 0 and the copy direction defaults to cudaMemcpyDeviceToHost, so both are technically optional in your case. The main takeaway is to always check the return values of your CUDA calls, as they generally point you in the right direction.
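Putting this together, a corrected call_kernel with the return status checked might look like the following sketch, assuming the same declarations as in the question:

```cuda
#include <iostream>
#include <cuda_runtime.h>

__device__ int device_array[100];

__global__ void kernel() {
    device_array[blockIdx.x] = blockIdx.x;
}

void call_kernel(int *host_array) {
    kernel<<<100, 1>>>();

    // Copy from the __device__ symbol, and check the return status
    // (offset and direction are left at their defaults).
    cudaError_t status = cudaMemcpyFromSymbol(host_array, device_array,
                                              100 * sizeof(int));
    if (status != cudaSuccess)
        std::cerr << cudaGetErrorString(status) << std::endl;
}
```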


1 Comment

If you want to copy from host to device in the same manner, you might be better served by cudaMemcpyToSymbol. I'd advise taking a look at the available CUDA runtime API functions over at docs.nvidia.com/cuda/cuda-runtime-api/…
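For reference, the host-to-device direction mentioned here would look something like this sketch, where device_array is the same __device__ symbol as in the question:

```cuda
int host_array[100] = {0};  // initialized on the host

// Copy host data into the __device__ symbol; offset and direction
// default to 0 and cudaMemcpyHostToDevice respectively.
cudaError_t status = cudaMemcpyToSymbol(device_array, host_array,
                                        100 * sizeof(int));
```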
