
What is the most efficient way to initialize a large array of integers on the GPU? I need to set the first two elements to 1 and the rest to 0 (for a Sieve of Eratosthenes).

  1. cudaMemcpy
  2. cudaMemset + set value of 2 first elements in kernel
  3. initialization direct in kernel
  4. something else

Note: Array size is dynamic (n is passed as an argument).

My current version:

int *array = (int *) malloc(array_size);
array[0] = 1;
array[1] = 1;
for (int i = 2; i < n; i++) {
    array[i] = 0;
}
HANDLE_ERROR(cudaMemcpy(dev_array, array, array_size, cudaMemcpyHostToDevice));
kernel<<<10, 10>>>(dev_array);

I would be grateful for an example.

  • Have you considered calloc (which zero-initializes) instead of malloc, and then just setting the first 2 elements to 1 (like you do)? Commented Jun 24, 2015 at 18:42
  • 1
    @Blue Moon, thx for idea, but calloc operation is not available for CUDA programs. Commented Jun 24, 2015 at 18:48
  • 2
    Better to use cudaMemset instead of loop. Commented Jun 24, 2015 at 19:11
  • @haccks, ok, but what about the first two elements? Initialize them in a single thread? Commented Jun 24, 2015 at 20:26
  • 1
    For first two elements: array[0] = array[1] = 0;. and then cudaMemset(&array[2], 0, array_size * sizeof(int));. Commented Jun 24, 2015 at 21:50

1 Answer


One possibility is to initialize the __device__ array directly on the GPU, if it has a constant size, by adding the following declaration at file scope (that is, outside of any function):

__device__ int dev_array[SIZE] = {1, 1};

The remaining elements will be initialized with zeros (you can check the PTX assembly to be sure of that).

It can then be used in a kernel like:

__global__ void kernel(void)
{
    int tid = ...;
    int elem = dev_array[tid];
    ...
}
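Put together, a minimal static-size sketch might look like this (the sieve logic itself and the launch configuration are placeholders, not a definitive implementation):

```cuda
#include <cstdio>

#define SIZE 1024

// First two entries mark 0 and 1 as non-prime; the compiler
// zero-initializes the rest of the array.
__device__ int dev_array[SIZE] = {1, 1};

__global__ void kernel(void)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < SIZE && dev_array[tid] == 0) {
        // tid has not been crossed out yet; sieve logic goes here
    }
}

int main(void)
{
    kernel<<<(SIZE + 255) / 256, 256>>>();
    cudaDeviceSynchronize();
    return 0;
}
```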

In case of a variable size, you can combine cudaMalloc() with cudaMemset():

int array_size = ...;
int *dev_array;

cudaMalloc((void **) &dev_array, array_size * sizeof(int));
cudaMemset(dev_array, 0, array_size * sizeof(int));

then set the first two elements to one:

int helper_array[2] = {1, 1};
cudaMemcpy(dev_array, helper_array, 2 * sizeof(int), cudaMemcpyHostToDevice);
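Combining the two steps above into one hedged snippet (reusing the HANDLE_ERROR macro and the <<<10, 10>>> launch from the question; array_size is assumed to be set elsewhere):

```cuda
// Allocate, zero-fill, then mark elements 0 and 1 as non-prime.
int *dev_array;
int helper_array[2] = {1, 1};

HANDLE_ERROR(cudaMalloc((void **) &dev_array, array_size * sizeof(int)));
HANDLE_ERROR(cudaMemset(dev_array, 0, array_size * sizeof(int)));
HANDLE_ERROR(cudaMemcpy(dev_array, helper_array, sizeof(helper_array),
                        cudaMemcpyHostToDevice));

kernel<<<10, 10>>>(dev_array);
```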

Beginning with compute capability 2.0, you can also allocate the whole array directly within the kernel using the malloc() device function:

__global__ void kernel(int array_size)
{
    // The pointer must be visible to every thread, so it cannot be a plain
    // local variable of thread 0; a __shared__ pointer works within a block.
    __shared__ int *dev_array;
    int tid = ...;

    if (tid == 0) {
        dev_array = (int *) malloc(array_size * sizeof(int));
        if (dev_array == NULL) {
            ...
        }
        memset(dev_array, 0, array_size * sizeof(int));
        dev_array[0] = dev_array[1] = 1;
    }
    __syncthreads();

    ...
}

Note that __syncthreads() is a block-wide barrier; threads from different blocks are unaware of it, so this pattern only coordinates threads within a single block.

From the CUDA C Programming Guide:

The CUDA in-kernel malloc() function allocates at least size bytes from the device heap and returns a pointer to the allocated memory or NULL if insufficient memory exists to fulfill the request. The returned pointer is guaranteed to be aligned to a 16-byte boundary.

Unfortunately, the calloc() function is not implemented, hence you need to memset() the memory anyway. Allocated memory has the lifetime of the CUDA context, but you can explicitly call free() from this or a subsequent kernel at any time:

The memory allocated by a given CUDA thread via malloc() remains allocated for the lifetime of the CUDA context, or until it is explicitly released by a call to free(). It can be used by any other CUDA threads even from subsequent kernel launches.

With all that said, I wouldn't worry much about the supplementary cudaMemcpy(): it's only two elements to copy and would likely take less than 0.01% of the total execution time (which is easy to profile). Choose whatever makes your code clear; anything else is premature optimization.
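For completeness, option 3 from the question (initializing directly in a kernel) can be sketched with a grid-stride loop, which avoids both the host-side loop and the helper copy; the launch configuration shown is just one reasonable choice:

```cuda
// Each thread strides over the array, so any grid size covers all n elements.
__global__ void init_array(int *a, int n)
{
    for (int i = blockIdx.x * blockDim.x + threadIdx.x;
         i < n;
         i += gridDim.x * blockDim.x) {
        a[i] = (i < 2) ? 1 : 0;  // mark 0 and 1 as non-prime
    }
}

// Launch before the sieve kernel, e.g.:
// init_array<<<(n + 255) / 256, 256>>>(dev_array, n);
```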


2 Comments

Thanks, nice idea, but the size isn't constant. I'm sorry I didn't write this before.
@Bakus123: Edited again with a note.
