I've been looking for a way to transfer a filled array of arrays from host to device in CUDA.
What I have:
- A global array of arrays, already filled with data, that I need to copy to the device for kernel execution.
- The arrays in the array have different lengths.
I have a function to initialize the array and its values:
    double** weights; // globally defined in host
    int init_weigths(){
        weights = (double**) malloc(sizeof(double*) * SIZE);
        for (int i = 0; i < SIZE; i++) {
            // each inner array has its own length, given by getSize(i)
            weights[i] = (double*) malloc(sizeof(double) * getSize(i));
            for (int j = 0; j < getSize(i); j++){
                weights[i][j] = get_value(i,j);
            }
        }
        return 0;
    }
My (not working) solution:
I put together a solution based on other answers found on the Internet, but none of them worked. I think that's because my array of arrays is already filled with data, and because the contained arrays have variable lengths.
My current solution produces an "invalid argument" error in every cudaMemcpy call, and in the second and subsequent cudaMalloc calls, as reported by cudaGetLastError().
Here it is:
    double** d_weights;
    int init_cuda_weight(){
        cudaMalloc((void **) &d_weights, sizeof(double*) * SIZE);
        double** temp_d_ptrs = (double**) malloc(sizeof(double*) * SIZE);
        // temp array of device pointers
        for (int i = 0; i < SIZE; i++){
            cudaMalloc((void**) &temp_d_ptrs[getSize(i)],
                       sizeof(double) * getSize(i));
            // ERROR CHECK WITH cudaGetLastError(); doesn't report any errors at first.
            cudaMemcpy(temp_d_ptrs[getSize(i)], weights[getSize(i)], sizeof(double) * getSize(i), cudaMemcpyHostToDevice);
            // ERROR CHECK WITH cudaGetLastError(); reports "invalid argument" from here on.
        }
        cudaMemcpy(d_weights, temp_d_ptrs, sizeof(double*) * SIZE,
                   cudaMemcpyHostToDevice);
        return 0;
    }
As additional information: I've simplified the code a bit. The arrays contained in the array of arrays have different lengths (i.e. SIZE2 isn't constant), which is why I'm not flattening to a 1D array.
What is wrong with this implementation? Any ideas to achieve the copy?
Edit2: The original code I wrote was OK. I edited the code to include the error I had and included the correct answer (code) below.
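For reference, here is a minimal sketch of how the copy might look once the rows are indexed by i rather than by getSize(i), which appears to be the error in the listing above: getSize(i) is a row length, so using it as an index picks the wrong slot (or one past the end) in both temp_d_ptrs and weights. This is not necessarily the exact code from the accepted fix. SIZE, getSize() and get_value() are replaced here by small hypothetical stand-ins, and CUDA_CHECK, init_weights, init_cuda_weights, row_sums and device_row_sums are illustrative names. The kernel is only there to confirm that the device-side pointers can be dereferenced.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical stand-ins for the question's SIZE, getSize() and get_value().
#define SIZE 4
static int getSize(int i) { return i + 1; }  // row i holds i + 1 values
static double get_value(int i, int j) { return 10.0 * i + j; }

// Print the CUDA error with its location and abort (illustrative helper).
#define CUDA_CHECK(call)                                        \
    do {                                                        \
        cudaError_t err_ = (call);                              \
        if (err_ != cudaSuccess) {                              \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,  \
                    cudaGetErrorString(err_));                  \
            exit(EXIT_FAILURE);                                 \
        }                                                       \
    } while (0)

double** weights;    // host array of host row pointers
double** d_weights;  // device array of device row pointers

void init_weights() {
    weights = (double**) malloc(sizeof(double*) * SIZE);
    for (int i = 0; i < SIZE; i++) {
        weights[i] = (double*) malloc(sizeof(double) * getSize(i));
        for (int j = 0; j < getSize(i); j++) weights[i][j] = get_value(i, j);
    }
}

void init_cuda_weights() {
    CUDA_CHECK(cudaMalloc((void**) &d_weights, sizeof(double*) * SIZE));
    // Host-side staging array that holds the device row pointers.
    double** temp_d_ptrs = (double**) malloc(sizeof(double*) * SIZE);
    for (int i = 0; i < SIZE; i++) {
        size_t bytes = sizeof(double) * getSize(i);
        // Index rows by i; getSize(i) is only the row length.
        CUDA_CHECK(cudaMalloc((void**) &temp_d_ptrs[i], bytes));
        CUDA_CHECK(cudaMemcpy(temp_d_ptrs[i], weights[i], bytes,
                              cudaMemcpyHostToDevice));
    }
    // Copy the table of device pointers itself to the device.
    CUDA_CHECK(cudaMemcpy(d_weights, temp_d_ptrs, sizeof(double*) * SIZE,
                          cudaMemcpyHostToDevice));
    free(temp_d_ptrs);
}

// One thread per row: sum the row through the device pointer table.
__global__ void row_sums(double** w, const int* lens, double* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        double s = 0.0;
        for (int j = 0; j < lens[i]; j++) s += w[i][j];
        out[i] = s;
    }
}

// Launch the check kernel and copy the per-row sums back into out[SIZE].
void device_row_sums(double* out) {
    int h_lens[SIZE];
    for (int i = 0; i < SIZE; i++) h_lens[i] = getSize(i);
    int* d_lens;
    double* d_out;
    CUDA_CHECK(cudaMalloc((void**) &d_lens, sizeof(int) * SIZE));
    CUDA_CHECK(cudaMalloc((void**) &d_out, sizeof(double) * SIZE));
    CUDA_CHECK(cudaMemcpy(d_lens, h_lens, sizeof(int) * SIZE,
                          cudaMemcpyHostToDevice));
    row_sums<<<1, SIZE>>>(d_weights, d_lens, d_out, SIZE);
    CUDA_CHECK(cudaGetLastError());  // catches launch errors
    CUDA_CHECK(cudaMemcpy(out, d_out, sizeof(double) * SIZE,
                          cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(d_lens));
    CUDA_CHECK(cudaFree(d_out));
}
```

Calling init_weights(), then init_cuda_weights(), then device_row_sums(sums) with a host array double sums[SIZE] should fill sums with the per-row totals if the copy worked. Wrapping every runtime call in a check like CUDA_CHECK makes the first failing call visible right away, which is easier to read than polling cudaGetLastError() afterwards.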
SIZE2 you used to allocate temp_d_ptrs[i] and the SIZE2 you use in the offending cudaMemcpy operation. But it's impossible to say based on what you have shown.