I've been looking for a way to transfer a filled array of arrays from host to device in CUDA.

What I have:

  • A global array of arrays, already filled with data, that I need to copy to the device for kernel execution.
  • The arrays in the array have different lengths.

I have a function to initialize the array and its values:

double** weights; // globally defined in host
int init_weigths(){
    weights = (double**) malloc(sizeof(double*) * SIZE);

    for (int i = 0; i < SIZE; i++) {
        weights[i] = (double*) malloc(sizeof(double) * getSize(i));

        for (int j = 0; j < getSize(i); j++){
            weights[i][j] = get_value(i,j);
        }
    }
    return 0; // the function is declared int, so return a status
}

My (not working) solution:

I've put together a solution from other answers found on the Internet, but none of them worked. I think that's because my array of arrays is already filled with data, and because the contained arrays have variable lengths.

My solution throws an "invalid argument" error in every cudaMemcpy call and in the second and later cudaMalloc calls, as reported by cudaGetLastError(). Here it is:

double** d_weights;
int init_cuda_weight(){
    cudaMalloc((void **) &d_weights, sizeof(double*) * SIZE);

    double** temp_d_ptrs = (double**) malloc(sizeof(double*) * SIZE);
    // temp array of device pointers
    for (int i = 0; i < SIZE; i++){
        cudaMalloc((void**) &temp_d_ptrs[getSize(i)],
                sizeof(double) * getSize(i));
        // ERROR CHECK WITH cudaGetLastError(); doesn't report any errors at first.
        cudaMemcpy(temp_d_ptrs[getSize(i)], weights[getSize(i)], sizeof(double) * getSize(i), cudaMemcpyHostToDevice);
        // ERROR CHECK WITH cudaGetLastError(); reports "invalid argument" from here on.
    }

   cudaMemcpy(d_weights, temp_d_ptrs, sizeof(double*) * SIZE,
        cudaMemcpyHostToDevice);
}

As additional information, I've simplified the code a bit. The arrays contained in the array of arrays have different lengths (i.e. SIZE2 isn't constant), which is why I'm not flattening to a 1D array.
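As a possible alternative, sketched under assumptions: different row lengths do not by themselves rule out a flat layout, provided an offsets array records where each row starts. In the sketch below, SIZE, getSize() and get_value() are hypothetical stand-ins for the ones in the question, pick() and flat_lookup() are names invented for illustration, and error checks are omitted for brevity.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define SIZE 4                                                  // hypothetical row count
static int getSize(int i) { return i + 1; }                     // hypothetical: row i has i + 1 values
static double get_value(int i, int j) { return 10.0 * i + j; }  // hypothetical data

// Read element (i, j) of the ragged array out of the flat device buffer.
__global__ void pick(const double* flat, const int* offsets, int i, int j, double* out) {
    *out = flat[offsets[i] + j];
}

// Pack all rows into one host buffer, copy it to the device in a single
// cudaMemcpy, and read element (i, j) back through a one-thread kernel.
double flat_lookup(int i, int j) {
    int offsets[SIZE + 1];  // offsets[r] = index of the first element of row r
    offsets[0] = 0;
    for (int r = 0; r < SIZE; r++) offsets[r + 1] = offsets[r] + getSize(r);
    int total = offsets[SIZE];

    double* flat = (double*) malloc(sizeof(double) * total);
    for (int r = 0; r < SIZE; r++)
        for (int c = 0; c < getSize(r); c++)
            flat[offsets[r] + c] = get_value(r, c);

    double* d_flat;
    int* d_offsets;
    double* d_out;
    cudaMalloc((void**) &d_flat, sizeof(double) * total);
    cudaMalloc((void**) &d_offsets, sizeof(int) * (SIZE + 1));
    cudaMalloc((void**) &d_out, sizeof(double));
    cudaMemcpy(d_flat, flat, sizeof(double) * total, cudaMemcpyHostToDevice);
    cudaMemcpy(d_offsets, offsets, sizeof(int) * (SIZE + 1), cudaMemcpyHostToDevice);

    pick<<<1, 1>>>(d_flat, d_offsets, i, j, d_out);

    double result = 0.0;
    // A blocking copy on the default stream runs after the kernel has finished.
    cudaMemcpy(&result, d_out, sizeof(double), cudaMemcpyDeviceToHost);

    cudaFree(d_flat);
    cudaFree(d_offsets);
    cudaFree(d_out);
    free(flat);
    return result;
}

int main(void) {
    printf("element (2, 1) = %g\n", flat_lookup(2, 1));
    return 0;
}
```

With this layout the device needs only two buffers regardless of SIZE, at the cost of computing offsets[i] + j on every access instead of following a row pointer.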

What is wrong with this implementation? Any ideas to achieve the copy?

Edit2: The original code I wrote was OK. I edited the code to include the error I had and posted the correct code as an answer below.

  • There is nothing wrong with the code you have shown. You'll need to provide a minimal reproducible example which is expected for questions like this. Here is a fully worked example, using your code verbatim, and it runs without throwing any runtime errors. Based on what you have shown, the only thing I could speculate is that you may have a mismatch between the SIZE2 you used to allocate temp_d_ptrs[i] and the SIZE2 you use in the offending cudaMemcpy operation. But it's impossible to say based on what you have shown. Commented Oct 4, 2017 at 0:43
  • Thank you for the information! I really appreciate the advice on making better examples for questions like this; thank you for not directly down-voting too. I replicated the example without the "for" loops, and the memory allocation and copy worked as I wanted. I'm now trying to figure out what the problem was. I'm not sure what I should do with the question now that I've figured out the example I provided was OK. Commented Oct 4, 2017 at 0:48
  • In the future, if you want to avoid down-votes, my advice is to recognize that this question (or questions like it) requires a MCVE per SO help page, and therefore you should not even post such a question here without one. Then at least no one can downvote for obviously violating that rule. You have obviously (in my opinion) violated that rule. And I don't think it's out of bounds to call it a rule, since it is spelled out on the SO help page and the word must is used. Commented Oct 4, 2017 at 0:54
  • @RobertCrovella Thank you for the clarification, I'm new to SO. Right now I don't have enough time to make an MCVE (and my real code is very complex). Do you recommend that I delete the question to avoid down-votes? My reputation has gone down on my last questions, and the system warned me that I may be blocked from asking further questions. But I think the code may help other people. Commented Oct 4, 2017 at 0:59
  • I edited the example to include the real problem, and posted an answer with the correct code we discussed. Thanks for the help. Commented Oct 4, 2017 at 1:20

1 Answer

The mistake was that I used the array's total size, getSize(i), instead of i as the index in the allocations and copies. It was a naive error hidden by the complexity and verbosity of the real code.

The correct solution is:

double** d_weights;
int init_cuda_weight(){
    cudaMalloc((void **) &d_weights, sizeof(double*) * SIZE);

    double** temp_d_ptrs = (double**) malloc(sizeof(double*) * SIZE);
    // temp array of device pointers
    for (int i = 0; i < SIZE; i++){
        cudaMalloc((void**) &temp_d_ptrs[i],
                sizeof(double) * getSize(i));
        // ERROR CHECK WITH cudaGetLastError()
        cudaMemcpy(temp_d_ptrs[i], weights[i], sizeof(double) * getSize(i), cudaMemcpyHostToDevice);
        // ERROR CHECK WITH cudaGetLastError()
    }

    cudaMemcpy(d_weights, temp_d_ptrs, sizeof(double*) * SIZE,
        cudaMemcpyHostToDevice);
    return 0; // the function is declared int, so return a status
}
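For anyone who wants to try this end to end, here is a minimal sketch of the same pattern with the return value of every runtime call checked, rather than relying on comments. SIZE, getSize() and get_value() are hypothetical stand-ins for the ones in my real code, and CUDA_CHECK, row_sums() and ragged_row_sums() are names invented for this illustration.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define SIZE 4                                                  // hypothetical row count
static int getSize(int i) { return i + 1; }                     // hypothetical: row i has i + 1 values
static double get_value(int i, int j) { return 10.0 * i + j; }  // hypothetical data

// Abort with file, line and message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                          \
    do {                                                          \
        cudaError_t err_ = (call);                                \
        if (err_ != cudaSuccess) {                                \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,    \
                    cudaGetErrorString(err_));                    \
            exit(EXIT_FAILURE);                                   \
        }                                                         \
    } while (0)

// One thread per row: sum row i through the device pointer table.
__global__ void row_sums(double** rows, const int* lens, double* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    double s = 0.0;
    for (int j = 0; j < lens[i]; j++) s += rows[i][j];
    out[i] = s;
}

// Build the ragged array, copy it row by row, and write the row sums to h_out.
void ragged_row_sums(double* h_out) {
    double* h_dptrs[SIZE];  // host-side table of device row pointers
    int h_lens[SIZE];

    for (int i = 0; i < SIZE; i++) {
        h_lens[i] = getSize(i);
        double* row = (double*) malloc(sizeof(double) * h_lens[i]);
        for (int j = 0; j < h_lens[i]; j++) row[j] = get_value(i, j);
        // Index the pointer table by i, never by the row length.
        CUDA_CHECK(cudaMalloc((void**) &h_dptrs[i], sizeof(double) * h_lens[i]));
        CUDA_CHECK(cudaMemcpy(h_dptrs[i], row, sizeof(double) * h_lens[i],
                              cudaMemcpyHostToDevice));
        free(row);
    }

    double** d_rows;
    int* d_lens;
    double* d_out;
    CUDA_CHECK(cudaMalloc((void**) &d_rows, sizeof(double*) * SIZE));
    CUDA_CHECK(cudaMalloc((void**) &d_lens, sizeof(int) * SIZE));
    CUDA_CHECK(cudaMalloc((void**) &d_out, sizeof(double) * SIZE));
    CUDA_CHECK(cudaMemcpy(d_rows, h_dptrs, sizeof(double*) * SIZE, cudaMemcpyHostToDevice));
    CUDA_CHECK(cudaMemcpy(d_lens, h_lens, sizeof(int) * SIZE, cudaMemcpyHostToDevice));

    row_sums<<<1, SIZE>>>(d_rows, d_lens, d_out, SIZE);
    CUDA_CHECK(cudaGetLastError());       // errors from the launch itself
    CUDA_CHECK(cudaDeviceSynchronize());  // errors raised while the kernel ran

    CUDA_CHECK(cudaMemcpy(h_out, d_out, sizeof(double) * SIZE, cudaMemcpyDeviceToHost));

    for (int i = 0; i < SIZE; i++) CUDA_CHECK(cudaFree(h_dptrs[i]));
    CUDA_CHECK(cudaFree(d_rows));
    CUDA_CHECK(cudaFree(d_lens));
    CUDA_CHECK(cudaFree(d_out));
}

int main(void) {
    double sums[SIZE];
    ragged_row_sums(sums);
    for (int i = 0; i < SIZE; i++) printf("row %d sum = %g\n", i, sums[i]);
    return 0;
}
```

Checking each call at the point where it is made would have pointed at the first bad cudaMalloc or cudaMemcpy directly, instead of the error surfacing later through cudaGetLastError().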