
Is there a way to convert a 2D vector into an array so it can be used in CUDA kernels?

It is declared as:

vector<vector<int>> information;

I want to cudaMalloc it and copy it from host to device. What would be the best way to do it?

int *d_information;
cudaMalloc((void**)&d_information, sizeof(int)*size);
cudaMemcpy(d_information, information, sizeof(int)*size, cudaMemcpyHostToDevice);

2 Answers


In a word, no there isn't. The CUDA API doesn't support deep copying, and it doesn't know anything about std::vector either. If you insist on having a vector of vectors as the host source, it will require doing something like this:

int *d_information;
// size must be the total number of ints across all inner vectors
cudaMalloc((void**)&d_information, sizeof(int)*size);

// Copy each inner vector separately; each one is contiguous on its own,
// but the vectors are not contiguous with each other
int *dst = d_information;
for (std::vector<std::vector<int> >::iterator it = information.begin(); it != information.end(); ++it) {
    int *src = &((*it)[0]);
    size_t sz = it->size();

    cudaMemcpy(dst, src, sizeof(int)*sz, cudaMemcpyHostToDevice);
    dst += sz;   // advance past the elements just copied
}

[disclaimer: written in browser, not compiled or tested. Use at own risk]

This would copy the host memory to an allocation in GPU linear memory, requiring one copy for each vector. If the vector of vectors is a "jagged" array, you will want to store an offset table somewhere for the GPU to use as well.
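For the jagged case, such an offset table can be built on the host with a running sum of the row sizes. A minimal sketch (the function name `buildOffsets` is my own, not from the answer): entry `i` gives the start of row `i` in the flattened device buffer, and the final entry gives the total element count, which is also the size to pass to cudaMalloc.

```cpp
#include <cstddef>
#include <vector>

// Build an offset table for a jagged vector-of-vectors: offsets[i] is the
// starting index of row i inside the flattened buffer, and offsets.back()
// is the total element count (useful for sizing the cudaMalloc).
std::vector<size_t> buildOffsets(const std::vector<std::vector<int> >& rows) {
    std::vector<size_t> offsets;
    offsets.reserve(rows.size() + 1);
    size_t running = 0;
    for (size_t i = 0; i < rows.size(); ++i) {
        offsets.push_back(running);
        running += rows[i].size();
    }
    offsets.push_back(running); // one-past-the-end of the last row
    return offsets;
}
```

The table itself can then be copied to the device with a second, ordinary cudaMemcpy, since it is a single contiguous vector.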


4 Comments

Ok, so there's no way to have std::vector in CUDA. Could it be helpful (and easy) to use Thrust? I've never used it before, but from what I've read it's somewhat similar to the STL, but for CUDA. Any advice?
No, Thrust doesn't have any support for this either. You would be much better off just flattening your host array into a std::vector<int> and indexing it as you would linear memory on the device.
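The flattening suggested in the comment above can be sketched like this (the helper name `flatten` is my own): concatenate the inner vectors into one contiguous buffer, which can then be sent to the device with a single cudaMemcpy and indexed as linear memory.

```cpp
#include <cstddef>
#include <vector>

// Flatten a vector-of-vectors into one contiguous std::vector<int>, row by
// row. The result can be copied to the device with a single cudaMemcpy.
std::vector<int> flatten(const std::vector<std::vector<int> >& rows) {
    size_t total = 0;
    for (size_t i = 0; i < rows.size(); ++i)
        total += rows[i].size();

    std::vector<int> flat;
    flat.reserve(total); // one allocation up front
    for (size_t i = 0; i < rows.size(); ++i)
        flat.insert(flat.end(), rows[i].begin(), rows[i].end());
    return flat;
}
```

If the rows all have the same width `w`, element `(i, j)` is then simply `flat[i * w + j]` on both host and device.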
@BRabbit27: I don't know why your edit was rejected, it was correct. I dashed off that code in the browser and when I put that disclaimer in, I really mean it.
Yeah, I analyzed what you proposed, found the mistake, tried it in my code, and it worked. I knew the disclaimer was there because you meant it. Anyway, I corrected it in case someone is interested in something like that, but the disclaimer should still be taken into account.

As far as I understand, a vector of vectors does not need to reside in contiguous memory, i.e. it can be fragmented.

Depending on the amount of memory you need to transfer, I would do one of two things:

  1. Reorder your memory into a single vector, and then use a single cudaMemcpy.
  2. Issue a series of cudaMemcpyAsync calls, where each copy handles a single vector in your vector of vectors, and then synchronize.
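Option 2 above could be sketched roughly as follows (written in browser, not compiled; assumes `d_information` has already been cudaMalloc'd large enough to hold all the elements):

```cuda
cudaStream_t stream;
cudaStreamCreate(&stream);

// One asynchronous copy per inner vector, all enqueued on the same stream
int *dst = d_information;
for (size_t i = 0; i < information.size(); ++i) {
    cudaMemcpyAsync(dst, information[i].data(),
                    information[i].size() * sizeof(int),
                    cudaMemcpyHostToDevice, stream);
    dst += information[i].size();
}

// Wait until every enqueued copy has completed
cudaStreamSynchronize(stream);
cudaStreamDestroy(stream);
```

Note that cudaMemcpyAsync only overlaps with other work when the host memory is pinned (allocated with cudaHostAlloc or registered with cudaHostRegister); with ordinary pageable vectors, the copies fall back to synchronous behavior.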

