Copy array of structures from host to device in CUDA

Question

I am trying to copy an array of structures from host to device in CUDA. For example:

#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

#define N 1000      // no trailing semicolon: it would be pasted into float L[N]
#define M 100000

typedef struct {
    int   i;
    float L[N];
} t;

__global__ void kernel() {
    // do something
}

int main() {
    t *B, *B_d;                        // pointers to host and device arrays of structures
    size_t size = (size_t)M * sizeof(t);

    B = (t *)calloc(M, sizeof(t));
    cudaMalloc((void **)&B_d, size);   // allocate the array of structures on the device
    // reading B from file ...
    cudaMemcpy(B_d, B, size, cudaMemcpyHostToDevice);
    kernel<<<1, 1>>>();

    cudaFree(B_d);
    free(B);
    return 0;
}

Is this correct? And how can I use the array in the kernel function?

Eugene · Accepted Answer · 2012-08-30 16:59:33Z


Now you can declare your kernel as accepting a parameter of type (t *) and pass B_d, the pointer to device memory, in the kernel call.

Some comments:

1. Launching the kernel with only 1 thread is very inefficient. For good performance, the number of threads per block should be a multiple of 32 (the warp size).
2. An array of structures will not let your code use memory bandwidth efficiently. For good performance, memory reads need to be coalesced.
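A minimal sketch of how the kernel signature and launch could look, assuming one thread per structure. The `count` parameter, the block size of 256, and the placeholder work inside the kernel are illustrative assumptions, not part of the original code.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

#define N 1000
#define M 100000

typedef struct {
    int   i;
    float L[N];
} t;

// Each thread handles one structure; the guard skips threads past the end.
__global__ void kernel(t *B_d, int count) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < count) {
        B_d[idx].i = idx;        // placeholder work (assumption)
        B_d[idx].L[0] += 1.0f;
    }
}

int main() {
    size_t size = (size_t)M * sizeof(t);
    t *B = (t *)calloc(M, sizeof(t));
    t *B_d = NULL;
    if (B == NULL) return 1;
    if (cudaMalloc((void **)&B_d, size) != cudaSuccess) { free(B); return 1; }

    cudaMemcpy(B_d, B, size, cudaMemcpyHostToDevice);

    int threads = 256;                         // a multiple of 32
    int blocks  = (M + threads - 1) / threads; // round up so every element is covered
    kernel<<<blocks, threads>>>(B_d, M);       // pass the device pointer, not B

    cudaError_t err = cudaDeviceSynchronize(); // wait for the kernel and collect errors
    if (err != cudaSuccess)
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));

    cudaMemcpy(B, B_d, size, cudaMemcpyDeviceToHost);

    cudaFree(B_d);
    free(B);
    return err == cudaSuccess ? 0 : 1;
}
```

The guard `idx < count` matters because the grid is rounded up to a whole number of blocks, so the last block may contain threads with no element to process.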



2 Comments

user1285050 Over a year ago

1. Can I pass B_d to the kernel as a parameter of type (t*) rather than B? 2. How can I make coalesced reads?

Eugene Over a year ago

You should pass the pointer to GPU memory (that's B_d). To get coalesced memory access for optimal GPU memory performance, you should switch from an array of structures to a structure composed of arrays (i.e. one whose fields are primitive arrays). Note that the members of such a structure should be properly aligned.
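The layout change described above could look roughly like this. It is only a sketch: the type name `t_soa`, the indexing scheme `L[k * count + idx]`, and the work done in the kernel are assumptions chosen for illustration. Each field gets its own `cudaMalloc` allocation, which is suitably aligned for primitive types.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

#define N 1000
#define M 100000

// Structure of arrays: one device array per field (hypothetical type name).
typedef struct {
    int   *i;   // M ints
    float *L;   // N * M floats; element k of item idx lives at L[k * M + idx]
} t_soa;

__global__ void kernel(t_soa B_d, int count) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < count) {
        B_d.i[idx] = idx;
        // For a fixed k, neighbouring threads touch neighbouring floats,
        // so each warp's accesses can be coalesced.
        for (int k = 0; k < N; ++k)
            B_d.L[(size_t)k * count + idx] += 1.0f;
    }
}

int main() {
    t_soa B_d;   // the struct lives on the host; its members point to device memory
    B_d.i = NULL;
    B_d.L = NULL;
    if (cudaMalloc((void **)&B_d.i, (size_t)M * sizeof(int)) != cudaSuccess) return 1;
    if (cudaMalloc((void **)&B_d.L, (size_t)N * M * sizeof(float)) != cudaSuccess) {
        cudaFree(B_d.i);
        return 1;
    }
    cudaMemset(B_d.i, 0, (size_t)M * sizeof(int));
    cudaMemset(B_d.L, 0, (size_t)N * M * sizeof(float));

    int threads = 256;
    int blocks  = (M + threads - 1) / threads;
    kernel<<<blocks, threads>>>(B_d, M);   // the struct of pointers is passed by value

    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess)
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));

    cudaFree(B_d.i);
    cudaFree(B_d.L);
    return err == cudaSuccess ? 0 : 1;
}
```

With the original array of structures, consecutive threads reading `B_d[idx].L[k]` would be `sizeof(t)` bytes apart, which is what prevents coalescing there.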
