I am trying to copy an array of structures from host to device in CUDA. For example:

#include <stdlib.h>

#define N 1000
#define M 100000

typedef struct {
     int i;
     float L[N];    
}t ; 

__global__ void kernel() {
  //do something
}

int main() {
    t *B, *B_d;   // Pointers to the host and device arrays of structures
    size_t size = (size_t)M * sizeof(t);

    B = (t *)calloc(M, sizeof(t));
    cudaMalloc((void **)&B_d, size);   // Allocate the array of structures on the device
    // reading B from file ...
    cudaMemcpy(B_d, B, size, cudaMemcpyHostToDevice);
    kernel<<<1, 1>>>();

    cudaFree(B_d);
    free(B);
    return 0;
}

Is this correct? And how can I use the array in the kernel function?

1 Answer

Now you can declare your kernel as accepting a parameter of type (t *) and pass your array (the device copy, B_d) in the kernel call.

Some comments:
1. Launching the kernel with only 1 thread is very inefficient. For best results, the number of threads per block should be a multiple of 32 (the warp size).
2. An array of structures will not let your code use memory bandwidth effectively. For best results, your memory reads need to be coalesced.
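As a minimal sketch of what this could look like, the example below gives the kernel a (t *) parameter and launches it with 256 threads per block (a multiple of 32), one thread per structure. The summing work, the sums_d output array, and the run_kernel host helper are assumptions made for illustration; they are not part of the question.

```cuda
#include <cstdlib>
#include <cuda_runtime.h>

#define N 1000   // floats per structure, as in the question

typedef struct {
    int   i;
    float L[N];
} t;

// One thread per structure: sum its L array and record the thread index in i.
__global__ void kernel(t *B_d, float *sums_d, int count) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= count) return;   // guard threads past the end in the last block

    float s = 0.0f;
    for (int j = 0; j < N; ++j)
        s += B_d[idx].L[j];
    sums_d[idx] = s;
    B_d[idx].i = idx;
}

// Hypothetical host helper: copies B to the device, runs the kernel, and
// copies the results back. Returns 0 on success, 1 on any CUDA error.
int run_kernel(t *B, float *sums, int count) {
    t *B_d = NULL;
    float *sums_d = NULL;
    size_t size = (size_t)count * sizeof(t);
    size_t sums_size = (size_t)count * sizeof(float);
    int failed = 1;

    if (cudaMalloc((void **)&B_d, size) == cudaSuccess &&
        cudaMalloc((void **)&sums_d, sums_size) == cudaSuccess &&
        cudaMemcpy(B_d, B, size, cudaMemcpyHostToDevice) == cudaSuccess) {
        int threads = 256;                               // multiple of 32
        int blocks = (count + threads - 1) / threads;    // round up
        kernel<<<blocks, threads>>>(B_d, sums_d, count); // pass the DEVICE pointer

        if (cudaGetLastError() == cudaSuccess &&
            cudaDeviceSynchronize() == cudaSuccess &&
            cudaMemcpy(sums, sums_d, sums_size, cudaMemcpyDeviceToHost) == cudaSuccess &&
            cudaMemcpy(B, B_d, size, cudaMemcpyDeviceToHost) == cudaSuccess)
            failed = 0;
    }
    cudaFree(B_d);      // cudaFree(NULL) is a no-op
    cudaFree(sums_d);
    return failed;
}
```

From main, this would be called as run_kernel(B, sums, M) once B has been filled. With the sizes in the question, sizeof(t) is 4004 bytes, so M = 100000 structures need roughly 400 MB of device memory.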

2 Comments

1. Can I pass B_d to the kernel as type (t *) rather than B? 2. And how can I make coalesced reads?
You should pass a pointer to GPU memory (that's B_d). To get coalesced memory access, and with it the best GPU memory performance, switch from an array of structures to a structure of arrays (i.e. one whose fields are primitive arrays). Note that the members of such a structure should be properly aligned.
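A structure-of-arrays version might look like the sketch below. It is one possible layout, not the only one: the names t_soa, sum_soa, and run_soa are hypothetical, and the index formula L[j * count + idx] is an assumption chosen so that neighbouring threads read neighbouring floats. Each array comes from its own cudaMalloc call, which returns pointers aligned to at least 256 bytes.

```cuda
#include <cstdlib>
#include <cuda_runtime.h>

#define N 1000   // floats per record, as in the question

// Structure of arrays: one device array per field instead of one struct per record.
typedef struct {
    int   *i;   // count ints
    float *L;   // N * count floats; element j of record idx is L[j * count + idx]
} t_soa;

// One thread per record. For a fixed j, threads idx, idx+1, ... read
// consecutive addresses, so the loads of a warp can be coalesced.
__global__ void sum_soa(t_soa B_d, float *sums_d, int count) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= count) return;

    float s = 0.0f;
    for (int j = 0; j < N; ++j)
        s += B_d.L[(size_t)j * count + idx];
    sums_d[idx] = s;
    B_d.i[idx] = idx;
}

// Hypothetical host helper. L is a host array already in the layout above.
// Returns 0 on success, 1 on any CUDA error.
int run_soa(const float *L, int *i_out, float *sums, int count) {
    t_soa B_d = {NULL, NULL};
    float *sums_d = NULL;
    size_t l_size = (size_t)N * count * sizeof(float);
    size_t i_size = (size_t)count * sizeof(int);
    size_t sums_size = (size_t)count * sizeof(float);
    int failed = 1;

    if (cudaMalloc((void **)&B_d.i, i_size) == cudaSuccess &&
        cudaMalloc((void **)&B_d.L, l_size) == cudaSuccess &&
        cudaMalloc((void **)&sums_d, sums_size) == cudaSuccess &&
        cudaMemcpy(B_d.L, L, l_size, cudaMemcpyHostToDevice) == cudaSuccess) {
        int threads = 256;
        int blocks = (count + threads - 1) / threads;
        // The struct is passed by value; the pointers inside it are device pointers.
        sum_soa<<<blocks, threads>>>(B_d, sums_d, count);

        if (cudaGetLastError() == cudaSuccess &&
            cudaDeviceSynchronize() == cudaSuccess &&
            cudaMemcpy(sums, sums_d, sums_size, cudaMemcpyDeviceToHost) == cudaSuccess &&
            cudaMemcpy(i_out, B_d.i, i_size, cudaMemcpyDeviceToHost) == cudaSuccess)
            failed = 0;
    }
    cudaFree(B_d.i);    // cudaFree(NULL) is a no-op
    cudaFree(B_d.L);
    cudaFree(sums_d);
    return failed;
}
```

The trade-off is on the host side: data read from a file record by record has to be written into the transposed L layout before the copy.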
