
Note that this shared memory array is never written to, only read from.

As I have it, my shared memory gets initialized like this:

__shared__ float TMshared[2592];
for (int i = 0; i < 2592; i++)
{
    TMshared[i] = TM[i];
}
__syncthreads();

(TM is passed into all threads from kernel launch)

You might have noticed that this is highly inefficient: there is no parallelism here, and every thread in the block redundantly writes the same values to the same locations.

Can someone please recommend a more efficient approach/comment on if this issue really needs optimization since the shared array in question is relatively small?

Thanks!

1 Answer
Use all threads to write independent locations; it will probably be quicker.

Example assumes 1D threadblock/grid:

#define SSIZE 2592

__shared__ float TMshared[SSIZE];

int lidx = threadIdx.x;
while (lidx < SSIZE) {
    TMshared[lidx] = TM[lidx];  // each thread copies a strided subset
    lidx += blockDim.x;
}

__syncthreads();
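For context, here is a minimal sketch of how this cooperative load might sit inside a complete kernel. The kernel name, parameters, and the trailing write to `out` are assumptions for illustration only, not from the original question; the strided `for` loop is equivalent to the `while` loop above.

```cuda
#include <cuda_runtime.h>

#define SSIZE 2592

// Hypothetical kernel illustrating the cooperative shared-memory load.
// TM points to SSIZE floats in global memory. Each thread copies a
// strided subset (threadIdx.x, threadIdx.x + blockDim.x, ...), so the
// block as a whole covers all SSIZE elements even when blockDim.x < SSIZE.
__global__ void exampleKernel(const float *TM, float *out)
{
    __shared__ float TMshared[SSIZE];

    for (int lidx = threadIdx.x; lidx < SSIZE; lidx += blockDim.x)
        TMshared[lidx] = TM[lidx];

    __syncthreads();  // all of TMshared is populated beyond this point

    // Read-only use of TMshared goes here. For illustration, each
    // thread just echoes one element back to global memory.
    int gidx = blockIdx.x * blockDim.x + threadIdx.x;
    if (gidx < SSIZE)
        out[gidx] = TMshared[gidx];
}
```

Launched as, e.g., `exampleKernel<<<gridDim, blockDim>>>(d_TM, d_out);` with any 1-D block size; smaller blocks simply take more trips through the load loop.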

3 Comments

NICE. Where exactly does the "#define SSIZE 2592" go? At the top of the cu file, outside the global kernel?
Also, what's the point of using #define? Does it offer an advantage over just explicitly coding the number 2592 in the appropriate place?
Yes, the define normally goes at the top of the file, although you can put it anywhere before it is first used in the code. There's no code-generation or performance advantage of the define over writing 2592 directly. However, if I change the size of my shared memory array, I only have to change it in one place.
