My kernel needs a list/array of Configuration objects as an input parameter. I actually have a list/array of such lists/arrays, one of which is passed to the kernel at a time. These Configuration objects are prepared on the host and do not change, so this seems like a perfect use for constant memory. But honestly, I do not really understand how to do it.

I have sketched my idea in the code draft below. Basically, I see two ways to define/pass the lists:

  • Define them as arrays with fixed lengths and pass them by value to the kernel
  • Define them as pointers and just pass a pointer to the kernel (the data must be copied to the device first, of course)

Which method should I use, and how should I modify the code below to make sure constant memory is used?

I expect each list to typically be smaller than 200-300 bytes. If I made all lists the same size, I would probably go for 512 bytes or 1 kB.

class Configuration{
  // some constants
};

// We need a list of lists of Configurations; these could be implemented either as...
Configuration a[10][100]; // a fixed-length array or...
Configuration ** b; // a dynamic array of pointers to arrays

// The parameter takes an array of Configuration, either as a pointer or directly as an array
__global__ void kernel(Configuration * config, int length){

}

// Following the example above, we use the pointer version. This could also be a call using a[i] directly
kernel<<<...>>>(b[i], lengthOfB[i]);

1 Answer

If you want the data to be in __constant__ memory (which may not be a smart move, depending on how you access the data in the kernel), then the first approach (a fixed-length array) is the only sensible one. For simplicity, I would also flatten the two-dimensional array into a one-dimensional array, which makes it easier to use and to copy.

In addition to being read-only, __constant__ memory is only efficient when every thread in a warp requests the same value at the same time. Your question didn't mention this, so you may want to refer to this question/answer for explanation/examples.
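To make the access-pattern point concrete, here is a minimal sketch of the two cases. The `table` array, the kernel names, and the `run_access_demo` helper are made up for this illustration and are not part of the question's code. Both kernels give correct results; the difference is that the first lets the constant cache broadcast one value to the whole warp, while the second makes the warp read many different addresses, which are served one after another.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical lookup table, used only for this illustration.
__constant__ int table[256];

// Uniform access: every thread in a warp reads the same element,
// so the constant cache can broadcast a single value.
__global__ void uniform_read(int *out, int idx) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    out[tid] = table[idx];
}

// Per-thread access: threads in a warp read different elements,
// so the reads are serialized within the warp.
__global__ void per_thread_read(int *out) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    out[tid] = table[tid % 256];
}

// Host helper: fills the table with 2*i, runs both kernels on 256 threads,
// and returns true if both produced the expected values.
bool run_access_demo(int idx) {
    assert(idx >= 0 && idx < 256);
    int h_table[256];
    for (int i = 0; i < 256; ++i) h_table[i] = 2 * i;
    if (cudaMemcpyToSymbol(table, h_table, sizeof(h_table)) != cudaSuccess)
        return false;

    int *d_out = nullptr;
    if (cudaMalloc(&d_out, 256 * sizeof(int)) != cudaSuccess) return false;
    int h_out[256];
    bool ok = true;

    uniform_read<<<2, 128>>>(d_out, idx);
    ok = ok && cudaMemcpy(h_out, d_out, sizeof(h_out),
                          cudaMemcpyDeviceToHost) == cudaSuccess;
    for (int i = 0; i < 256; ++i) ok = ok && (h_out[i] == 2 * idx);

    per_thread_read<<<2, 128>>>(d_out);
    ok = ok && cudaMemcpy(h_out, d_out, sizeof(h_out),
                          cudaMemcpyDeviceToHost) == cudaSuccess;
    for (int i = 0; i < 256; ++i) ok = ok && (h_out[i] == 2 * i);

    cudaFree(d_out);
    return ok;
}
```

If your kernel looks more like `per_thread_read` than `uniform_read`, __constant__ memory is probably not the right choice.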

If you went with the pointer approach, presumably only the pointer itself would be in constant memory, not the data it points to, and that is presumably not what you want.

If you use __constant__ memory, there is no need to also pass a pointer to it as a kernel parameter. The data declaration has global scope, so the kernel can refer to it directly.

Something like this might work:

class Configuration{
public:
  // some constants
  int cdata;
};

__constant__ Configuration const_data[10*100];

// ***setup in host code (must run inside a host function)
void setup(){
  Configuration h_data[10*100];
  // fill in h_data ...
  // then copy to device
  cudaMemcpyToSymbol(const_data, h_data, sizeof(h_data));
}
// ***

//use in kernel code
__global__ void mykernel(){

  // const_data is visible here without being passed as a parameter
  int my_data = const_data[5].cdata;

}

Note that the total amount of __constant__ memory is limited to 64 KB.
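As a rough sketch of how the flattened layout and the size limit could fit together: the names `NUM_LISTS`, `LIST_LEN`, `sum_list`, and `sum_of_list` are hypothetical, and the fill values are dummy data. The `static_assert` only checks this one array, so it will not catch the case where several __constant__ arrays together exceed the limit. The list to use is selected by passing its index to the kernel, and every thread walks the list in the same order, so each warp reads one address at a time.

```cuda
#include <cassert>
#include <cuda_runtime.h>

struct Configuration {
    int cdata;  // stand-in for the real constants
};

// Hypothetical dimensions: 10 lists of up to 100 entries each.
constexpr int NUM_LISTS = 10;
constexpr int LIST_LEN  = 100;

// Fail at compile time if this array alone would not fit in 64 KB.
static_assert(sizeof(Configuration) * NUM_LISTS * LIST_LEN <= 64 * 1024,
              "const_data exceeds the 64 KB __constant__ limit");

// All lists, flattened: entry j of list i lives at index i * LIST_LEN + j.
__constant__ Configuration const_data[NUM_LISTS * LIST_LEN];

// Each thread sums the first `length` entries of list `list`.
// All threads read the same element in each iteration.
__global__ void sum_list(int list, int length, int *out) {
    int sum = 0;
    for (int j = 0; j < length; ++j)
        sum += const_data[list * LIST_LEN + j].cdata;
    out[blockIdx.x * blockDim.x + threadIdx.x] = sum;
}

// Host helper: uploads all lists, runs the kernel on one of them, and
// returns the sum computed by thread 0 (or -1 on a CUDA error).
int sum_of_list(int list, int length) {
    assert(list >= 0 && list < NUM_LISTS);
    assert(length >= 0 && length <= LIST_LEN);

    static Configuration h_data[NUM_LISTS * LIST_LEN];
    for (int i = 0; i < NUM_LISTS; ++i)
        for (int j = 0; j < LIST_LEN; ++j)
            h_data[i * LIST_LEN + j].cdata = i + j;  // dummy values
    if (cudaMemcpyToSymbol(const_data, h_data, sizeof(h_data)) != cudaSuccess)
        return -1;

    int *d_out = nullptr;
    if (cudaMalloc(&d_out, 32 * sizeof(int)) != cudaSuccess) return -1;
    sum_list<<<1, 32>>>(list, length, d_out);

    int result = -1;
    if (cudaMemcpy(&result, d_out, sizeof(int),
                   cudaMemcpyDeviceToHost) != cudaSuccess)
        result = -1;
    cudaFree(d_out);
    return result;
}
```

In real code you would upload the data once at startup rather than on every call; the helper does both only to keep the sketch short.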

3 Comments

Thanks for this answer. Do I need to define it as a global constant? Is there no other way to use the constant cache?
Yes, __constant__ symbols must be defined at translation-unit (file) scope.
If you have a cc 3.5 or higher device, you may also want to look into the "read-only" cache or the __ldg intrinsic. Depending on your actual access patterns, it may have substantially higher throughput than __constant__ memory, and it can be used directly with an ordinary global pointer passed as a kernel parameter. Be careful to decorate your pointers with __restrict__ and const, starting with the kernel parameter itself.
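For reference, a minimal sketch of what the read-only cache suggestion might look like. The `scale` kernel and the `run_scale` helper are invented for this example. It assumes a device of compute capability 3.5 or higher, which `__ldg` requires. Here each thread reads a different element, which is exactly the pattern that suits __constant__ memory poorly.

```cuda
#include <cassert>
#include <cuda_runtime.h>

struct Configuration {
    int cdata;  // stand-in for the real constants
};

// `config` is an ordinary global-memory pointer. const and __restrict__
// tell the compiler the data is read-only and not aliased by `out`.
__global__ void scale(const Configuration *__restrict__ config,
                      int *__restrict__ out, int n) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < n)
        out[tid] = 2 * __ldg(&config[tid].cdata);  // explicit read-only load
}

// Host helper: copies n configurations to the device, runs the kernel,
// and returns true if every output equals 2 * input.
bool run_scale(int n) {
    assert(n > 0 && n <= 1024);
    Configuration h_cfg[1024];
    int h_out[1024];
    for (int i = 0; i < n; ++i) h_cfg[i].cdata = i;

    Configuration *d_cfg = nullptr;
    int *d_out = nullptr;
    if (cudaMalloc(&d_cfg, n * sizeof(Configuration)) != cudaSuccess)
        return false;
    if (cudaMalloc(&d_out, n * sizeof(int)) != cudaSuccess) {
        cudaFree(d_cfg);
        return false;
    }

    bool ok = cudaMemcpy(d_cfg, h_cfg, n * sizeof(Configuration),
                         cudaMemcpyHostToDevice) == cudaSuccess;
    scale<<<(n + 127) / 128, 128>>>(d_cfg, d_out, n);
    ok = ok && cudaMemcpy(h_out, d_out, n * sizeof(int),
                          cudaMemcpyDeviceToHost) == cudaSuccess;
    for (int i = 0; ok && i < n; ++i) ok = (h_out[i] == 2 * i);

    cudaFree(d_cfg);
    cudaFree(d_out);
    return ok;
}
```

With this approach the pointer version from the question works as is: the lists can have different lengths, and there is no 64 KB limit to worry about.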
