I have a functor used by Thrust in which I need to specify the length of a local array dynamically, like this:

struct func { 

       const int h;

       func(const int _h): h(_h) {}

       __device__ __host__
       void operator()(int id) {
              double data[h];
      }
};

I'm not sure how to do this, because the array size has to be known at compile time, but h is not known until run time.

  • Is there an expected, relatively small set of values of h? Commented Dec 10, 2013 at 6:40
  • @JoeZ this part of code is on device, cannot use std::vector. just raw pointer to device memory Commented Dec 10, 2013 at 6:47
  • @talonmies h varies a lot, in my case it varies from 20 to 2000. It is the number of hidden neurons in a neural network. Commented Dec 10, 2013 at 6:48
  • Ah, ok. I'll delete my comment. Commented Dec 10, 2013 at 6:48
  • @user2684645: At first inspection, this looks like a very poor design choice then. Each thread will have to statically or dynamically allocate a copy of data. 16kb of local or heap memory per thread will really affect performance. Is this really what you want to do? Commented Dec 10, 2013 at 7:17

1 Answer

The obvious way to solve this is to use dynamic memory allocation, so the functor becomes

   __device__ __host__
   void operator()(int id) {
        double *data  = new double[h];

        // functor code goes here

        // Heap memory has context scope, so delete is necessary to stop leaks
        delete[] data; 
   };

This will work on GPUs of compute capability 2.0 or newer. The downside is that the allocation comes from the runtime heap in global memory, which limits compiler optimisations, and the new/delete operators themselves are very slow, so having every thread in the kernel launch call them will cost a lot of performance.

Alternatively, if the range of values of h is limited, consider replacing h within the operator code with a template parameter and using a selector for the known cases, so something like

   template<int j>
   __device__ __host__
   void guts(int id) {
       double data[j];
       // code here
   };

   __device__ __host__
   void guts_rt(int id) {
       double *data = new double[h];
       // code here
       delete[] data;
   };

   __device__ __host__
   void operator()(int id) {
       switch (h) {
           case 2:
           guts<2>(id);
           break;

           case 4:
           guts<4>(id);
           break;

           // As many as needed here

           default:
           guts_rt(id);
           break;
      }
  }

i.e. try to use hard-coded arrays where possible (which the compiler can optimize), and fall back to a dynamic solution otherwise (provided your GPU actually supports dynamic allocation of heap memory).
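For reference, the pieces above can be assembled into one functor roughly as follows. This is only a sketch under some assumptions: the `out` pointer, the `work` helper and its fill-and-sum body are hypothetical placeholders for the real per-thread code, and the macro guard is there only so the same source also builds with a plain host compiler.

```cpp
#include <cassert>

// Let the sketch build without nvcc by defining away the CUDA qualifiers.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

struct func {
    const int h;
    double *out;  // hypothetical output buffer, one slot per id

    func(int _h, double *_out) : h(_h), out(_out) { assert(_h > 0); }

    // Placeholder for the real work (an assumption): fill the scratch
    // array with 0, 1, ..., n-1 and return the sum.
    __host__ __device__
    static double work(double *data, int n) {
        double sum = 0.0;
        for (int i = 0; i < n; ++i) {
            data[i] = static_cast<double>(i);
            sum += data[i];
        }
        return sum;
    }

    // Compile-time size: the array length is a template parameter.
    template <int N>
    __host__ __device__
    void guts(int id) const {
        double data[N];
        out[id] = work(data, N);
    }

    // Run-time size: fall back to the (slow) heap.
    __host__ __device__
    void guts_rt(int id) const {
        double *data = new double[h];
        // In device code a failed allocation can yield a null pointer
        // when the device heap is exhausted, so check before use.
        if (data == nullptr) return;
        out[id] = work(data, h);
        delete[] data;
    }

    __host__ __device__
    void operator()(int id) const {
        switch (h) {
            case 2:  guts<2>(id); break;
            case 4:  guts<4>(id); break;
            // As many known sizes as needed here
            default: guts_rt(id); break;
        }
    }
};
```

On the device this would typically be driven by something like thrust::for_each over a thrust::counting_iterator<int> range with the thrust::device execution policy, with `out` pointing to device memory. If the fallback path is expected to be hit with large h, the device heap size may need raising with cudaDeviceSetLimit(cudaLimitMallocHeapSize, ...) before the first kernel launch.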
