3

I have just started to use template meta-programming in my code. I have a class which has as a member which is a vector of a multi-dimensional Cartesian points. Here is a basic setup of the class:

template<size_t N>
class TGrid{
public:
     void round_points_3(){
         for(std::size_t i = 0; i < Xp.size();i++){
             Xp[i][0] = min[0] + (std::floor((Xp[i][0] - min[0]) * nbins[0] / (max[0] - min[0])) * bin_w[0]) + bin_w[0]/2.0;
             Xp[i][1] = min[1] + (std::floor((Xp[i][1] - min[1]) * nbins[1] / (max[1] - min[1])) * bin_w[1]) + bin_w[1]/2.0;
             Xp[i][2] = min[2] + (std::floor((Xp[i][2] - min[2]) * nbins[2] / (max[2] - min[2])) * bin_w[2]) + bin_w[2]/2.0;
         }
    }
    void round_points_2(){
         for(std::size_t i = 0; i < Xp.size();i++){
             Xp[i][0] = min[0] + (std::floor((Xp[i][0] - min[0]) * nbins[0] / (max[0] - min[0])) * bin_w[0]) + bin_w[0]/2.0;
             Xp[i][1] = min[1] + (std::floor((Xp[i][1] - min[1]) * nbins[1] / (max[1] - min[1])) * bin_w[1]) + bin_w[1]/2.0;
         }
    }
    void round_points_1(){
         for(std::size_t i = 0; i < Xp.size();i++){
             Xp[i][0] = min[0] + (std::floor((Xp[i][0] - min[0]) * nbins[0] / (max[0] - min[0])) * bin_w[0]) + bin_w[0]/2.0;
         }
    }
public:

std::vector<std::array<double,N> > Xp;
std::vector<double> min, max, nbins, bin_w;
};

This class represented a multidimensional Grid. The dimension is specified by the template value N. I will be having many operations which can be made more efficient by having template specific member functions tailored to the specific dimensions, such as loop unrolling.

In the class TGrid, I have 3 functions specific for dimensions D=1,D=2 and D=3. This is indicated by the subscript _1,_2 and _3 of the functions.

I am looking for a template meta-programming oriented approach to write these three functions more compactly.

I have seen examples of loop unrolling but all of these examples don't consider member functions of a template class.

3
  • 1
    Don't do it. Use a generic round_points and a second loop. array::size() is constexpr and the compiler will then decide by itself if unrolling the loop is a good idea. Commented Jan 26, 2015 at 9:59
  • 3
    "I will be having many operations which can be made more efficient by having template specific member functions tailored to the specific dimensions, such as loop unrolling". oh yeah. MEASURE. Commented Jan 26, 2015 at 9:59
  • I have measured the time for other operations (marginalisation ect..) and these little things tend to do have an impact. The reason being I am dealing with up to 4 Gigabyte grids. Commented Jan 26, 2015 at 10:07

1 Answer 1

2

Putting to one side the question of whether or not this is an appropriate optimisation, or if other optimisations should be regarded first, this is how I would do it. (But I do agree, sometimes it is demonstrably better to explicitly unroll loops — the compiler isn't always the best judge.)

One can't partially specialize a member function, and one can't specialize a nested struct without specializing the outer struct, so the only solution is to use a separate templated struct for the unrolling mechanism. Feel free to put this in some other namespace :)

The unrolling implementation:

template <int N>
struct sequence {
    template <typename F,typename... Args>
    static void run(F&& f,Args&&... args) {
        sequence<N-1>::run(std::forward<F>(f),std::forward<Args>(args)...);
        f(args...,N-1);
    }
};

template <>
struct sequence<0> {
    template <typename F,typename... Args>
    static void run(F&& f,Args&&... args) {}
};

This takes an arbitrary functional object and a list of arguments, and then calls the object with the arguments and an additional final argument N times, where the final argument ranges from 0 to N-1. The universal references and variadic templates are not necessary; the same idea can be employed in C++98 with less generality.

round_points<K> then calls sequence::run<K> with a helper static member function:

template <size_t N>
class TGrid {
public:
    template <size_t K>
    void round_points(){
        for (std::size_t i = 0; i < Xp.size();i++) {
            sequence<K>::run(TGrid<N>::round_item,*this,i);
        }
    }

    static void round_item(TGrid &G,int i,int j) {
        G.Xp[i][j] = G.min[j] + (std::floor((G.Xp[i][j] - G.min[j]) * G.nbins[j] / (G.max[j] - G.min[j])) * G.bin_w[j]) + G.bin_w[j]/2.0;
    }

    // ...
};

Edit: Addendum

Doing the equivalent with a pointer-to-member function appears to be hard for compilers to inline. As an alternative, to avoid the use of a static round_item, you can use a lambda, e.g.:

template <size_t N>
class TGrid {
public:
    template <size_t K>
    void round_points(){
        for (std::size_t i = 0; i < Xp.size();i++) {
            sequence<K>::run([&](int j) {round_item(i,j);});
        }
    }

    void round_item(int i,int j) {
        Xp[i][j] = min[j] + (std::floor((Xp[i][j] - min[j]) * nbins[j] / (max[j] - min[j])) * bin_w[j]) + bin_w[j]/2.0;
    }

    // ...
};
Sign up to request clarification or add additional context in comments.

3 Comments

Thanks for your answer I will try this out. Thank you also for answering the question without going into if it is an appropriate optimisation or not.
Why do you have round item as static ?
Two reasons: firstly, it's clearer code without using pointer-to-member function objects or std::mem_fn; but secondly, this makes it much easier on the compiler's inliner. An alternative is to keep round_item as non-static, but pass a lambda to sequence<K>::run(). I'll edit the answer to suit.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.