C++ template meta-programming member function loop unrolling

Question

I have just started to use template meta-programming in my code. I have a class which has as a member which is a vector of a multi-dimensional Cartesian points. Here is a basic setup of the class:

template<size_t N>
class TGrid{
public:
     void round_points_3(){
         for(std::size_t i = 0; i < Xp.size();i++){
             Xp[i][0] = min[0] + (std::floor((Xp[i][0] - min[0]) * nbins[0] / (max[0] - min[0])) * bin_w[0]) + bin_w[0]/2.0;
             Xp[i][1] = min[1] + (std::floor((Xp[i][1] - min[1]) * nbins[1] / (max[1] - min[1])) * bin_w[1]) + bin_w[1]/2.0;
             Xp[i][2] = min[2] + (std::floor((Xp[i][2] - min[2]) * nbins[2] / (max[2] - min[2])) * bin_w[2]) + bin_w[2]/2.0;
         }
    }
    void round_points_2(){
         for(std::size_t i = 0; i < Xp.size();i++){
             Xp[i][0] = min[0] + (std::floor((Xp[i][0] - min[0]) * nbins[0] / (max[0] - min[0])) * bin_w[0]) + bin_w[0]/2.0;
             Xp[i][1] = min[1] + (std::floor((Xp[i][1] - min[1]) * nbins[1] / (max[1] - min[1])) * bin_w[1]) + bin_w[1]/2.0;
         }
    }
    void round_points_1(){
         for(std::size_t i = 0; i < Xp.size();i++){
             Xp[i][0] = min[0] + (std::floor((Xp[i][0] - min[0]) * nbins[0] / (max[0] - min[0])) * bin_w[0]) + bin_w[0]/2.0;
         }
    }
public:

std::vector<std::array<double,N> > Xp;
std::vector<double> min, max, nbins, bin_w;
};

This class represented a multidimensional Grid. The dimension is specified by the template value N. I will be having many operations which can be made more efficient by having template specific member functions tailored to the specific dimensions, such as loop unrolling.

In the class TGrid, I have 3 functions specific for dimensions D=1,D=2 and D=3. This is indicated by the subscript _1,_2 and _3 of the functions.

I am looking for a template meta-programming oriented approach to write these three functions more compactly.

I have seen examples of loop unrolling but all of these examples don't consider member functions of a template class.

Don't do it. Use a generic round_points and a second loop. array::size() is constexpr and the compiler will then decide by itself if unrolling the loop is a good idea. — pmr
– pmr, Commented Jan 26, 2015 at 9:59
"I will be having many operations which can be made more efficient by having template specific member functions tailored to the specific dimensions, such as loop unrolling". oh yeah. MEASURE. — Cheers and hth. - Alf
– Cheers and hth. - Alf, Commented Jan 26, 2015 at 9:59
I have measured the time for other operations (marginalisation ect..) and these little things tend to do have an impact. The reason being I am dealing with up to 4 Gigabyte grids. — Guillaume
– Guillaume, Commented Jan 26, 2015 at 10:07

halfflat · Accepted Answer · 2015-02-25 21:08:30Z

2

Putting to one side the question of whether or not this is an appropriate optimisation, or if other optimisations should be regarded first, this is how I would do it. (But I do agree, sometimes it is demonstrably better to explicitly unroll loops — the compiler isn't always the best judge.)

One can't partially specialize a member function, and one can't specialize a nested struct without specializing the outer struct, so the only solution is to use a separate templated struct for the unrolling mechanism. Feel free to put this in some other namespace :)

The unrolling implementation:

template <int N>
struct sequence {
    template <typename F,typename... Args>
    static void run(F&& f,Args&&... args) {
        sequence<N-1>::run(std::forward<F>(f),std::forward<Args>(args)...);
        f(args...,N-1);
    }
};

template <>
struct sequence<0> {
    template <typename F,typename... Args>
    static void run(F&& f,Args&&... args) {}
};

This takes an arbitrary functional object and a list of arguments, and then calls the object with the arguments and an additional final argument N times, where the final argument ranges from 0 to N-1. The universal references and variadic templates are not necessary; the same idea can be employed in C++98 with less generality.

round_points<K> then calls sequence::run<K> with a helper static member function:

template <size_t N>
class TGrid {
public:
    template <size_t K>
    void round_points(){
        for (std::size_t i = 0; i < Xp.size();i++) {
            sequence<K>::run(TGrid<N>::round_item,*this,i);
        }
    }

    static void round_item(TGrid &G,int i,int j) {
        G.Xp[i][j] = G.min[j] + (std::floor((G.Xp[i][j] - G.min[j]) * G.nbins[j] / (G.max[j] - G.min[j])) * G.bin_w[j]) + G.bin_w[j]/2.0;
    }

    // ...
};

Edit: Addendum

Doing the equivalent with a pointer-to-member function appears to be hard for compilers to inline. As an alternative, to avoid the use of a static round_item, you can use a lambda, e.g.:

template <size_t N>
class TGrid {
public:
    template <size_t K>
    void round_points(){
        for (std::size_t i = 0; i < Xp.size();i++) {
            sequence<K>::run([&](int j) {round_item(i,j);});
        }
    }

    void round_item(int i,int j) {
        Xp[i][j] = min[j] + (std::floor((Xp[i][j] - min[j]) * nbins[j] / (max[j] - min[j])) * bin_w[j]) + bin_w[j]/2.0;
    }

    // ...
};

edited Feb 25, 2015 at 21:08

answered Feb 14, 2015 at 19:44

halfflat

1,5948 silver badges11 bronze badges

Sign up to request clarification or add additional context in comments.

3 Comments

Guillaume Over a year ago

Thanks for your answer I will try this out. Thank you also for answering the question without going into if it is an appropriate optimisation or not.

Guillaume Over a year ago

Why do you have round item as static ?

halfflat Over a year ago

Two reasons: firstly, it's clearer code without using pointer-to-member function objects or std::mem_fn; but secondly, this makes it much easier on the compiler's inliner. An alternative is to keep round_item as non-static, but pass a lambda to sequence<K>::run(). I'll edit the answer to suit.

Collectives™ on Stack Overflow

C++ template meta-programming member function loop unrolling

1 Answer 1

3 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

3 Comments

Your Answer

Sign up or log in

Post as a guest

Related