When is virtual dispatch faster than function templates in C++ runtime?

Question

Most people know that template metaprogramming is generally faster than virtual dispatch in C++, because template types are resolved at compile time while virtual function calls require a runtime lookup in the vtable. At run time, in what scenarios can a virtually dispatched function be faster than a compiled template function?

I would not expect virtual dispatch ever to be faster, though it would not surprise me at all if the two were sometimes equal. The reason to choose virtual dispatch is that compile-time dispatch simply isn't enough when you don't know the object's type.
– SoronelHaetir, Commented Nov 24, 2024 at 5:23
Question doesn't make sense. Instantiation of a template happens at compile time, so there is zero runtime overhead (memory or computation time) for deciding which function to call. By contrast, virtual function dispatch has a non-zero runtime cost to decide which function to call (e.g. reading the vtable pointer and dereferencing a vtable entry). (In both cases there is the cost of passing arguments, executing the function itself, etc.) Build time (e.g. compiling and linking the program) may be larger with template functions, but such costs are not normally considered run-time costs.
– Peter, Commented Nov 24, 2024 at 6:46
For 99% (or more) of my program, the overhead of virtual dispatch is generally somewhere between negligible and unmeasurable. For the less-than-1% of the code on the performance-critical path, the only way to know what needs optimizing is by profiling optimized builds. Within that code, eliminating virtual dispatch is the least of the worries; usually it's a non-concern because the performance-heavy operations are moved to the GPU.
– Eljay, Commented Nov 24, 2024 at 13:26

Ahmed AEK · Accepted Answer · 2024-11-24 15:13:13Z

Templates are almost always faster than virtual functions, because virtual dispatch involves an indirection while templated code can avoid indirection entirely. However, you can construct cases where the opposite is true and virtual functions are faster.
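A minimal sketch of the two dispatch styles being compared, assuming a simple shape hierarchy; the names Shape, Square, PlainSquare, area_virtual and area_template are hypothetical and chosen only for illustration. The virtual version is compiled once and calls through the vtable; the template version is instantiated per type and the call is resolved at compile time.

```cpp
// --- Dynamic dispatch: one compiled function, indirect call through the vtable ---
struct Shape {
    virtual ~Shape() = default;
    virtual double area() const = 0;
};

struct Square : Shape {
    explicit Square(double s) : side(s) {}
    double area() const override { return side * side; }
    double side;
};

// Compiled exactly once. s.area() is normally an indirect call through the
// vtable, unless the optimizer can prove the dynamic type and devirtualize it.
double area_virtual(const Shape& s) { return s.area(); }

// --- Static dispatch: one instantiation per T, call resolved at compile time ---
struct PlainSquare {
    double area() const { return side * side; }
    double side;
};

// A separate copy of this function is generated for every T it is used with;
// t.area() is a direct call that the compiler is free to inline.
template <typename T>
double area_template(const T& t) { return t.area(); }
```

Usage: `area_virtual(Square(3.0))` and `area_template(PlainSquare{3.0})` both return 9.0; the difference lies only in how the call to `area()` is resolved and in how many copies of the calling function end up in the binary.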

On von Neumann architectures (which covers all PCs and smartphones), instructions are also data: they must be loaded from RAM into the CPU's instruction caches before they can be executed. Badly written templated code that instantiates too many unnecessary templates can produce a lot of instructions.

If the code cannot fit into the caches and the memory bus is already busy loading actual data, the templated code will run slowly, because the program runs into instruction-cache misses and memory-bandwidth limits. Virtual dispatch (type erasure), by contrast, produces less code, so it uses less memory bandwidth and can end up faster.
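As a small illustration of what "unnecessary instantiations" can look like, here is a hypothetical pair of functions (the names are made up for this sketch). The first is instantiated once per string-literal length, even though its logic does not really depend on the length; the second is a single non-template function that serves every string. Whether a compiler actually emits separate machine code for each instantiation depends on inlining and identical code folding, so treat this as a pattern to watch for rather than a guaranteed cost.

```cpp
#include <cstddef>
#include <string_view>

// Wasteful: a distinct instantiation is generated for every array length N,
// so count_spaces_bloated("ab") and count_spaces_bloated("abc") are two functions.
template <std::size_t N>
std::size_t count_spaces_bloated(const char (&s)[N]) {
    std::size_t n = 0;
    for (std::size_t i = 0; i + 1 < N; ++i)  // N includes the terminating '\0'
        if (s[i] == ' ') ++n;
    return n;
}

// Leaner: one non-template function handles strings of any length.
std::size_t count_spaces(std::string_view s) {
    std::size_t n = 0;
    for (char c : s)
        if (c == ' ') ++n;
    return n;
}
```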

In those cases, there are optimizations that can alleviate the issue, such as splitting templated classes into a non-templated base and a templated derived type, or relying more on the linker's identical code folding (ICF), which merges functions that compile to identical machine code.
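A minimal sketch of the "non-templated base, templated derived type" split (often called the thin-template idiom), assuming a simple stack of non-owning pointers to non-const objects; PtrStackBase and PtrStack are hypothetical names. All real logic lives in the base and is compiled once, while each instantiation of the derived template only adds type-safe casts that compile to little or nothing.

```cpp
#include <cstddef>
#include <vector>

// Non-templated base: the actual container logic is compiled exactly once.
class PtrStackBase {
public:
    std::size_t size() const { return items_.size(); }
    bool empty() const { return items_.empty(); }

protected:
    void push_raw(void* p) { items_.push_back(p); }
    void* pop_raw() {  // precondition: !empty()
        void* p = items_.back();
        items_.pop_back();
        return p;
    }

private:
    std::vector<void*> items_;
};

// Thin templated derived type: each instantiation only converts T* <-> void*,
// so adding more element types adds very little code.
template <typename T>
class PtrStack : public PtrStackBase {
public:
    void push(T* p) { push_raw(p); }
    T* pop() { return static_cast<T*>(pop_raw()); }
};
```

The design choice is to keep anything that does not depend on `T` out of the template, so only the type-dependent casts are duplicated per instantiation.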

There is a recent talk that demonstrates this effect: "Save Time, Space & a Little Sanity With std::function_ref" by David Ledger. The benchmarks shown at the end have function_ref outperforming the templated code. Virtual dispatch is slightly slower than function_ref, but it can show the same scaling behavior because both are forms of type erasure. Note that you need thousands of template instantiations before there is any notable difference.
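To show what a function_ref-style type erasure looks like, here is a minimal hand-rolled sketch. `std::function_ref` itself was added for C++26 and may not be available in your standard library, so `FunctionRef`, `apply_erased` and `apply_template` are hypothetical names, and the sketch only supports the fixed signature `int(int)` with lambdas or function objects (not plain function names). `apply_erased` is compiled once no matter how many callable types are passed to it, while `apply_template` gets a new instantiation for every distinct callable type.

```cpp
#include <memory>
#include <type_traits>
#include <utility>

// Non-owning, type-erased reference to a callable with signature int(int):
// stores a pointer to the callable plus a pointer to a per-type trampoline.
class FunctionRef {
public:
    template <typename F,
              typename = std::enable_if_t<!std::is_same_v<std::decay_t<F>, FunctionRef>>>
    FunctionRef(F&& f)
        : obj_(const_cast<void*>(static_cast<const void*>(std::addressof(f)))),
          call_([](void* obj, int x) -> int {
              // Cast back to the original callable type and invoke it.
              return (*static_cast<std::remove_reference_t<F>*>(obj))(x);
          }) {}

    int operator()(int x) const { return call_(obj_, x); }

private:
    void* obj_;                  // the referenced callable (must outlive this object)
    int (*call_)(void*, int);    // trampoline generated for the callable's type
};

// Compiled once: every call goes through the stored function pointer.
int apply_erased(FunctionRef f, int x) { return f(x); }

// Instantiated separately for every distinct callable type F.
template <typename F>
int apply_template(F&& f, int x) { return std::forward<F>(f)(x); }
```

Because `FunctionRef` does not own the callable, it is meant for parameters like `apply_erased`'s, where the callable outlives the call.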

For normal code, you don't need to worry about either effect; use whatever fits your requirements: templates for compile-time polymorphism and virtual functions for runtime polymorphism. But be aware of the disadvantages of both, and put sensible limits on how much you use them. Don't use templated code where you don't need it, and don't use virtual functions where you don't need them. The problem is not using templates; it is instantiating too many unnecessary templates. Likewise, any application can work well with a few virtual functions, but its performance will surely plummet if every function is virtual (staring at Java).

Aleksandr Medvedev · Accepted Answer · 2024-11-24 06:51:18Z


Long story short: dynamic dispatch is never faster than static dispatch, because static dispatch has no runtime dispatch overhead. By the time your program runs, the target of every statically dispatched call is already known, so nothing has to be looked up when the call is made.
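For comparison, a common way to get an interface-like design with static dispatch is the curiously recurring template pattern (CRTP). This is a minimal sketch, not the answer author's code; ShapeBase, Rect, RightTriangle and area_of are hypothetical names. The call from `ShapeBase::area()` to `area_impl()` is resolved at compile time, so no vtable is involved.

```cpp
// Static "interface": Derived is known at compile time, so area() forwards
// to Derived::area_impl() with a direct, inlinable call.
template <typename Derived>
struct ShapeBase {
    double area() const {
        return static_cast<const Derived&>(*this).area_impl();
    }
};

struct Rect : ShapeBase<Rect> {
    Rect(double w, double h) : width(w), height(h) {}
    double area_impl() const { return width * height; }
    double width, height;
};

struct RightTriangle : ShapeBase<RightTriangle> {
    RightTriangle(double b, double h) : base(b), height(h) {}
    double area_impl() const { return 0.5 * base * height; }
    double base, height;
};

// Works with any ShapeBase<D>; instantiated once per concrete shape type.
template <typename Derived>
double area_of(const ShapeBase<Derived>& s) { return s.area(); }
```

The trade-off is that `Rect` and `RightTriangle` do not share a common runtime base type, so they cannot be stored together in one container the way virtual-dispatch objects can.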

One of C++'s mottos is "You don't pay for what you don't use", and dynamic dispatch is the price you pay when a class needs to support virtual function calls.
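One visible part of that price is per-object. The sketch below uses hypothetical types Plain and WithVirtual to show that declaring a virtual function makes a class polymorphic; on mainstream implementations (GCC, Clang, MSVC) each object of such a class also carries a hidden vtable pointer, so it is typically larger. The size difference is an implementation detail, not something the standard guarantees.

```cpp
#include <type_traits>

// No virtual functions: no hidden vtable pointer, every call is direct.
struct Plain {
    int value;
    int get() const { return value; }
};

// One virtual function: calls through a base reference become indirect, and
// on typical implementations each object stores a pointer to its vtable.
struct WithVirtual {
    virtual ~WithVirtual() = default;
    virtual int get() const { return value; }
    int value;
};

// These two properties are guaranteed by the standard.
static_assert(!std::is_polymorphic_v<Plain>, "Plain has no virtual functions");
static_assert(std::is_polymorphic_v<WithVirtual>, "WithVirtual is polymorphic");
```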


Pepijn Kramer, commented Nov 24, 2024 at 8:47:

In practice there is very little difference between them; see "The Hidden Performance Price of C++ Virtual Functions" by Ivica Bogosavljevic (CppCon 2022). More likely than not, performance gains will be found elsewhere in the code (e.g. algorithmic improvements, or reducing branch mispredictions and/or cache misses). In the end there is only ONE solution: measure (using a profiler), and measure on different kinds of hardware.
