Most people know that template metaprogramming is generally faster than virtual dispatch in C++, because template types are resolved at compile time while virtual functions require a run-time vtable lookup. At run time, are there scenarios in which a virtual function call is faster than the equivalent templated function?
- I would not expect virtual dispatch to ever be faster, though it would not surprise me at all if they are sometimes equal. The reason for choosing virtual dispatch is that compile-time dispatch simply isn't enough when you don't know the object type. – SoronelHaetir, Nov 24, 2024 at 5:23
- The question doesn't make sense. Instantiation of a template happens at compile time, so there is zero runtime overhead (memory or computation time) for deciding which function to call. Virtual function dispatch, on the other hand, has a non-zero runtime cost for deciding which function to call (e.g. accessing and dereferencing a vtable entry). (In both cases there is the cost of passing arguments, executing the function itself, etc.) Build time (e.g. to compile and link the program) may be larger with template functions, but such costs are not normally considered run-time costs. – Peter, Nov 24, 2024 at 6:46
- For 99% (or more) of my program, the overhead of virtual dispatch is somewhere between negligible and unmeasurable. For the less than 1% of the code that is on the performance-critical path, the only way to know what needs optimizing is to profile optimized code. In that code, eliminating virtual dispatch is the least of the worries; usually it's a non-concern because the performance-critical operations are moved to the GPU. – Eljay, Nov 24, 2024 at 13:26
2 Answers
Templates are almost always faster than virtual functions, because virtual dispatch involves an indirection while templated code need not. However, you can construct a case where the opposite is true and virtual functions are faster.
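To make the two dispatch styles concrete, here is a minimal sketch. All names in it (`Shape`, `Square`, `PlainSquare`, `total_area_virtual`, `total_area_static`) are made up for illustration and are not taken from the question.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical interface: area() is resolved through the vtable at run time.
struct Shape {
    virtual ~Shape() = default;
    virtual int area() const = 0;
};

struct Square : Shape {
    explicit Square(int side) : side_(side) {}
    int area() const override { return side_ * side_; }
private:
    int side_;
};

// Dynamic dispatch: a single compiled copy serves every Shape subtype,
// but each area() call is an indirect call.
int total_area_virtual(const Shape* const* shapes, std::size_t count) {
    int sum = 0;
    for (std::size_t i = 0; i < count; ++i) sum += shapes[i]->area();
    return sum;
}

// Hypothetical plain type with no virtual functions.
struct PlainSquare {
    int side;
    int area() const { return side * side; }
};

// Static dispatch: one copy is generated per T, and the area() call is
// direct (and typically inlinable), at the cost of more generated code.
template <typename T>
int total_area_static(const T* shapes, std::size_t count) {
    int sum = 0;
    for (std::size_t i = 0; i < count; ++i) sum += shapes[i].area();
    return sum;
}

// Usage: both styles compute the same result.
inline void dispatch_usage_example() {
    Square a(2), b(3);
    const Shape* shapes[] = {&a, &b};
    assert(total_area_virtual(shapes, 2) == 13);

    PlainSquare plain[] = {{2}, {3}};
    assert(total_area_static(plain, 2) == 13);
}
```

The trade-off discussed in the rest of this answer comes from the last point: `total_area_static` exists once per type it is used with, while `total_area_virtual` exists exactly once.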
On von Neumann architectures (all PCs and smartphones), instructions are also data: they have to be loaded from RAM into the CPU's instruction caches before they can be executed. Badly written templated code that instantiates too many unnecessary templates can produce a lot of instructions.
If the code does not fit into the caches and the memory bus is already tied up loading actual data, the templated code will execute slowly because it runs into memory bandwidth limits. Virtual dispatch (type erasure) produces less code, so you end up with less memory bandwidth usage and therefore faster code.
In those cases there are optimizations that can alleviate the issue, such as splitting a templated class into a non-templated base and a templated derived type, or trying to increase the amount of identical code folding (ICF) the toolchain can perform.
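As a rough sketch of the base/derived split mentioned above, assuming a made-up fixed-capacity buffer (`RawBufferBase` and `FixedBuffer` are hypothetical names, and the sketch only handles trivially copyable, default-constructible element types): the byte-level logic lives in a non-templated base, and the template adds only a thin typed wrapper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Non-templated base: the type-independent work exists once in the binary.
// In a real project these bodies would normally live in a .cpp file.
class RawBufferBase {
public:
    // The base stores a pointer into the derived object, so copying is disabled.
    RawBufferBase(const RawBufferBase&) = delete;
    RawBufferBase& operator=(const RawBufferBase&) = delete;

    std::size_t size_bytes() const { return size_; }

protected:
    RawBufferBase(unsigned char* storage, std::size_t capacity)
        : storage_(storage), capacity_(capacity) {}

    // Appends n raw bytes; returns false if they do not fit.
    bool push_bytes(const void* src, std::size_t n) {
        if (n > capacity_ - size_) return false;
        std::memcpy(storage_ + size_, src, n);
        size_ += n;
        return true;
    }

    const unsigned char* bytes() const { return storage_; }

private:
    unsigned char* storage_;
    std::size_t capacity_;
    std::size_t size_ = 0;
};

// Templated derived type: only these small forwarding functions are
// instantiated per (T, N) combination.
template <typename T, std::size_t N>
class FixedBuffer : public RawBufferBase {
    static_assert(std::is_trivially_copyable<T>::value,
                  "this sketch only supports trivially copyable T");
    static_assert(N > 0, "capacity must be positive");

public:
    FixedBuffer() : RawBufferBase(data_, sizeof(data_)) {}

    bool push(const T& value) { return push_bytes(&value, sizeof(T)); }
    std::size_t size() const { return size_bytes() / sizeof(T); }

    T at(std::size_t i) const {
        assert(i < size());
        T out;
        std::memcpy(&out, bytes() + i * sizeof(T), sizeof(T));
        return out;
    }

private:
    alignas(T) unsigned char data_[N * sizeof(T)];
};

// Usage: behaves like a typed container from the caller's point of view.
inline void thin_template_usage_example() {
    FixedBuffer<int, 2> buf;
    assert(buf.push(7));
    assert(buf.push(9));
    assert(!buf.push(11));  // full
    assert(buf.size() == 2);
    assert(buf.at(0) == 7 && buf.at(1) == 9);
}
```

Whether this actually shrinks the binary depends on how much logic can be moved into the base; for a wrapper this small the gain would be negligible, so treat it as an illustration of the shape of the technique only.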
A recent talk demonstrated this effect: "Save Time, Space & a Little Sanity With std::function_ref" by David Ledger. The benchmarks shown at the end have function_ref coming out faster than templated code. Virtual dispatch is slightly slower than function_ref, but it can have the same scaling behavior because both are forms of type erasure.
That said, you need thousands of template instantiations to see any notable difference.
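For readers who have not seen the idea behind function_ref, here is a minimal hand-rolled sketch. `FunctionRef` below is a hypothetical simplification, not the standard `std::function_ref` and not the code from the talk; it assumes C++14 and only supports function objects such as lambdas, not plain function pointers.

```cpp
#include <cassert>
#include <memory>       // std::addressof
#include <type_traits>  // std::decay_t, std::enable_if_t, std::is_same
#include <utility>      // std::forward

template <typename Signature>
class FunctionRef;  // hypothetical, simplified stand-in

// Non-owning, type-erased reference to a callable: one object pointer plus
// one function pointer that knows how to call through it.
template <typename R, typename... Args>
class FunctionRef<R(Args...)> {
public:
    // The enable_if keeps this constructor from hijacking copy construction.
    template <typename F,
              typename = std::enable_if_t<
                  !std::is_same<std::decay_t<F>, FunctionRef>::value>>
    FunctionRef(F&& f) noexcept
        : object_(const_cast<void*>(
              static_cast<const void*>(std::addressof(f)))),
          call_([](void* object, Args... args) -> R {
              auto& callable =
                  *static_cast<std::remove_reference_t<F>*>(object);
              return callable(std::forward<Args>(args)...);
          }) {}

    R operator()(Args... args) const {
        return call_(object_, std::forward<Args>(args)...);
    }

private:
    void* object_;                 // the referenced callable (not owned)
    R (*call_)(void*, Args...);    // thunk generated per callable type
};

// Not a template: compiled once, however many callable types are passed in.
int apply_twice(FunctionRef<int(int)> f, int x) { return f(f(x)); }

// Template: a separate copy is instantiated for each distinct callable type.
template <typename F>
int apply_twice_template(F&& f, int x) { return f(f(x)); }

// Usage: the referenced lambda must outlive the FunctionRef.
inline void function_ref_usage_example() {
    int offset = 3;
    auto add_offset = [offset](int v) { return v + offset; };
    assert(apply_twice(add_offset, 1) == 7);
    assert(apply_twice_template(add_offset, 1) == 7);
}
```

Only the small thunk is generated per callable type; the body of `apply_twice` is shared, which is where the code-size saving would come from when the function body is large and there are many distinct callables.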
For normal code, you don't need to worry about either effect. Use whatever fits your requirements: templates for compile-time polymorphism and virtual functions for run-time polymorphism. But be aware of the disadvantages of both, and keep your use of them within reasonable limits. Don't use templated code where you don't need it, and don't use virtual functions where you don't need them. The problem is not using templates; the problem is instantiating too many unnecessary templates. Likewise, any application can work well with a few virtual functions, but its performance will surely plummet if all functions are virtual (staring at Java).
Comments
Long story short: dynamic dispatch is never faster than static dispatch, because the latter doesn't involve any runtime overhead. That is, by the time your program starts, it already knows the addresses of all statically dispatched functions and doesn't need to compute them at the call site.
One of C++'s mottos is "You don't pay for what you don't use", and dynamic dispatch is the price you pay when your class needs to support virtual function calls.