Why is calling a lambda function faster than calling a regular function?

Question

#include <iostream>
#include <ctime>

#define TIME(t) {std::cout << ((double)(clock() - (t)) / CLOCKS_PER_SEC);}

volatile long int limit = 10000000000;

void l2(int& a) {a++;}

void f(int& a)
{
  auto l1 = [&a]()
  {
    a++;
  };

  clock_t clk = clock();
  for(int i=0;i<limit;i++)
  {
    l1();
  }
  TIME(clk) // 4.07 s

  a=5;
  clk = clock();
  for(int i=0;i<limit;i++)
  {
    l2(a);
  }
  TIME(clk) // 4.32 s
}

int main()
{
  int a = 5;
  f(a);
  return 0;
}

Why is calling a lambda function faster?

Using gcc 4.8 with -O3.

They're not equivalent. One of them passes a parameter, the other doesn't.
– user541686, Commented Oct 7, 2015 at 19:53
Both require a reference to 'a' and both increment it. What is the inherent difference?
– Kam, Commented Oct 7, 2015 at 19:55
One is a parameter that is passed on every call, the other is a field that is only constructed once..?
– user541686, Commented Oct 7, 2015 at 19:56

Yakk - Adam Nevraumont · Accepted Answer · 2015-10-07 20:03:26Z

Lambda loop disassembly: (using godbolt gcc 4.8.2 -O3 in C++11 mode)

movq    limit(%rip), %rax
testq   %rax, %rax
jle .L7
movl    (%rbx), %eax
movl    $1, %edx
.L8:
movq    limit(%rip), %rcx
movq    %rdx, %rsi
leal    (%rax,%rdx), %edi
addq    $1, %rdx
cmpq    %rcx, %rsi
jl  .L8
movl    %edi, (%rbx)

Function call loop disassembly:

movq    limit(%rip), %rax
testq   %rax, %rax
jle .L5
movl    (%rbx), %eax
movl    $1, %edx
.L10:
movq    limit(%rip), %rcx
movq    %rdx, %rsi
leal    (%rax,%rdx), %edi
addq    $1, %rdx
cmpq    %rcx, %rsi
jl  .L10
movl    %edi, (%rbx)

The two loops compile down to identical code.

Any measured difference is due to the order in which you ran the two loops, or to random chance.
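One way to check whether ordering or noise explains a gap like this is to run both loops several times, alternate which one goes first, and compare the best time of each. The following is only a sketch under assumptions: the names timeLambda, timeFunction, BestTimes and compare are made up for illustration, the iteration count is smaller than in the question so a run stays short, and std::chrono::steady_clock stands in for clock().

```cpp
#include <algorithm>
#include <chrono>
#include <limits>

// volatile, as in the question, so the loop bound is re-read every iteration.
// Assumption: a smaller count than the question's, to keep each run short.
volatile long limit = 100000000;

void l2(int& a) { a++; }

// Time the lambda loop once; returns elapsed seconds.
double timeLambda(int& a) {
    auto l1 = [&a]() { a++; };
    auto start = std::chrono::steady_clock::now();
    for (long i = 0; i < limit; i++) l1();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Time the plain function loop once; returns elapsed seconds.
double timeFunction(int& a) {
    auto start = std::chrono::steady_clock::now();
    for (long i = 0; i < limit; i++) l2(a);
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

struct BestTimes {
    double lambdaBest;
    double functionBest;
};

// Alternate which variant runs first in each round and keep the best
// (smallest) time seen for each, so that warm-up effects hit both equally.
BestTimes compare(int rounds) {
    const double inf = std::numeric_limits<double>::infinity();
    BestTimes best{inf, inf};
    for (int r = 0; r < rounds; r++) {
        int a = 5;
        double tl, tf;
        if (r % 2 == 0) {
            tl = timeLambda(a);
            a = 5;
            tf = timeFunction(a);
        } else {
            tf = timeFunction(a);
            a = 5;
            tl = timeLambda(a);
        }
        best.lambdaBest = std::min(best.lambdaBest, tl);
        best.functionBest = std::min(best.functionBest, tf);
    }
    return best;
}
```

If the two best times come out within run-to-run variation of each other, the gap in a single ordered run was most likely noise.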

In general, lambdas are easier to inline, because what operator() does is determined by the type of the variable, not by its value. Propagating values and using them to optimize is a touch harder than doing the same with types.

The classic example is using qsort vs std::sort.
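A minimal sketch of that contrast, with hypothetical helper names (compareInts, sortWithQsort, sortWithStdSort). qsort receives its comparator as a function pointer, which is a runtime value, while std::sort receives the lambda's type as a template argument. Whether a particular compiler inlines either one is not guaranteed.

```cpp
#include <algorithm>
#include <cstdlib>
#include <vector>

// Comparator for qsort: handed over as a function pointer, i.e. a value
// the optimizer has to track before it can inline the call.
int compareInts(const void* lhs, const void* rhs) {
    int a = *static_cast<const int*>(lhs);
    int b = *static_cast<const int*>(rhs);
    return (a > b) - (a < b);  // negative, zero, or positive without overflow
}

std::vector<int> sortWithQsort(std::vector<int> v) {
    std::qsort(v.data(), v.size(), sizeof(int), compareInts);
    return v;
}

std::vector<int> sortWithStdSort(std::vector<int> v) {
    // The lambda has its own unique type, so std::sort is instantiated for
    // exactly this comparator and the call target is known at compile time.
    std::sort(v.begin(), v.end(), [](int a, int b) { return a < b; });
    return v;
}
```

Both functions return the same sorted vector; the difference is only in how much the compiler knows about the comparator at the call site.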

user541686 · Answer · 2015-10-07 19:58:35Z

Conceptually, the function can work on other variables, but the lambda only works on a.
Clearly, the more flexibility you have, the more you can expect to pay for it, as in this case.
Here, passing a parameter to the function on every call is more expensive than not passing one, and the compiler hasn't been able to optimize that away, hence the difference.
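To make the conceptual distinction concrete, here is a rough sketch of what a by-reference capture amounts to. The struct name Closure and the helpers viaClosure and viaFunction are hypothetical; the real closure type is unnamed and compiler-generated. The reference is stored once when the closure is built, whereas the free function receives it as an argument on each call. Whether that distinction survives inlining depends on the compiler and flags.

```cpp
// Roughly what the compiler generates for: auto l1 = [&a]() { a++; };
// (Closure is an illustrative name; the actual closure type has no name.)
struct Closure {
    int& a;                            // reference bound once, at construction
    void operator()() const { a++; }   // const only protects the reference itself
};

// The free function from the question: the reference arrives as a parameter.
void l2(int& a) { a++; }

// Increment through the closure n times, starting from `start`.
int viaClosure(int start, int n) {
    int a = start;
    Closure l1{a};
    for (int i = 0; i < n; i++) l1();
    return a;
}

// Increment through the free function n times, starting from `start`.
int viaFunction(int start, int n) {
    int a = start;
    for (int i = 0; i < n; i++) l2(a);
    return a;
}
```

Both paths compute the same result; they differ only in where the reference to a lives at the source level.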

2 Comments

Eran Over a year ago

I'd be surprised if both the lambda and the function calls weren't inlined, and I doubt the compiler will have any trouble optimizing this simple code. Since @Kam only says one is faster than the other, we don't know how big the difference is. Who knows, it might be insignificant and thus meaningless...

user541686 Over a year ago

@eran: It doesn't have much to do with inlining (I assumed both are inlined), but rather with loop-invariant code motion. The point is that the same variable is being passed to the function multiple times in one case, but not in the other. The compiler has to be pretty smart to hoist the loop invariant, and while that's not impossible by any means, it's not surprising to me that some compilers might not do it.
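For readers unfamiliar with the term, this is a hand-written sketch of what hoisting the memory access out of the loop looks like at the source level. The function names incrementNaive and incrementHoisted are hypothetical. It illustrates the transformation only and makes no claim about what any given compiler does, although the gcc 4.8.2 disassembly in the other answer does show one load before each loop and one store after it.

```cpp
// As written: every iteration goes through the reference.
void incrementNaive(int& a, long n) {
    for (long i = 0; i < n; i++) a++;
}

// After hand-applied hoisting: load once, work on a local, store once.
// An optimizer may do this itself when it can prove nothing else
// observes `a` during the loop.
void incrementHoisted(int& a, long n) {
    int local = a;
    for (long i = 0; i < n; i++) local++;
    a = local;
}
```

The two functions leave a with the same value for any n that does not overflow it.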
