Local variable vs. array access

Question

Which of these would be more computationally efficient, and why?

A) Repeated array access:

for(i=0; i<numbers.length; i++) {
    result[i] = numbers[i] * numbers[i] * numbers[i];
}

B) Setting a local variable:

for(i=0; i<numbers.length; i++) {
    int n = numbers[i];
    result[i] = n * n * n;
}

Wouldn't the repeated array access have to be computed each time (using pointer arithmetic), making the first option slower because it effectively does this?

for(i=0; i<numbers.length; i++) {
    result[i] = *(numbers + i) * *(numbers + i) * *(numbers + i);
}

You could run the code using a Timer object? Would be interesting to see what the results are. I mention this as you already have it coded :)
– Dr Schizo, Commented Oct 8, 2013 at 9:43
I suppose that the compiler will generate the same instructions.
– Michael M., Commented Oct 8, 2013 at 9:50
@Michael You're saying that as if every compiler on every platform behaved the same.
– svick, Commented Oct 8, 2013 at 9:52
As with all benchmark questions, the only way to reliably determine the answer is to benchmark the performance in your target environment. This greatly depends on how well your compiler optimizes, whether your target architecture has special instructions to support such operations, and probably other factors.
– Björn Pollex, Commented Oct 8, 2013 at 9:55
Accessing numbers.length but also using the subscript operator makes me think that the c tag is actually inappropriate.
– Frerich Raabe, Commented Oct 8, 2013 at 9:55

Frerich Raabe · Accepted Answer · 2013-10-08 10:02:53Z

Any sufficiently sophisticated compiler will generate the same code for all three solutions. I turned your three versions into a small C program (with one minor adjustment: I replaced the numbers.length access with a macro invocation that yields the length of an array):

#include <stddef.h>

size_t i;
static const int numbers[] = { 0, 1, 2, 4, 5, 6, 7, 8, 9 };

#define ARRAYLEN(x) (sizeof((x)) / sizeof(*(x)))
static int result[ARRAYLEN(numbers)];

void versionA(void)
{
    for(i=0; i<ARRAYLEN(numbers); i++) {
        result[i] = numbers[i] * numbers[i] * numbers[i];
    }
}

void versionB(void)
{
    for(i=0; i<ARRAYLEN(numbers); i++) {
        int n = numbers[i];
        result[i] = n * n * n;
    }
}

void versionC(void)
{
    for(i=0; i<ARRAYLEN(numbers); i++) {
        result[i] = *(numbers + i) * *(numbers + i) * *(numbers + i);
    }
}

I then compiled it using optimizations (and debug symbols, for prettier disassembly) with Visual Studio 2012:

C:\Temp>cl /Zi /O2 /Wall /c so19244189.c
Microsoft (R) C/C++ Optimizing Compiler Version 17.00.50727.1 for x86
Copyright (C) Microsoft Corporation.  All rights reserved.

so19244189.c

Finally, here's the disassembly:

C:\Temp>dumpbin /disasm so19244189.obj

[..]

_versionA:
  00000000: 33 C0              xor         eax,eax
  00000002: 8B 0C 85 00 00 00  mov         ecx,dword ptr _numbers[eax*4]
            00
  00000009: 8B D1              mov         edx,ecx
  0000000B: 0F AF D1           imul        edx,ecx
  0000000E: 0F AF D1           imul        edx,ecx
  00000011: 89 14 85 00 00 00  mov         dword ptr _result[eax*4],edx
            00
  00000018: 40                 inc         eax
  00000019: 83 F8 09           cmp         eax,9
  0000001C: 72 E4              jb          00000002
  0000001E: A3 00 00 00 00     mov         dword ptr [_i],eax
  00000023: C3                 ret

_versionB:
  00000000: 33 C0              xor         eax,eax
  00000002: 8B 0C 85 00 00 00  mov         ecx,dword ptr _numbers[eax*4]
            00
  00000009: 8B D1              mov         edx,ecx
  0000000B: 0F AF D1           imul        edx,ecx
  0000000E: 0F AF D1           imul        edx,ecx
  00000011: 89 14 85 00 00 00  mov         dword ptr _result[eax*4],edx
            00
  00000018: 40                 inc         eax
  00000019: 83 F8 09           cmp         eax,9
  0000001C: 72 E4              jb          00000002
  0000001E: A3 00 00 00 00     mov         dword ptr [_i],eax
  00000023: C3                 ret

_versionC:
  00000000: 33 C0              xor         eax,eax
  00000002: 8B 0C 85 00 00 00  mov         ecx,dword ptr _numbers[eax*4]
            00
  00000009: 8B D1              mov         edx,ecx
  0000000B: 0F AF D1           imul        edx,ecx
  0000000E: 0F AF D1           imul        edx,ecx
  00000011: 89 14 85 00 00 00  mov         dword ptr _result[eax*4],edx
            00
  00000018: 40                 inc         eax
  00000019: 83 F8 09           cmp         eax,9
  0000001C: 72 E4              jb          00000002
  0000001E: A3 00 00 00 00     mov         dword ptr [_i],eax
  00000023: C3                 ret

Note how the assembly is exactly the same in all cases. So the correct answer to your question

Which of these would be more computationally efficient, and why?

for this compiler is: mu. Your question cannot be answered because it's based on incorrect assumptions. None of the versions is faster than any other.

Krzysztof Kosiński · Accepted Answer · 2013-10-08 09:59:15Z

The theoretical answer:

A reasonably good optimizing compiler should convert version A to version B, and perform only one load from memory. There should be no performance difference if optimization is enabled.

If optimization is disabled, version A will be slower, because the address must be computed 3 times and there are 3 memory loads (2 of them are cached and very fast, but it's still slower than reusing a register).

In practice, the answer will depend on your compiler, and you should check this by benchmarking.
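A minimal sketch of such a benchmark, assuming plain C and the standard clock() function. The names cube_a, cube_b and time_cube are hypothetical and not from the question. Note that with optimization enabled the compiler may simplify or hoist the repeated calls, so treat the numbers as a rough indication only and measure in your real target environment.

```c
#include <stddef.h>
#include <time.h>

/* Version A from the question: index the array three times. */
void cube_a(const int *numbers, int *result, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        result[i] = numbers[i] * numbers[i] * numbers[i];
    }
}

/* Version B from the question: copy the element into a local first. */
void cube_b(const int *numbers, int *result, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        int n = numbers[i];
        result[i] = n * n * n;
    }
}

/* Hypothetical helper: calls fn `reps` times and returns the
   elapsed CPU time in seconds as reported by clock(). */
double time_cube(void (*fn)(const int *, int *, size_t),
                 const int *numbers, int *result, size_t len, int reps)
{
    clock_t start = clock();
    for (int r = 0; r < reps; r++) {
        fn(numbers, result, len);
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

To use it, call time_cube(cube_a, ...) and time_cube(cube_b, ...) on the same input with a large reps value and compare the two returned times, once with optimization enabled and once without.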


ST3 · Accepted Answer · 2013-10-08 10:11:08Z

It depends on compiler but all of them should be the same.

First, let's look at case B. A smart compiler will generate code that loads the value into a register only once, so it doesn't matter whether you use an additional variable or not: the compiler emits a mov instruction and keeps the value in a register. So B is the same as A.

Now let's compare A and C. Consider how the [] operator is defined: a[b] is actually *(a + b), so *(numbers + i) is the same as numbers[i]. That means cases A and C are the same.

So we have (A==B) && (A==C), and all in all (A==B==C), if you know what I mean :).
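The equivalence of a[b] and *(a + b) can be checked directly. The following is a small sketch in C; the function name same_access is hypothetical and only serves the illustration. It compares the element addresses produced by the subscript form, the explicit pointer arithmetic, and the reversed subscript form, which is legal precisely because of that definition.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if every spelling of the access
   refers to the same element for all indices below len, else 0. */
int same_access(const int *numbers, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        const int *a = &numbers[i];   /* subscript form, as in version A */
        const int *b = numbers + i;   /* pointer arithmetic, as in version C */
        const int *c = &i[numbers];   /* i[numbers] means *(i + numbers) */
        if (a != b || b != c) {
            return 0;
        }
    }
    return 1;
}
```

Since the language defines the two spellings as the same expression, the function returns 1 for any valid array, which is why versions A and C cannot differ.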
