I've been studying cache locality recently and I'm trying to understand how CPUs access memory. I wrote an experiment to see whether there was a performance difference between looping over an array sequentially and indexing into the data array through a lookup table. I was surprised to find the lookup method slightly faster. My code is below; I compiled with GCC on Windows (MinGW).
#include <stdlib.h>
#include <stdio.h>
#include <windows.h>

int main(void)
{
    DWORD dwElapsed, dwStartTime;

    // random arrangement of keys to look up
    int lookup_arr[] = {0, 3, 8, 7, 2, 1, 4, 5, 6, 9};

    // data for both loops
    int data_arr1[] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
    int data_arr2[] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};

    // first loop: sequential access
    dwStartTime = GetTickCount();
    for (int n = 0; n < 9000000; n++) {
        for (int i = 0; i < 10; i++)
            data_arr1[i]++;
    }
    dwElapsed = GetTickCount() - dwStartTime;
    printf("Normal loop completed: %lu\n", (unsigned long)dwElapsed);

    // second loop: indexes into data_arr2 through the lookup array
    dwStartTime = GetTickCount();
    for (int n = 0; n < 9000000; n++) {
        for (int i = 0; i < 10; i++)
            data_arr2[lookup_arr[i]]++;
    }
    dwElapsed = GetTickCount() - dwStartTime;
    printf("Lookup loop completed: %lu\n", (unsigned long)dwElapsed);

    return 0;
}
Running this, I get:
Normal loop completed: 375
Lookup loop completed: 297