For benchmarking, I need to run the same algorithm over the same data multiple times. I now want to explore the case where every run over that data starts with cold caches.
I was thinking of adding a for loop that reads every element of an array (a LOAD/MOV instruction per element) so that the cache fills with the array's elements, evicting the benchmark data. E.g.:
vector<size_t> vec(CACHE_SIZE/sizeof(size_t));
//...
//...
size_t element;
for (size_t i = 0; i < vec.size(); ++i) {
    element = vec[i];
}
But when compiled with optimizations enabled, this loop will probably be removed entirely, since element is never used afterwards.
So how can I do this, ideally with minimal overhead?
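For illustration, here is a minimal sketch of one way the loop above might be kept alive under optimization: the reads go through a pointer to volatile, which mainstream compilers treat as accesses they must actually perform. The names touch_every_line and CACHE_LINE are hypothetical, the 64-byte line size and the 32 MiB CACHE_SIZE are assumptions to adjust for the target CPU, and a single pass may not evict everything on caches that do not use plain LRU replacement.

```cpp
#include <cstddef>
#include <vector>

// Assumed values, not taken from any particular CPU: adjust for the target.
constexpr std::size_t CACHE_SIZE = 32u * 1024 * 1024; // assumed >= last-level cache size
constexpr std::size_t CACHE_LINE = 64;                // assumed cache line size in bytes

// Hypothetical helper: reads one element per cache line of buf.
// The reads go through a pointer to volatile so the optimizer cannot drop them.
// Returns the number of lines touched.
std::size_t touch_every_line(const std::vector<std::size_t>& buf) {
    const volatile std::size_t* p = buf.data();
    const std::size_t step = CACHE_LINE / sizeof(std::size_t);
    std::size_t touched = 0;
    for (std::size_t i = 0; i < buf.size(); i += step) {
        (void)p[i]; // volatile read: the load is emitted, the value is discarded
        ++touched;
    }
    return touched;
}
```

A possible usage would be to allocate the buffer once, e.g. std::vector<std::size_t> vec(CACHE_SIZE / sizeof(std::size_t)), and call touch_every_line(vec) before each benchmark run, outside the timed region, so the eviction pass adds nothing to the measurement itself. Touching one element per line rather than every element keeps the pass short, since a single load already brings the whole line in.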