
This may not be useful; it's just a challenge I have set up for myself.

Let's say you have a big array. What can you do so that the program does not benefit from caching, cache-line prefetching, or memory-level parallelism (i.e., so that the address of the next memory access can only be determined after the previous access finishes)?

So we have our array:

array = [0] * 10000000

What would be the best way to deoptimize the memory access if you had to access all elements in a loop? The idea is to increase the access time of each memory location as much as possible.

I'm not looking for a solution that proposes doing "something else" (which takes time) before the next access. The idea is really to increase the access time itself as much as possible. I guess we have to traverse the array in a certain order (perhaps randomly? I'm still looking into it).
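A minimal sketch of the random-traversal idea I'm considering (plain Python, matching the array above):

```python
import random

N = 10_000_000
array = [0] * N

# Precompute a random permutation of the indices, then visit every
# element exactly once in that order. Successive accesses land on
# unrelated cache lines, so the hardware prefetcher has no pattern
# to exploit.
order = list(range(N))
random.shuffle(order)

total = 0
for i in order:
    total += array[i]
```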

  • So you want to slow down array[...]? Since "doing something else" is not allowed, what is allowed? Commented Nov 14, 2020 at 17:40
  • Also, you are talking about pure array[k] here, not e.g. array.pop(k) or array[:][k], right? Commented Nov 14, 2020 at 17:41
  • Sorry for my lack of clarity. Yes, I want to slow down array[k], where k is the index (or memory address). By "doing something else is not allowed" I meant that you can't run another function that takes time. What I want to slow down is the access itself, not pad it with extra work. Commented Nov 14, 2020 at 17:44

2 Answers


I did not expect any difference, but in fact accessing the elements in random order is significantly slower than accessing them in order or in reverse order (both of which take about the same time).

>>> import random
>>> N = 10**5
>>> arr = [random.randint(0, 1000) for _ in range(N)]
>>> srt = list(range(N))
>>> rvd = srt[::-1]
>>> rnd = random.sample(srt, N)
>>> %timeit sum(arr[i] for i in srt)
10 loops, best of 5: 24.9 ms per loop
>>> %timeit sum(arr[i] for i in rvd)
10 loops, best of 5: 25.7 ms per loop
>>> %timeit sum(arr[i] for i in rnd)
10 loops, best of 5: 59.2 ms per loop

And it really seems to be the randomness. Just accessing indices out of order, but with a pattern, e.g. as [0, N-1, 2, N-3, ...] or [0, N/2, 1, N/2+1, ...], is just as fast as accessing them in order:

>>> alt1 = [i if i % 2 == 0 else N - i for i in range(N)]
>>> alt2 = [i for p in zip(srt[:N//2], srt[N//2:]) for i in p]
>>> %timeit sum(arr[i] for i in alt1)
10 loops, best of 5: 24.5 ms per loop
>>> %timeit sum(arr[i] for i in alt2)
10 loops, best of 5: 24.1 ms per loop

Interestingly, just iterating the shuffled indices (and summing them, as with the array above) is also slower than doing the same with the sorted indices, but not by as much. Of the ~35 ms difference between srt and rnd, ~10 ms seem to come from iterating the randomized index list itself, and ~25 ms from actually accessing the array elements in random order.

>>> %timeit sum(i for i in srt)
100 loops, best of 5: 19.7 ms per loop
>>> %timeit sum(i for i in rnd)
10 loops, best of 5: 30.5 ms per loop
>>> %timeit sum(arr[i] for i in srt)
10 loops, best of 5: 24.5 ms per loop
>>> %timeit sum(arr[i] for i in rnd)
10 loops, best of 5: 56 ms per loop

(IPython 5.8.0 / Python 3.7.3 on a rather old laptop running Linux)
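For reference, the same comparison can be reproduced outside IPython with a small standalone script (a sketch; absolute timings will vary by machine and Python build):

```python
import random
import time

N = 10**5
arr = [random.randint(0, 1000) for _ in range(N)]
srt = list(range(N))           # sequential indices
rnd = random.sample(srt, N)    # the same indices, shuffled

def bench(indices, repeats=20):
    """Return (best wall-clock time, checksum) for summing arr
    in the given index order, mimicking %timeit's best-of-N."""
    best = float("inf")
    total = 0
    for _ in range(repeats):
        t0 = time.perf_counter()
        total = sum(arr[i] for i in indices)
        best = min(best, time.perf_counter() - t0)
    return best, total

t_seq, s_seq = bench(srt)
t_rnd, s_rnd = bench(rnd)
print(f"sequential: {t_seq * 1000:.1f} ms, random: {t_rnd * 1000:.1f} ms")
```

Both orders visit every element exactly once, so the checksums are identical; only the access pattern differs.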


Python interns small integers, so use integers greater than 256. Multiplying a one-element list just adds references to the one number already in the list, so use unique values instead. Caches hate randomness, so go random.

import random

# Values above 256 are outside CPython's small-integer cache, so each
# element is a distinct heap object.
array = list(range(257, 10000257))

# Remove elements in random order: each pop touches an unpredictable
# memory location (and popping from the middle also shifts the tail).
while array:
    array.pop(random.randint(0, len(array) - 1))

A note on interning small integers. When you create an integer in your program, say 12345, CPython creates an object on the heap (28 bytes for a small int on a 64-bit build). This is expensive, so the numbers between -5 and 256 are preallocated by the interpreter to optimize common small-number operations. By avoiding these numbers you force Python to allocate each integer on the heap, spreading out the amount of memory you will touch and reducing cache efficiency.
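The caching is easy to observe. In the sketch below the values are built at runtime via int() so the interpreter can't constant-fold them into shared objects (the exact cache range, -5 to 256, is a CPython implementation detail):

```python
import sys

# A value inside the small-int cache: both names refer to the
# single preallocated object.
a = int("100")
b = int("100")
print(a is b)        # True in CPython

# A value just above the cache: each int() call allocates a fresh
# heap object, so the two are distinct.
c = int("257")
d = int("257")
print(c is d)        # False in CPython

# Even a "small" heap integer costs tens of bytes.
print(sys.getsizeof(12345))
```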

If you use a single number in the array, [1234] * 100000, that single number is referenced many times. If you use unique numbers, they are all individually allocated on the heap, increasing the memory footprint. And when they are removed from the list, Python has to touch each object to decrement its reference count, which pulls its memory location into the cache, evicting something else.
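This sharing is visible by counting distinct object identities (int() is used for the unique case so each element is genuinely constructed at runtime):

```python
n = 1000

# List multiplication stores n references to ONE int object.
shared = [1234] * n
print(len({id(v) for v in shared}))   # 1

# Constructing each value at runtime (above the small-int cache)
# allocates a fresh heap object per element, spreading the list's
# payload across memory.
unique = [int("1234") for _ in range(n)]
print(len({id(v) for v in unique}))   # 1000
```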

2 Comments

  • Why "Python interns small integers. Use integers > 255. * just adds references to the number already in the list when expanded, use unique values instead"? I don't quite get why we would need that. What does it mean that Python interns small integers?
  • @Kris - added an explanation.
