How to optimize a for loop that uses consecutive values with Numpy?

Question

I'm trying to create a function that returns a numpy.array with n pseudo-random uniformly distributed numbers between 0 and 1. The details of the method used can be found here: https://en.wikipedia.org/wiki/Linear_congruential_generator

So far, it works great. The only problem is that each new value is calculated by using the previous value, so the only solution I've found so far uses a loop, and for the sake of eficiency I'm trying to get rid of that loop, possibly by vectorizing the operation - however, I don't know how to do so.

Do you have any suggestions on how to optimize this function?

import numpy as np
import time

def unif(n):
    m = 2**32
    a = 1664525
    c = 1013904223

    result = np.empty(n)
    result[0] = int((time.time() * 1e7) % m)

    for i in range(1,n):
        result[i] = (a*result[i-1]+c) % m

    return result / m

I don't think that's possible. Take a look at the answer here:stackoverflow.com/questions/44455481/… — vestland
– vestland, Commented Sep 21, 2018 at 4:08
FYI: If you are sticking with m=2**32 there is actually a very (90x) fast, fully vectorized numpy solution. I've updated my answer. — Paul Panzer
– Paul Panzer, Commented Sep 22, 2018 at 3:59

Alexander · Accepted Answer · 2018-09-21 05:03:48Z

5

Although not vectorized, I believe the following solution is about 2x faster (60x faster with the numba solution). It saves each result as a local variable instead of accessing the numpy array by location.

def unif_improved(n):
    m = 2**32
    a = 1664525
    c = 1013904223

    results = np.empty(n)
    results[0] = result = int((time.time() * 1e7) % m)

    for i in range(1, n):
        result = results[i] = (a * result + c) % m

    return results / m

You may also consider using Numba for further efficiencies in speed. https://numba.pydata.org/

Just the addition of the decorator @jit blows to doors off of the other solutions.

from numba import jit

@jit
def unif_jit(n):
    # Same code as `unif_improved`

Timings

>>> %timeit -n 10 unif_original(500000)
715 ms ± 21.5 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

>>> %timeit -n 10 unif_improved(500000)
323 ms ± 8 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

>>> %timeit -n 10 unif_jit(500000)
12 ms ± 2.68 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

edited Sep 21, 2018 at 5:03

answered Sep 21, 2018 at 4:51

Alexander

111k32 gold badges212 silver badges208 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

Jorge Esteban Mendoza Over a year ago

Unless someone else comes here with another improvement, I think I'll stick to your code. Thanks for the tips!

Hans Musgrave · Accepted Answer · 2018-09-21 04:53:21Z

2

This isn't possible to do completely since the answers depend on each other sequentially. The magic of modular arithmetic does mean that you can get a small improvement gain with the following change (modified from @Alexander's suggestion to use a local variable instead of an array lookup).

def unif_2(n):
    m = 2**32
    a = 1664525
    c = 1013904223

    results = np.empty(n)
    results[0] = result = int((time.time() * 1e7) % m)

    for i in range(1, n):
        result = results[i] = (a * result + c)

    return results % m / m

answered Sep 21, 2018 at 4:53

Hans Musgrave

7,2012 gold badges21 silver badges40 bronze badges

1 Comment

Jorge Esteban Mendoza Over a year ago

After testing this version against Alexander's one, for some reason this arithmetic turns out to be less efficient. When called with n = 1e8 this version is approximately .5 seconds less efficient than the other one on my PC. However, thank you for the tip!

Paul Panzer · Accepted Answer · 2018-09-22 03:56:01Z

Update:

Taking advantage of the modulus being 2^32 we can eliminate all Python loops and get a speedup of ~91.1.

Allowing for any modulus it is still possible to reduce the linear length loop to a log length loop. For 500,000 samples this gives a speedup of ~17.1. If we precompute the multistep factors and offsets (they are the same for any seed) this goes up to ~44.8.

Code:

import numpy as np
import time

def unif(n, seed):
    m = 2**32
    a = 1664525
    c = 1013904223

    result = np.empty(n)
    result[0] = seed

    for i in range(1,n):
        result[i] = (a*result[i-1]+c) % m

    return result / m

def precomp(n):
    l = n.bit_length()
    a, c = np.empty((2, 1+(1<<l)), np.uint64)
    m = 2**32
    a[:2] = 1, 1664525
    c[:2] = 0, 1013904223

    p = 1
    for j in range(l):
        a[1+p:1+(p<<1)] = a[p] * a[1:1+p] % m
        c[1+p:1+(p<<1)] = (a[p] * c[1:1+p] + c[p]) % m
        p <<= 1

    return a, c

def unif_opt(n, seed, a=None, c=None):
    if a is None:
        a, c = precomp(n)
    return (seed * a[:n] + c[:n]) % m / m

def unif_32(n, seed):
    out = np.empty((n,), np.uint32)
    out[0] = 1
    np.broadcast_to(np.uint32(1664525), (n-1,)).cumprod(out=out[1:])
    c = out[:-1].cumsum(dtype=np.uint32)
    c *= 1013904223
    out *= seed
    out[1:] += c
    return out / m

m = 2**32
seed = int((time.time() * 1e7) % m)
n = 500000
a, c = precomp(n)

print('results equal:', np.allclose(unif(n, seed), unif_opt(n, seed)) and 
      np.allclose(unif_opt(n, seed), unif_opt(n, seed, a, c)) and
      np.allclose(unif_32(n, seed), unif_opt(n, seed, a, c)))

from timeit import timeit

t = timeit('unif(n, seed)', globals=globals(), number=10)
t_opt = timeit('unif_opt(n, seed)', globals=globals(), number=10)
t_prc = timeit('unif_opt(n, seed, a, c)', globals=globals(), number=10)
t_32 = timeit('unif_32(n, seed)', globals=globals(), number=10)
print(f'speedup without precomp: {t/t_opt:.1f}')
print(f'speedup with precomp:    {t/t_prc:.1f}')
print(f'speedup special case:    {t/t_32:.1f}')

Sample run:

results equal: True
speedup without precomp: 17.1
speedup with precomp:    44.8
speedup special case:    91.1

Collectives™ on Stack Overflow

How to optimize a for loop that uses consecutive values with Numpy?

3 Answers 3

1 Comment

1 Comment

Update:

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

1 Comment

1 Comment

Update:

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related