6

I'm trying to create a function that returns a numpy.array with n pseudo-random uniformly distributed numbers between 0 and 1. The details of the method used can be found here: https://en.wikipedia.org/wiki/Linear_congruential_generator

So far, it works great. The only problem is that each new value is calculated by using the previous value, so the only solution I've found so far uses a loop, and for the sake of eficiency I'm trying to get rid of that loop, possibly by vectorizing the operation - however, I don't know how to do so.

Do you have any suggestions on how to optimize this function?

import numpy as np
import time

def unif(n):
    m = 2**32
    a = 1664525
    c = 1013904223

    result = np.empty(n)
    result[0] = int((time.time() * 1e7) % m)

    for i in range(1,n):
        result[i] = (a*result[i-1]+c) % m

    return result / m
2
  • I don't think that's possible. Take a look at the answer here:stackoverflow.com/questions/44455481/… Commented Sep 21, 2018 at 4:08
  • FYI: If you are sticking with m=2**32 there is actually a very (90x) fast, fully vectorized numpy solution. I've updated my answer. Commented Sep 22, 2018 at 3:59

3 Answers 3

5

Although not vectorized, I believe the following solution is about 2x faster (60x faster with the numba solution). It saves each result as a local variable instead of accessing the numpy array by location.

def unif_improved(n):
    m = 2**32
    a = 1664525
    c = 1013904223

    results = np.empty(n)
    results[0] = result = int((time.time() * 1e7) % m)

    for i in range(1, n):
        result = results[i] = (a * result + c) % m

    return results / m

You may also consider using Numba for further efficiencies in speed. https://numba.pydata.org/

Just the addition of the decorator @jit blows to doors off of the other solutions.

from numba import jit

@jit
def unif_jit(n):
    # Same code as `unif_improved`

Timings

>>> %timeit -n 10 unif_original(500000)
715 ms ± 21.5 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

>>> %timeit -n 10 unif_improved(500000)
323 ms ± 8 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

>>> %timeit -n 10 unif_jit(500000)
12 ms ± 2.68 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
Sign up to request clarification or add additional context in comments.

1 Comment

Unless someone else comes here with another improvement, I think I'll stick to your code. Thanks for the tips!
2

This isn't possible to do completely since the answers depend on each other sequentially. The magic of modular arithmetic does mean that you can get a small improvement gain with the following change (modified from @Alexander's suggestion to use a local variable instead of an array lookup).

def unif_2(n):
    m = 2**32
    a = 1664525
    c = 1013904223

    results = np.empty(n)
    results[0] = result = int((time.time() * 1e7) % m)

    for i in range(1, n):
        result = results[i] = (a * result + c)

    return results % m / m

1 Comment

After testing this version against Alexander's one, for some reason this arithmetic turns out to be less efficient. When called with n = 1e8 this version is approximately .5 seconds less efficient than the other one on my PC. However, thank you for the tip!
2

Update:

Taking advantage of the modulus being 2^32 we can eliminate all Python loops and get a speedup of ~91.1.

Allowing for any modulus it is still possible to reduce the linear length loop to a log length loop. For 500,000 samples this gives a speedup of ~17.1. If we precompute the multistep factors and offsets (they are the same for any seed) this goes up to ~44.8.

Code:

import numpy as np
import time

def unif(n, seed):
    m = 2**32
    a = 1664525
    c = 1013904223

    result = np.empty(n)
    result[0] = seed

    for i in range(1,n):
        result[i] = (a*result[i-1]+c) % m

    return result / m

def precomp(n):
    l = n.bit_length()
    a, c = np.empty((2, 1+(1<<l)), np.uint64)
    m = 2**32
    a[:2] = 1, 1664525
    c[:2] = 0, 1013904223

    p = 1
    for j in range(l):
        a[1+p:1+(p<<1)] = a[p] * a[1:1+p] % m
        c[1+p:1+(p<<1)] = (a[p] * c[1:1+p] + c[p]) % m
        p <<= 1

    return a, c

def unif_opt(n, seed, a=None, c=None):
    if a is None:
        a, c = precomp(n)
    return (seed * a[:n] + c[:n]) % m / m

def unif_32(n, seed):
    out = np.empty((n,), np.uint32)
    out[0] = 1
    np.broadcast_to(np.uint32(1664525), (n-1,)).cumprod(out=out[1:])
    c = out[:-1].cumsum(dtype=np.uint32)
    c *= 1013904223
    out *= seed
    out[1:] += c
    return out / m

m = 2**32
seed = int((time.time() * 1e7) % m)
n = 500000
a, c = precomp(n)

print('results equal:', np.allclose(unif(n, seed), unif_opt(n, seed)) and 
      np.allclose(unif_opt(n, seed), unif_opt(n, seed, a, c)) and
      np.allclose(unif_32(n, seed), unif_opt(n, seed, a, c)))

from timeit import timeit

t = timeit('unif(n, seed)', globals=globals(), number=10)
t_opt = timeit('unif_opt(n, seed)', globals=globals(), number=10)
t_prc = timeit('unif_opt(n, seed, a, c)', globals=globals(), number=10)
t_32 = timeit('unif_32(n, seed)', globals=globals(), number=10)
print(f'speedup without precomp: {t/t_opt:.1f}')
print(f'speedup with precomp:    {t/t_prc:.1f}')
print(f'speedup special case:    {t/t_32:.1f}')

Sample run:

results equal: True
speedup without precomp: 17.1
speedup with precomp:    44.8
speedup special case:    91.1

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.