0

I am trying to compute matrix z (defined below) in python with numpy.

Here's my current solution (using 1 for loop)

z = np.zeros((n, k))
for i in range(n):
    v = pi * (1 / math.factorial(x[i])) * np.exp(-1 * lamb) * (lamb ** x[i])
    numerator = np.sum(v)
    c = v / numerator
    z[i, :] = c
return z

Is it possible to completely vectorize this computation? I need to do this computation for thousands of iterations, and matrix operations in numpy is much faster than huge for loops.

3
  • 1
    Add a minimal sample case? Commented Oct 6, 2017 at 12:02
  • @Divakar thanks for commenting. This gist can be used as a example (python3). An EM algorithm is implemented, and I would like to vectorize the E(x) function. Commented Oct 6, 2017 at 12:12
  • Is lamb constant? Commented Oct 6, 2017 at 15:41

1 Answer 1

1

Here is a vectorized version of E. It replaces the for-loop and scalar arithmetic with NumPy broadcasting and array-based arithmetic:

def alt_E(x):
    x = x[:, None]
    z = pi * (np.exp(-lamb) * (lamb**x)) / special.factorial(x)
    denom = z.sum(axis=1)[:, None]
    z /= denom
    return z

I ran em.py to get a sense for the typical size of x, lamb, pi, n and k. On data of this size, alt_E is about 120x faster than E:

In [32]: %timeit E(x)
100 loops, best of 3: 11.5 ms per loop

In [33]: %timeit alt_E(x)
10000 loops, best of 3: 94.7 µs per loop

In [34]: 11500/94.7
Out[34]: 121.43611404435057

This is the setup I used for the benchmark:

import math
import numpy as np
import scipy.special as special

def alt_E(x):
    x = x[:, None]
    z = pi * (np.exp(-lamb) * (lamb**x)) / special.factorial(x)
    denom = z.sum(axis=1)[:, None]
    z /= denom
    return z


def E(x):
    z = np.zeros((n, k))
    for i in range(n):
        v = pi * (1 / math.factorial(x[i])) * \
            np.exp(-1 * lamb) * (lamb ** x[i])
        numerator = np.sum(v)
        c = v / numerator
        z[i, :] = c
    return z

n = 576
k = 2

x = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 5])

lamb = np.array([ 0.84835141, 1.04025989])
pi = np.array([ 0.5806958, 0.4193042])

assert np.allclose(alt_E(x), E(x))

By the way, E could also be calculated using scipy.stats.poisson:

import scipy.stats as stats
pois = stats.poisson(mu=lamb)

def alt_E2(x):
    z = pi * pois.pmf(x[:,None])
    denom = z.sum(axis=1)[:, None]
    z /= denom
    return z

but this does not turn out to be faster, at least for arrays of this length:

In [33]: %timeit alt_E(x)
10000 loops, best of 3: 94.7 µs per loop

In [102]: %timeit alt_E2(x)
1000 loops, best of 3: 278 µs per loop

For larger x, alt_E2 is faster:

In [104]: x = np.random.random(10000)

In [106]: %timeit alt_E(x)
100 loops, best of 3: 2.18 ms per loop

In [105]: %timeit alt_E2(x)
1000 loops, best of 3: 643 µs per loop
Sign up to request clarification or add additional context in comments.

1 Comment

You are the real MVP ;)

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.