python numpy - optimize chisq function by removing explicit python loop?

Question

I'm trying to evaluate a chi squared function, i.e. compare an arbitrary (blackbox) function to a numpy vector array of data. At the moment I'm looping over the array in python but something like this is very slow:

n=len(array)
sigma=1.0
chisq=0.0
for i in range(n):
    data = array[i]
    model = f(i,a,b,c)
    chisq += 0.5*((data-model)/sigma)**2.0
return chisq

array is a 1-d numpy array and a,b,c are scalars. Is there a way to speed this up by using numpy.sum() or some sort of lambda function etc.? I can see how to remove one loop (over chisq) like this:

numpy.sum(((array-model_vec)/sigma)**2.0)

but then I still need to explicitly populate the array model_vec, which will presumably be just as slow; how do I do that without an explicit loop like this:

model_vec=numpy.zeros(len(data))
for i in range(n):
    model_vec[i] = f(i,a,b,c)
return numpy.sum(((array-model_vec)/sigma)**2.0)

?

Thanks!

What's your definition of f? You should vectorize it such that it supports arrays! — David Zwicker
– David Zwicker, Commented Apr 26, 2013 at 12:14
In the simplest case, f might be a linear function like f[i]=a*x[i]+b, where x is some other vector. A problem is that in general the function might be much more complicated, e.g. a lookup table. — jtlz2
– jtlz2, Commented Apr 26, 2013 at 13:04
A look-up table is easy to vectorize, either using np.take or fancy indexing. The meat of your performance issue is vectorizing f, if you can't figure it out ask again with your specific function and problem. — Jaime
– Jaime, Commented Apr 26, 2013 at 15:39

askewchan · Accepted Answer · 2013-04-26 15:14:51Z

You can use np.vectorize to 'vectorize' your function f if you don't have control over its definition:

g = np.vectorize(f)

But this is not as good as vectorizing the function yourself manually to support arrays, as it doesn't really do much more than internalize the loop, and it might not work well with certain functions. In fact, from the documentation:

Notes The vectorize function is provided primarily for convenience, not for performance. The implementation is essentially a for loop.

You should instead focus on making f accept a vector instead of i:

def f(i, a, b, x):
    return a*x[i] + b

def g(a, b, x):
    x = np.asarray(x)
    return a*x + b

Then, instead of calling f(i, a, b, x), call g(a,b,x)[i] if you only want the ith, but for operations on the entire function, use g(a, b, x) and it will be much faster.

model_vec = g(a, b, x)
return numpy.sum(((array-model_vec)/sigma)**2.0)

danodonovan · Accepted Answer · 2013-04-26 12:20:23Z

0

It seems that your code is slow because what is executing in the loop is slow (your model generation). Turning this into a one-liner won't speed things up. If you have access to a modern computer with more than on CPU you could try to run this loop in parallel - for example using the multiprocessing module;

from multiprocessing import Pool

if __name__ == '__main__':

    # snip set up code

    pool = Pool(processes=4)              # start 4 worker processes
    inputs = [(i,a,b,c) for i in range(n)]
    model_array = pool.map(model, inputs)

    for i in range(n):
        data = array[i]
        model = model_array[i]
        chisq += 0.5*((data-model)/sigma)**2.0

answered Apr 26, 2013 at 12:20

danodonovan

20.5k10 gold badges74 silver badges78 bronze badges

Collectives™ on Stack Overflow

python numpy - optimize chisq function by removing explicit python loop?

2 Answers 2

Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related