1

I'm trying to evaluate a chi squared function, i.e. compare an arbitrary (blackbox) function to a numpy vector array of data. At the moment I'm looping over the array in python but something like this is very slow:

n=len(array)
sigma=1.0
chisq=0.0
for i in range(n):
    data = array[i]
    model = f(i,a,b,c)
    chisq += 0.5*((data-model)/sigma)**2.0
return chisq

array is a 1-d numpy array and a,b,c are scalars. Is there a way to speed this up by using numpy.sum() or some sort of lambda function etc.? I can see how to remove one loop (over chisq) like this:

numpy.sum(((array-model_vec)/sigma)**2.0)

but then I still need to explicitly populate the array model_vec, which will presumably be just as slow; how do I do that without an explicit loop like this:

model_vec=numpy.zeros(len(data))
for i in range(n):
    model_vec[i] = f(i,a,b,c)
return numpy.sum(((array-model_vec)/sigma)**2.0)

?

Thanks!

3
  • 2
    What's your definition of f? You should vectorize it such that it supports arrays! Commented Apr 26, 2013 at 12:14
  • In the simplest case, f might be a linear function like f[i]=a*x[i]+b, where x is some other vector. A problem is that in general the function might be much more complicated, e.g. a lookup table. Commented Apr 26, 2013 at 13:04
  • 1
    A look-up table is easy to vectorize, either using np.take or fancy indexing. The meat of your performance issue is vectorizing f, if you can't figure it out ask again with your specific function and problem. Commented Apr 26, 2013 at 15:39

2 Answers 2

1

You can use np.vectorize to 'vectorize' your function f if you don't have control over its definition:

g = np.vectorize(f)

But this is not as good as vectorizing the function yourself manually to support arrays, as it doesn't really do much more than internalize the loop, and it might not work well with certain functions. In fact, from the documentation:

Notes The vectorize function is provided primarily for convenience, not for performance. The implementation is essentially a for loop.

You should instead focus on making f accept a vector instead of i:

def f(i, a, b, x):
    return a*x[i] + b

def g(a, b, x):
    x = np.asarray(x)
    return a*x + b

Then, instead of calling f(i, a, b, x), call g(a,b,x)[i] if you only want the ith, but for operations on the entire function, use g(a, b, x) and it will be much faster.

model_vec = g(a, b, x)
return numpy.sum(((array-model_vec)/sigma)**2.0)
Sign up to request clarification or add additional context in comments.

Comments

0

It seems that your code is slow because what is executing in the loop is slow (your model generation). Turning this into a one-liner won't speed things up. If you have access to a modern computer with more than on CPU you could try to run this loop in parallel - for example using the multiprocessing module;

from multiprocessing import Pool

if __name__ == '__main__':

    # snip set up code

    pool = Pool(processes=4)              # start 4 worker processes
    inputs = [(i,a,b,c) for i in range(n)]
    model_array = pool.map(model, inputs)

    for i in range(n):
        data = array[i]
        model = model_array[i]
        chisq += 0.5*((data-model)/sigma)**2.0

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.