
I'm doing data analysis that involves minimizing the least-squares error between a set of points and a corresponding set of orthogonal functions. In other words, I'm taking a set of y-values and a set of functions, and trying to zero in on the x-value that gets all of the functions closest to their corresponding y-values. Everything is done in a 'data_set' class. The functions I'm comparing against are all stored in one list, and I'm using a class method to calculate the total least-squares error over all of them:

# one degree-10 polynomial fit per set of y-values
self.fits = [np.poly1d(np.polyfit(self.x_data, self.y_data[n], 10)) for n in range(self.num_points)]

def error(self, x, y_set):
    # squared residual of each fit at x, summed over all fits
    arr = [(y_set[n] - self.fits[n](x))**2 for n in range(self.num_points)]
    return np.sum(arr)

This was fine when I had significantly more time than data, but now I'm taking thousands of x-values, each with a thousand y-values, and that for loop is unacceptably slow. I've been trying to use np.vectorize:

#global scope
def func(f,x):
    return f(x)
vfunc = np.vectorize(func, excluded=['x'])
…
…
#within data_set class
    def error(self, x, y_set):
        arr = (y_set - vfunc(self.fits, x))**2
        return np.sum(arr)

func(self.fits[n], x) works fine as long as n is valid, and as far as I can tell from the docs, vfunc(self.fits, x) should be equivalent to

[self.fits[n](x) for n in range(self.num_points)]

but instead it throws:

ValueError: cannot copy sequence with size 10 to array axis with dimension 11

10 is the degree of the polynomial fit, and 11 is (by definition) the number of terms in it, but I have no idea why they're showing up here. If I change the fit order, the error message reflects the change. It seems like np.vectorize is taking each element of self.fits as a list rather than a np.poly1d function.

Anyway, if someone could either help me understand np.vectorize better, or suggest another way to eliminate that loop, that would be swell.

    np.vectorize does not make things faster. If I recall an earlier SO discussion correctly, its excluded=['x'] only works if x is actually passed as a keyword argument. So not only is it slower than plain iteration, it is harder to use correctly. It's useful only if you are broadcasting one input argument against another. Commented May 24, 2019 at 0:46
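
Following up on that comment: below is a minimal, self-contained sketch (the data and names are made up, not the asker's actual class) of one way the np.vectorize call can be made to behave. The idea is to pass x by keyword so that excluded=['x'] actually applies, and to hold the poly1d objects in an object-dtype array so NumPy doesn't try to unpack each one as a sequence of coefficients:

import numpy as np

# standalone stand-in for the asker's self.fits (hypothetical data)
fits = [np.poly1d(np.polyfit(np.arange(5), np.random.random(5), 3)) for _ in range(4)]

def func(f, x):
    return f(x)

# exclude x by name and pass it by keyword so the exclusion takes effect
vfunc = np.vectorize(func, excluded=['x'])

# store the poly1d objects in an object array element by element; letting
# NumPy convert the list directly makes it treat each poly1d as a sequence
# of coefficients, which is what produces the ValueError in the question
fit_arr = np.empty(len(fits), dtype=object)
for i, f in enumerate(fits):
    fit_arr[i] = f

print(vfunc(fit_arr, x=2.5))       # one evaluation per fit
print([f(2.5) for f in fits])      # same numbers from the plain loop

Note that this still calls each fit in a Python-level loop under the hood, so it won't be faster than the list comprehension; the answer below avoids that loop entirely by evaluating all fits with one matrix product.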

1 Answer


As the functions in question all have a very similar structure, we can vectorize "manually" once we've extracted the polynomial coefficients. In fact, the evaluation then becomes a quite simple one-liner, eval_many below:

import numpy as np

def poly_vec(list_of_polys):
    # stack all coefficient vectors into one matrix, right-aligned so that
    # lower-order polynomials are zero-padded on the high-power side
    O = max(p.order for p in list_of_polys) + 1
    C = np.zeros((len(list_of_polys), O))
    for p, c in zip(list_of_polys, C):
        c[len(c)-p.order-1:] = p.coeffs
    return C

def eval_many(x, C):
    # evaluate every polynomial at every x in a single matrix product
    return C @ np.vander(x, C.shape[1]).T

# make example
list_of_polys = [np.poly1d(v) for v in np.random.random((1000,11))]
x = np.random.random((2000,))

# put all coeffs in one master matrix
C = poly_vec(list_of_polys)

# test
assert np.allclose(eval_many(x,C), [p(x) for p in list_of_polys])

from timeit import timeit

# both timings are scaled to the same effective number of runs
print('vectorized', timeit(lambda: eval_many(x,C), number=100)*10)
print('loopy     ', timeit(lambda: [p(x) for p in list_of_polys], number=10)*100)

Sample run:

vectorized 6.817315469961613
loopy      56.35076989419758
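
To connect this back to the question: here is a rough sketch (hypothetical, reusing the names self.fits and error from the question together with the poly_vec/eval_many helpers above) of how the master coefficient matrix could replace the per-fit loop inside the class. The matrix is built once, and each call to error then reduces to one matrix product:

    # precompute once, e.g. right after self.fits is built:
    #     self.C = poly_vec(self.fits)

    def error(self, x, y_set):
        # eval_many expects an array of x-values; for a scalar x take column 0
        preds = eval_many(np.atleast_1d(x), self.C)[:, 0]
        return np.sum((np.asarray(y_set) - preds)**2)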