
I have a function which does this: it takes a given numpy array A and a given function func and applies the function to each element of the array.

def transform(A, func):
    return func(A)

A and func are supplied from outside and I do not have control over them. I would like the function to work with vectorized functions such as transform(A, np.sin), but I also want to accept ordinary Python functions, e.g. lambdas like transform(A, lambda x: x**2 if x > 5 else 0). The second one is not vectorized, so I would need to call np.vectorize() before applying it, like this: transform(A, np.vectorize(lambda x: x**2 if x > 5 else 0)). But I do not want to impose this burden on the users. I would like a unified approach to all functions: I just get a function from outside and apply it.

Is there a method to decide which function requires vectorization and which does not? Something like:

def transform(A, func):
    if requires_vectorization(func):  # how to do this???
        func = np.vectorize(func)
    return func(A)   

Or should I just vectorize everything by default?

def transform(A, func):
    func = np.vectorize(func)  # is this correct and efficient?
    return func(A)   

Is this solution good? In other words, does it hurt to call np.vectorize on an already vectorized function? Or is there an alternative?

  • You can at least measure the impact of vectorising an already vectorised function. I don't know how complicated your function would be, or how huge your data, but I doubt you'd notice a difference for small arrays. Commented Jan 7, 2016 at 21:56

1 Answer


Following the EAFP principle, you could first try calling the function directly on A and see if this raises an exception:

import numpy as np

def transform(A, func):
    try:
        return func(A)
    except TypeError:
        return np.vectorize(func)(A)

For example:

import math

A = np.linspace(0, np.pi, 5)

print(transform(A, np.sin))     # vectorized function
# [  0.00000000e+00   7.07106781e-01   1.00000000e+00   7.07106781e-01
#    1.22464680e-16]

print(transform(A, math.sin))   # non-vectorized function
# [  0.00000000e+00   7.07106781e-01   1.00000000e+00   7.07106781e-01
#    1.22464680e-16]
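One of the comments below points out that some non-vectorized lambdas raise ValueError rather than TypeError when handed a whole array. The question's own example (written with `**`, Python's power operator; `^` is bitwise XOR) is one of them: `x > 5` produces a boolean array, and using that array as the condition of an `if` expression raises ValueError. A minimal sketch that catches both exceptions, assuming you are happy with the broader except clause:

```python
import math

import numpy as np


def transform(A, func):
    """Apply func to A directly; fall back to np.vectorize if func is not array-aware."""
    try:
        return func(A)
    except (TypeError, ValueError):
        # TypeError: e.g. math.sin cannot convert a multi-element array to a float.
        # ValueError: e.g. `if x > 5` where x > 5 is a boolean array (ambiguous truth value).
        return np.vectorize(func)(A)


A = np.array([1.0, 4.0, 6.0, 10.0])

print(transform(A, np.sin))                          # called directly on the array
print(transform(A, math.sin))                        # falls back after TypeError
print(transform(A, lambda x: x**2 if x > 5 else 0))  # falls back after ValueError
```

Note that np.vectorize infers the output dtype by calling the function on the first element, so the last result is an integer array here (the first call returns the int 0); pass `otypes=[float]` to np.vectorize if that matters.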

does it hurt to call np.vectorize on an already vectorized function?

Yes, absolutely. When you apply np.vectorize to a function, all of the looping over input array elements is done in Python, unlike in "proper" vectorized numpy functions which do their looping in C. From the documentation:

The vectorize function is provided primarily for convenience, not for performance. The implementation is essentially a for loop.

I feel like this sentence should be written in bold all-caps.
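To make the "essentially a for loop" point concrete, here is a rough pure-Python stand-in for what np.vectorize does with a one-argument function returning floats. It is a simplified sketch: the hypothetical `loop_apply` ignores multiple arguments, broadcasting, `otypes`, and the other options np.vectorize handles, but it computes the same values in the same one-call-per-element way:

```python
import math

import numpy as np


def loop_apply(func, A):
    """Hypothetical helper: call func once per element, in Python, like np.vectorize."""
    A = np.asarray(A)
    out = np.empty(A.shape, dtype=float)  # assumes func returns a float
    for idx in np.ndindex(A.shape):       # one Python-level call per element
        out[idx] = func(A[idx])
    return out


A = np.linspace(0, np.pi, 5).reshape(5, 1)
print(np.allclose(loop_apply(math.sin, A), np.vectorize(math.sin)(A)))  # True
```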

Case in point:

In [1]: vecsin = np.vectorize(np.sin)

In [2]: %%timeit A = np.random.randn(10000);
np.sin(A)
   ....: 
1000 loops, best of 3: 243 µs per loop

In [3]: %%timeit A = np.random.randn(10000);
vecsin(A)
   ....: 
100 loops, best of 3: 11.7 ms per loop

In [4]: %%timeit A = np.random.randn(10000);
[np.sin(a) for a in A]
   ....: 
100 loops, best of 3: 12.5 ms per loop

In this example, applying np.vectorize to np.sin slows it down by a factor of ~50, making it about as slow as a regular Python list comprehension.
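If you want to reproduce the comparison outside IPython, a sketch using the standard-library timeit module follows. Absolute numbers depend on the machine and the NumPy version, so only the relative gap between the three variants is meaningful; no particular timings are claimed here.

```python
import timeit

import numpy as np

A = np.random.randn(10000)
vecsin = np.vectorize(np.sin)

# Sanity check: both paths compute the same values.
assert np.allclose(np.sin(A), vecsin(A))

candidates = [
    ("np.sin(A)", lambda: np.sin(A)),
    ("vecsin(A)", lambda: vecsin(A)),
    ("[np.sin(a) for a in A]", lambda: [np.sin(a) for a in A]),
]

for label, stmt in candidates:
    # Best of 5 repeats of 20 calls each, reported as time per call.
    best = min(timeit.repeat(stmt, number=20, repeat=5)) / 20
    print(f"{label:>25}: {best * 1e3:.3f} ms per call")
```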

Edit:

For completeness, here's the "transformed" version. As you can see, the try/except block has a negligible impact on performance:

In [5]: %%timeit A = np.random.randn(10000);
transform(A, np.sin)
   ...: 
1000 loops, best of 3: 241 µs per loop

4 Comments

This is an excellent answer. To make it perfect, you could also time the proposed version with the try/except structure. :)
See my update - the try/except block has essentially no impact on performance (as you'd expect)
The try/except block should probably also catch ValueError. That is what lambdas such as lambda x: 0 < x < 1 raise in my tests.
In that particular case you should be using numpy's vectorized logical operators, e.g. lambda x: (0 < x) & (x < 1). In general, I would expect that passing an array to a non-vectorized function ought to raise a TypeError rather than a ValueError. It's up to you what exceptions you want to catch, but keeping it as specific as possible will tend to make debugging easier.
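Following that advice, both the question's example and the lambda from the comment can be rewritten with NumPy's array-aware operations, so they accept the whole array directly and need no np.vectorize at all. A sketch, using np.where for the conditional and `&` for the combined comparison (the function names are illustrative):

```python
import numpy as np


def square_above_5(x):
    # Vectorized form of `lambda x: x**2 if x > 5 else 0`:
    # np.where picks x**2 where the condition holds and 0 elsewhere.
    return np.where(x > 5, x**2, 0)


def in_unit_interval(x):
    # Vectorized form of `lambda x: 0 < x < 1`; the chained comparison
    # needs bool() of an array, which is what raises ValueError.
    return (0 < x) & (x < 1)


A = np.array([0.5, 2.0, 6.0, 10.0])
print(square_above_5(A))    # [  0.   0.  36. 100.]
print(in_unit_interval(A))  # [ True False False False]
```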
