Apply both vectorized and non-vectorized functions to a numpy array

Question

I have a function which does this: it takes a given numpy array A and a given function func and applies the function to each element of the array.

def transform(A, func):
    return func(A)

A and func are supplied from outside and I do not have control over them. I would like this to work when func is a vectorized function, as in transform(A, np.sin), but I also want to accept ordinary, non-vectorized Python functions, e.g. lambdas like transform(A, lambda x: x**2 if x > 5 else 0). Of course the second is not vectorized, so I would need to call np.vectorize() before applying it, like this: transform(A, np.vectorize(lambda x: x**2 if x > 5 else 0)). But I do not want to impose this burden on the users. I would like a unified approach to all functions: I just get a function from outside and apply it.

Is there a method to decide which function requires vectorization and which does not? Something like:

def transform(A, func):
    if requires_vectorization(func):  # how to do this???
        func = np.vectorize(func)
    return func(A)

Or should I just vectorize everything by default?

def transform(A, func):
    func = np.vectorize(func)  # is this correct and efficient?
    return func(A)

Is this solution good? In other words, does it hurt to call np.vectorize on an already vectorized function? Or is there an alternative?

You can at least measure the impact of vectorising an already vectorised function. I don't know how complicated your function would be, or how large your data, but I doubt you'd notice a difference for small arrays.
– Reti43, Commented Jan 7, 2016 at 21:56

Community · Accepted Answer · 2020-06-20 09:12:55Z

3

Following the EAFP principle, you could first try calling the function directly on A and see if this raises an exception:

import numpy as np

def transform(A, func):
    try:
        return func(A)
    except TypeError:
        return np.vectorize(func)(A)

For example:

import math

A = np.linspace(0, np.pi, 5)

print(transform(A, np.sin))     # vectorized function
# [  0.00000000e+00   7.07106781e-01   1.00000000e+00   7.07106781e-01
#    1.22464680e-16]

print(transform(A, math.sin))   # non-vectorized function
# [  0.00000000e+00   7.07106781e-01   1.00000000e+00   7.07106781e-01
#    1.22464680e-16]
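To see why the fallback is triggered for math.sin, here is a minimal sketch that calls it on the whole array directly. The variable names raised and elementwise are only for illustration; the point is that math.sin expects a single number and rejects a multi-element array with a TypeError, which is exactly what the except clause above catches.

```python
import math
import numpy as np

A = np.linspace(0, np.pi, 5)

# math.sin works on one number at a time, so passing a 5-element array fails.
try:
    math.sin(A)
    raised = None
except TypeError as exc:
    raised = exc

print(type(raised).__name__)   # TypeError

# Wrapping it with np.vectorize applies it element by element instead.
elementwise = np.vectorize(math.sin)(A)
print(np.allclose(elementwise, np.sin(A)))   # True
```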

Does it hurt to call np.vectorize on an already vectorized function?

Yes, absolutely. When you apply np.vectorize to a function, all of the looping over input array elements is done in Python, unlike in "proper" vectorized numpy functions which do their looping in C. From the documentation:

The vectorize function is provided primarily for convenience, not for performance. The implementation is essentially a for loop.

I feel like this sentence should be written in bold all-caps.

Case in point:

In [1]: vecsin = np.vectorize(np.sin)

In [2]: %%timeit A = np.random.randn(10000)
   ...: np.sin(A)
   ...:
1000 loops, best of 3: 243 µs per loop

In [3]: %%timeit A = np.random.randn(10000)
   ...: vecsin(A)
   ...:
100 loops, best of 3: 11.7 ms per loop

In [4]: %%timeit A = np.random.randn(10000)
   ...: [np.sin(a) for a in A]
   ...:
100 loops, best of 3: 12.5 ms per loop

In this example, applying np.vectorize to np.sin slows it down by a factor of ~50, making it about as slow as a regular Python list comprehension.
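If you are not working in IPython, roughly the same comparison can be sketched with the standard-library timeit module. The repeat count n_runs is an arbitrary choice for this sketch, and the absolute numbers will depend on your machine and NumPy version, so treat the output as indicative only.

```python
import timeit
import numpy as np

A = np.random.randn(10000)
vecsin = np.vectorize(np.sin)

# Both calls compute the same values; only the looping strategy differs.
same_values = np.allclose(np.sin(A), vecsin(A))

n_runs = 20  # arbitrary repeat count chosen for this sketch
t_native = timeit.timeit(lambda: np.sin(A), number=n_runs) / n_runs
t_wrapped = timeit.timeit(lambda: vecsin(A), number=n_runs) / n_runs

print(f"same values:          {same_values}")
print(f"np.sin:               {t_native * 1e6:.0f} µs per call")
print(f"np.vectorize(np.sin): {t_wrapped * 1e6:.0f} µs per call")
```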

Edit:

For completeness, here's the "transformed" version. As you can see, the try/except block has a negligible impact on performance:

In [5]: %%timeit A = np.random.randn(10000)
   ...: transform(A, np.sin)
   ...:
1000 loops, best of 3: 241 µs per loop

edited Jun 20, 2020 at 9:12

CommunityBot

answered Jan 7, 2016 at 22:41

ali_m

4 Comments

HiFile.app - best file manager Over a year ago

This is an excellent answer. To make it perfect, you could also time the proposed version with the try-except structure. :)

ali_m Over a year ago

See my update - the try/except block has essentially no impact on performance (as you'd expect)

HiFile.app - best file manager Over a year ago

The try-except block probably should also check for ValueError. This is what lambdas, e.g. lambda x: 0 < x < 1, raise in my tests.

ali_m Over a year ago

In that particular case you should be using numpy's vectorized logical operators, e.g. lambda x: (0 < x) & (x < 1). In general, I would expect that passing an array to a non-vectorized function ought to raise a TypeError rather than a ValueError. It's up to you what exceptions you want to catch, but keeping it as specific as possible will tend to make debugging easier.
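The two options discussed in these comments can be sketched side by side. transform_lenient is a hypothetical variant (not part of the answer above) that also falls back on ValueError, which is what a scalar-style conditional raises when it is handed a whole array. The otypes=[float] argument is an assumption that float output is wanted; without it, np.vectorize infers the output dtype from the result for the first element only. Catching ValueError this broadly may also hide genuine errors raised inside func, which is the debugging cost mentioned above.

```python
import numpy as np

def transform_lenient(A, func):
    """Hypothetical variant of transform that also falls back on ValueError."""
    try:
        return func(A)
    except (TypeError, ValueError):
        # Assumption: float output is wanted. otypes pins the result dtype
        # instead of letting np.vectorize infer it from the first element.
        return np.vectorize(func, otypes=[float])(A)

A = np.array([1.0, 5.5, 7.0])

# Scalar-style branch: `if x > 5` on a whole array raises ValueError,
# so this goes through the np.vectorize fallback.
scalar_style = lambda x: x**2 if x > 5 else 0
fallback_result = transform_lenient(A, scalar_style)

# Array-style equivalent using np.where; it needs no fallback at all.
array_style = lambda x: np.where(x > 5, x**2, 0)
direct_result = transform_lenient(A, array_style)

print(fallback_result)
print(direct_result)
```

Both calls should produce the same values; the np.where version avoids the Python-level loop entirely.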
