
I'm curious about the benefits and tradeoffs of using numpy ufuncs vs. the built-in operators vs. the 'function' versions of the built-in operators.

I'm curious about all ufuncs. Maybe there are times when some are more useful than others. However, I'll use < for my examples just for simplicity.

There are several ways to 'filter' a numpy array by a single number to get a boolean array. Each form gives the same results, but is there a preferred time/place to use one over the other? In this example I'm comparing an array against a single number, so all three will work.

Consider all examples using the following array:

>>> x = numpy.arange(0, 10000)
>>> x
array([   0,    1,    2, ..., 9997, 9998, 9999])

'<' operator

>>> x < 5000
array([ True,  True,  True, ..., False, False, False], dtype=bool)
>>> %timeit x < 5000
100000 loops, best of 3: 15.3 us per loop

operator.lt

>>> import operator
>>> operator.lt(x, 5000)
array([ True,  True,  True, ..., False, False, False], dtype=bool)
>>> %timeit operator.lt(x, 5000)
100000 loops, best of 3: 15.3 us per loop

numpy.less

>>> numpy.less(x, 5000)
array([ True,  True,  True, ..., False, False, False], dtype=bool)
>>> %timeit numpy.less(x, 5000)
100000 loops, best of 3: 15 us per loop

Note that all of them achieve pretty much equivalent performance and exactly the same results. I'm guessing that all of these calls actually end up in the same function anyway, since < and operator.lt both map to __lt__ on a numpy array, which is probably implemented using numpy.less or the equivalent?

So, which is more 'idiomatic' and 'preferred'?

2 Answers


Generally speaking, with the "readability counts" mantra in mind, the actual operator should be your preferred choice. The operator module versions have a place when you can replace lambda a, b: a < b with the more compact operator.lt, but not much outside of that. And you really shouldn't be making explicit calls to the corresponding ufunc unless you want to use the out parameter to store the calculated values directly in an existing array.
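
As a rough sketch of those two cases (the names `lefts`, `rights`, `flags`, and `mask` are made up for illustration): operator.lt stands in for a lambda where a callable is needed, and np.less writes its result into a preallocated array through out.

```python
import operator

import numpy as np

# Case 1: operator.lt used as a callable instead of lambda a, b: a < b
lefts = [1, 3, 5]
rights = [2, 3, 4]
flags = list(map(operator.lt, lefts, rights))

# Case 2: explicit ufunc call so the result lands in an existing array
x = np.arange(10)
mask = np.empty(x.shape, dtype=bool)  # preallocated output buffer
result = np.less(x, 5, out=mask)      # fills mask in place and returns it
```

Here `result` and `mask` are the same object, so no new boolean array is allocated for the comparison.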

That said, if what you are worried about is performance, you should do fair comparisons because, as you say, all your calls are eventually handled by numpy's less ufunc.

If your data is already in a numpy array, then you have already shown that they are all performing similarly, so go with the < operator for clarity.

What if your data is in a Python object, say a list? Well, here are some timings for you to ponder:

In [13]: x = range(10**5)

In [19]: %timeit [j < 5000 for j in x]
100 loops, best of 3: 5.32 ms per loop

In [20]: %timeit np.less(x, 5000)
100 loops, best of 3: 11.3 ms per loop

In [21]: %timeit [operator.lt(j, 5000) for j in x]
100 loops, best of 3: 16.2 ms per loop

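I'm not sure why operator.lt is so slow here (the extra attribute lookup and function call on every element are a likely cause), but you clearly want to stay away from it in a loop like this. If you want to get a numpy array as output from a Python object input, then this will probably be the fastest: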

In [22]: %timeit np.fromiter((j < 5000 for j in x), dtype=bool, count=10**5)
100 loops, best of 3: 7.91 ms per loop

Note that ufuncs operating on numpy arrays are much faster than any of the above:

In [24]: y = np.array(x)

In [25]: %timeit y < 5000
10000 loops, best of 3: 82.8 us per loop
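
For reference, a small sketch of the two routes from a plain list to a boolean array, assuming a hypothetical list called `data`. It only checks that both routes agree; the timings above are machine- and version-dependent, so measure on your own setup.

```python
import numpy as np

data = list(range(10**5))  # hypothetical plain-Python input

# Route 1: compare element by element in Python, then pack into an array
mask_iter = np.fromiter((j < 5000 for j in data), dtype=bool, count=len(data))

# Route 2: convert to an array once, then use the vectorized comparison
arr = np.asarray(data)
mask_vec = arr < 5000

# Both routes should produce the same boolean mask
same = np.array_equal(mask_iter, mask_vec)
```

If the same data is compared more than once, Route 2 pays the conversion cost only once and every later comparison runs on the array directly.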

1 Comment

Thanks! In all of your examples is x a standard Python list?

In this case, the preferred form is x < 5000 because it is simpler and you are already using a numpy array.

ufuncs are meant to allow these operations to be done on any type of data (not only numpy arrays):

>>> numpy.less([1, 2, 3, 4, 6, 8], 5)
array([ True,  True,  True,  True, False, False], dtype=bool)

>>> [1, 2, 3, 4, 6, 8] < 5
False

On Python 3, this last comparison will raise a TypeError, because a list and an int cannot be ordered against each other.
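
A minimal Python 3 sketch of that difference, using the same list (the variable names are illustrative only):

```python
import numpy as np

data = [1, 2, 3, 4, 6, 8]

# The ufunc accepts the list and compares element by element
elementwise = np.less(data, 5)

# The bare operator on a list is not elementwise; on Python 3 it raises
try:
    data < 5
    list_comparison_failed = False
except TypeError:
    list_comparison_failed = True
```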

3 Comments

This makes sense, but the numpy docs seem to disagree with your statement that they are meant for more than numpy arrays: "A universal function (or ufunc for short) is a function that operates on ndarrays in an element-by-element fashion, supporting array broadcasting, type casting, and several other standard features." I see from your example that it does work with a list, but something is conflicting unless I'm reading the docs too literally.
Also, I agree that < is more appropriate in this scenario. It just feels more natural. So, this leads to a somewhat related question: when should I use ufuncs?
It is meant for ndarrays, but works with lists and tuples because they are the basic containers in Python and most of your data may already be using these types. Using ufuncs is more or less like calling asarray on your input first.
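
A quick way to see that equivalence, as a sketch with illustrative variable names:

```python
import numpy as np

data = [1, 2, 3, 4, 6, 8]

# Passing the list straight to the ufunc...
via_ufunc = np.less(data, 5)

# ...gives the same result as converting it first and using the operator
via_asarray = np.asarray(data) < 5

equivalent = np.array_equal(via_ufunc, via_asarray)
```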
