1

I'm running python with numpy, and I have a loop which, stripped down, looks like this:

result = np.zeros(bins)
for i in xrange(bins):
    result[f(i)] += source[i]

Here, both result and source are numpy arrays, and f is a mildly complicated set of arithmetic operations. For example, one simplified example of f might look like

f = lambda x: min(int(width*pow(x, 3)), bins-1)

though f is generally not monotonic in its argument.

This loop is currently the bottleneck in my program. I managed to vectorize everything else, but I'm currently stumped on how to do it here. How can this loop be vectorized?

2
  • 1
    To (possibly) vectorize this calculation we need to see the definition of f. Commented Jul 26, 2016 at 2:09
  • @unutbu f is pretty complicated and varies, but it always contains only the Python math functions/operators: pow, min, max, floor, <, >, log. Would it help if I gave an explicit example? Commented Jul 26, 2016 at 2:10

1 Answer 1

3

To vectorize f the main idea is to replace scalar operations with numpy vector-based functions. For example, if originally we have

def f(x):
    return min(int(width*pow(x, 3)), bins-1)

then we could instead use

def fvec(x):
    return np.minimum((width*np.power(x, 3)).astype(int), bins-1)

There is a natural correspondence between some Python scalar functions and NumPy vectorized functions:

| pow   | np.power   |
| min   | np.minimum |
| max   | np.maximum |
| floor | np.floor   |
| log   | np.log     |
| <     | np.less    |
| >     | np.greater |

The vectorized functions take an array of inputs and returns an array of the same shape. However, there are other constructs which may not be so obvious. For example the vectorized equivalent of x if condition else y is np.where(condition, x, y).

Unfortunately, in general there is no simple shortcut. Translating from scalar function to vectorized function may require any one of the many NumPy functions available, as well NumPy concepts like broadcasting and advanced indexing.


For example, it is tempting at this point to replace

for i in range(bins):
    result[f(i)] += source[i]

with the integer-array indexed assignment:

result[fvec(np.arange(bins))] += source

but this produces an incorrect result if fvec(np.arange(bins)) has repeated values. Instead use np.bincount since this correctly accumulates multiple source values when fvec(np.arange(bins)) indicates the same bin:

result = np.bincount(fvec(np.arange(bins)), weights=source, minlength=bins)

import numpy as np
import pandas as pd

bins = 1000
width = 1.5
source = np.random.random(bins)

def fvec(x):
    return np.minimum((width*np.power(x, 3)).astype(int), bins-1)

def f(x):
    return min(int(width*pow(x, 3)), bins-1)

def orig():
    result = np.zeros(bins)
    for i in range(bins):
        result[f(i)] += source[i]
    return result

def alt():
    result = np.bincount(fvec(np.arange(bins)), weights=source, minlength=bins)
    return result

assert np.allclose(orig(), alt())

For the example above with bins=1000, alt is about 62x faster than orig (on my machine):

In [194]: %timeit orig()
1000 loops, best of 3: 1.37 ms per loop

In [195]: %timeit alt()
10000 loops, best of 3: 21.8 µs per loop

The speed advantage of alt over orig will increase as the number of iterations required by orig's for-loop increases -- that is, as bins increases.

Sign up to request clarification or add additional context in comments.

1 Comment

i just implemented this and got a speedup of 100x. great stuff!

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.