To vectorize f
the main idea is to replace scalar operations with numpy vector-based functions.
For example, if originally we have
def f(x):
return min(int(width*pow(x, 3)), bins-1)
then we could instead use
def fvec(x):
return np.minimum((width*np.power(x, 3)).astype(int), bins-1)
There is a natural correspondence between some Python scalar functions and NumPy
vectorized functions:
| pow | np.power |
| min | np.minimum |
| max | np.maximum |
| floor | np.floor |
| log | np.log |
| < | np.less |
| > | np.greater |
The vectorized functions take an array of inputs and returns an array of the same shape.
However, there are other constructs which may not be so obvious. For example the
vectorized equivalent of x if condition else y is np.where(condition, x, y).
Unfortunately, in general there is no simple shortcut. Translating from
scalar function to vectorized function may require any one of the many
NumPy functions available, as well NumPy concepts like broadcasting and advanced
indexing.
For example, it is tempting at this point to replace
for i in range(bins):
result[f(i)] += source[i]
with the integer-array indexed assignment:
result[fvec(np.arange(bins))] += source
but this produces an incorrect result if fvec(np.arange(bins)) has repeated values. Instead use
np.bincount since this correctly accumulates multiple source values when fvec(np.arange(bins)) indicates the same bin:
result = np.bincount(fvec(np.arange(bins)), weights=source, minlength=bins)
import numpy as np
import pandas as pd
bins = 1000
width = 1.5
source = np.random.random(bins)
def fvec(x):
return np.minimum((width*np.power(x, 3)).astype(int), bins-1)
def f(x):
return min(int(width*pow(x, 3)), bins-1)
def orig():
result = np.zeros(bins)
for i in range(bins):
result[f(i)] += source[i]
return result
def alt():
result = np.bincount(fvec(np.arange(bins)), weights=source, minlength=bins)
return result
assert np.allclose(orig(), alt())
For the example above with bins=1000, alt is about 62x faster than orig (on my machine):
In [194]: %timeit orig()
1000 loops, best of 3: 1.37 ms per loop
In [195]: %timeit alt()
10000 loops, best of 3: 21.8 µs per loop
The speed advantage of alt over orig will increase as the number of iterations required by orig's for-loop increases -- that is, as bins increases.
f.