Python vectorizing nested for loops

Question

I'd appreciate some help in finding and understanding a pythonic way to optimize the following array manipulations in nested for loops:

def _func(a, b, radius):
    "Return 0 if a>b, otherwise return 1"
    if distance.euclidean(a, b) < radius:
        return 1
    else:
        return 0

def _make_mask(volume, roi, radius):
    mask = numpy.zeros(volume.shape)
    for x in range(volume.shape[0]):
        for y in range(volume.shape[1]):
            for z in range(volume.shape[2]):
                mask[x, y, z] = _func((x, y, z), roi, radius)
    return mask

Where volume.shape (182, 218, 200) and roi.shape (3,) are both ndarray types; and radius is an int

Did any of these answers help? Relevant page to read: How does accepting an answer work? — rayryeng
– rayryeng, Commented Oct 18, 2016 at 21:23
please excuse the necropost, but A: you should accept @Divakar's post.. It's a wonderful demonstration of vectorizing with numpy, and B: you should take a look at KD trees and the ball point algorithm from scipy.spatial. It is a generalizable method for your specific problem when the data is sparse or not on a regular grid. Although it's not the best method for this exact question, It's a very good thing to know about (I recently used it myself) — Aaron
– Aaron, Commented Dec 16, 2016 at 22:39
@Divakar your explanation was very helpful, thanks. I upvoted initially, but just recently realized purpose of the checkmark. It's done. — Kambiz
– Kambiz, Commented May 9, 2017 at 19:08

Divakar · Accepted Answer · 2016-09-23 19:18:00Z

92

Approach #1

Here's a vectorized approach -

m,n,r = volume.shape
x,y,z = np.mgrid[0:m,0:n,0:r]
X = x - roi[0]
Y = y - roi[1]
Z = z - roi[2]
mask = X**2 + Y**2 + Z**2 < radius**2

Possible improvement : We can probably speedup the last step with numexpr module -

import numexpr as ne

mask = ne.evaluate('X**2 + Y**2 + Z**2 < radius**2')

Approach #2

We can also gradually build the three ranges corresponding to the shape parameters and perform the subtraction against the three elements of roi on the fly without actually creating the meshes as done earlier with np.mgrid. This would be benefited by the use of broadcasting for efficiency purposes. The implementation would look like this -

m,n,r = volume.shape
vals = ((np.arange(m)-roi[0])**2)[:,None,None] + \
       ((np.arange(n)-roi[1])**2)[:,None] + ((np.arange(r)-roi[2])**2)
mask = vals < radius**2

Simplified version : Thanks to @Bi Rico for suggesting an improvement here as we can use np.ogrid to perform those operations in a bit more concise manner, like so -

m,n,r = volume.shape    
x,y,z = np.ogrid[0:m,0:n,0:r]-roi
mask = (x**2+y**2+z**2) < radius**2

Runtime test

Function definitions -

def vectorized_app1(volume, roi, radius):
    m,n,r = volume.shape
    x,y,z = np.mgrid[0:m,0:n,0:r]
    X = x - roi[0]
    Y = y - roi[1]
    Z = z - roi[2]
    return X**2 + Y**2 + Z**2 < radius**2

def vectorized_app1_improved(volume, roi, radius):
    m,n,r = volume.shape
    x,y,z = np.mgrid[0:m,0:n,0:r]
    X = x - roi[0]
    Y = y - roi[1]
    Z = z - roi[2]
    return ne.evaluate('X**2 + Y**2 + Z**2 < radius**2')

def vectorized_app2(volume, roi, radius):
    m,n,r = volume.shape
    vals = ((np.arange(m)-roi[0])**2)[:,None,None] + \
           ((np.arange(n)-roi[1])**2)[:,None] + ((np.arange(r)-roi[2])**2)
    return vals < radius**2

def vectorized_app2_simplified(volume, roi, radius):
    m,n,r = volume.shape    
    x,y,z = np.ogrid[0:m,0:n,0:r]-roi
    return (x**2+y**2+z**2) < radius**2

Timings -

In [106]: # Setup input arrays  
     ...: volume = np.random.rand(90,110,100) # Half of original input sizes 
     ...: roi = np.random.rand(3)
     ...: radius = 3.4
     ...: 

In [107]: %timeit _make_mask(volume, roi, radius)
1 loops, best of 3: 41.4 s per loop

In [108]: %timeit vectorized_app1(volume, roi, radius)
10 loops, best of 3: 62.3 ms per loop

In [109]: %timeit vectorized_app1_improved(volume, roi, radius)
10 loops, best of 3: 47 ms per loop

In [110]: %timeit vectorized_app2(volume, roi, radius)
100 loops, best of 3: 4.26 ms per loop

In [139]: %timeit vectorized_app2_simplified(volume, roi, radius)
100 loops, best of 3: 4.36 ms per loop

So, as always broadcasting showing its magic for a crazy almost 10,000x speedup over the original code and more than 10x better than creating meshes by using on-the-fly broadcasted operations!

edited Sep 23, 2016 at 19:18

answered Sep 23, 2016 at 18:26

Divakar

222k19 gold badges273 silver badges374 bronze badges

Sign up to request clarification or add additional context in comments.

3 Comments

Bi Rico Over a year ago

Approach #2 is a lot like approach one with np.ogrid replacing np.mgrid.

Bi Rico Over a year ago

Can we get a timing of 'app1' with ogrid instead of mgrid :).

Divakar Over a year ago

@BiRico Why instead, when we can get everything :) Thanks a lot for the improvement there, looks much cleaner now!

Ami Tavory · Accepted Answer · 2016-09-23 18:21:10Z

7

Say you first build an xyzy array:

import itertools

xyz = [np.array(p) for p in itertools.product(range(volume.shape[0]), range(volume.shape[1]), range(volume.shape[2]))]

Now, using numpy.linalg.norm,

np.linalg.norm(xyz - roi, axis=1) < radius

checks whether the distance for each tuple from roi is smaller than radius.

Finally, just reshape the result to the dimensions you need.

answered Sep 23, 2016 at 18:21

Ami Tavory

76.7k13 gold badges152 silver badges196 bronze badges

Collectives™ on Stack Overflow

Python vectorizing nested for loops

2 Answers 2

3 Comments

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

3 Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related