alternative to numpy.argwhere to speed up for loop in python

Question

I have two data-sets as following:

ds1: a DEM (digital elevation model) file as 2d numpy array and,

ds2: which is showing areas (pixels) with some excess water in them.

I have a while loop that is responsible to spread (and change) excess volume in each pixel according to the elevation of its 8 neighbours and itself until the excess volume in each pixel is less than a certain value d = 0.05. Therefore in each iteration I need to find the index of pixels in ds2 where excess volume is greater than 0.05 and if there is no pixel left, exit the while loop:

exit_code == "No"
while exit_code == "No":
    index_of_pixels_with_excess_volume = numpy.argwhere(ds2> 0.05) # find location of pixels where excess volume is greater than 0.05

    if not index_of_pixels_with_excess_volume.size:
        exit_code = "Yes"
    else:
        for pixel in index_of_pixels_with_excess_volume:
            # spread those excess volumes to the neighbours and
            # change the values of ds2

the problem is that numpy.argwhere(ds2> 0.05) is very slow. I am looking for an alternative solution that is faster.

argwhere is just where with a transpose to turn the tuple of arrays into a 2d array. — hpaulj
– hpaulj, Commented Nov 2, 2017 at 5:34
Are you sure it's the argwhere, and not the ds2>0.05 step, or more likely the iteration on all those pixel? — hpaulj
– hpaulj, Commented Nov 2, 2017 at 5:36
I did a cProfile and conclude this based on the number of the cumulative seconds written besides argwhere. — Behzad Jamali
– Behzad Jamali, Commented Nov 2, 2017 at 5:48
The where expression has to iterate over the array several time. One to create the boolean array. Then np.count_nonzero quickly counts the number of True values. Finally where (actually np.nonzero) collects the indices of those values. The transpose to argwhere should be a minor part of the action. If ds2 large compared to the number of True pixels, then this where action could dominate. — hpaulj
– hpaulj, Commented Nov 2, 2017 at 6:15
This is a very good idea. What worries me is the time that needs to be spent on modifying two arrays (ds2 and boolean) in each iteration instead of one array and hence loosing performance? As a side question: how about using scipy sparse matrices instead of booleans? — Behzad Jamali
– Behzad Jamali, Commented Nov 2, 2017 at 6:29

hpaulj · Accepted Answer · 2017-11-02 07:02:32Z

Make a sample 2d array:

In [584]: arr = np.random.rand(1000,1000)

Find a small proportion of them:

In [587]: np.where(arr>.999)
Out[587]: 
(array([  1,   1,   1, ..., 997, 999, 999], dtype=int32),
 array([273, 471, 584, ..., 745, 310, 679], dtype=int32))
In [588]: _[0].shape
Out[588]: (1034,)

Time various pieces of argwhere:

In [589]: timeit arr>.999
2.65 ms ± 116 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [590]: timeit np.count_nonzero(arr>.999)
2.79 ms ± 26 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [591]: timeit np.nonzero(arr>.999)
6 ms ± 10 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [592]: timeit np.argwhere(arr>.999)
6.06 ms ± 58.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

So about 1/3 of the time is spend doing the > test, and the rest in finding the True elements. Turning the where tuple into a 2 column array is fast.

Now if the goal was to just find the first > value, argmax is fast.

In [593]: np.argmax(arr>.999)
Out[593]: 1273    # can unravel this to (1,273)
In [594]: timeit np.argmax(arr>.999)
2.76 ms ± 143 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

argmax short circuits, so the actual run time will vary on when it finds the first value.

flatnonzero is faster than where:

In [595]: np.flatnonzero(arr>.999)
Out[595]: array([  1273,   1471,   1584, ..., 997745, 999310, 999679], dtype=int32)
In [596]: timeit np.flatnonzero(arr>.999)
3.05 ms ± 26.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [599]: np.unravel_index(np.flatnonzero(arr>.999),arr.shape)
Out[599]: 
(array([  1,   1,   1, ..., 997, 999, 999], dtype=int32),
 array([273, 471, 584, ..., 745, 310, 679], dtype=int32))
In [600]: timeit np.unravel_index(np.flatnonzero(arr>.999),arr.shape)
3.05 ms ± 3.58 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [601]: timeit np.transpose(np.unravel_index(np.flatnonzero(arr>.999),arr.shap
     ...: e))
3.1 ms ± 5.86 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

This is the same as np.argwhere(arr>.999).

Interesting, the flatnonzero approach cuts the time in half! I didn't expect such a big improvement.

Comparing the iteration speeds:

Iteration on the 2d array from argwhere:

In [607]: pixels = np.argwhere(arr>.999)
In [608]: timeit [pixel for pixel in pixels]
347 µs ± 5.29 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Iterating on the tuple from where with the zip(*) transpose:

In [609]: idx = np.where(arr>.999)
In [610]: timeit [pixel for pixel in zip(*idx)]
256 µs ± 147 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Iterating on an array is often a little slower than iterating on a list, or in this case zipped arrays.

In [611]: [pixel for pixel in pixels][:5]
Out[611]: 
[array([  1, 273], dtype=int32),
 array([  1, 471], dtype=int32),
 array([  1, 584], dtype=int32),
 array([  1, 826], dtype=int32),
 array([  2, 169], dtype=int32)]
In [612]: [pixel for pixel in zip(*idx)][:5]
Out[612]: [(1, 273), (1, 471), (1, 584), (1, 826), (2, 169)]

One is a list of arrays, the other a list of tuples. But turning those tuples into arrays (individually) is slow:

In [614]: timeit [np.array(pixel) for pixel in zip(*idx)]
2.26 ms ± 4.94 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Iterating on the flat nonzero array is faster

In [617]: fdx = np.flatnonzero(arr>.999)
In [618]: fdx[:5]
Out[618]: array([1273, 1471, 1584, 1826, 2169], dtype=int32)
In [619]: timeit [i for i in fdx]
112 µs ± 23.5 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

but applying unravel to those values individually will take time.

def foo(idx):    # a simplified unravel
    return idx//1000, idx%1000

In [628]: timeit [foo(i) for i in fdx]
1.12 ms ± 1.02 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Add this 1 ms to the 3 ms to generate fdx, this flatnonzero might still come out ahead. But at its best we are talking about a 2x speed improvement.

I promise to upvote this in 17 hours (reached my limit for the day) (:
For 1D case (flattened arrays) I was expecting that further speed-up could have been achieved by pre-allocating the index array such as r=np.arange(ds2.size) and then simply applying boolean index to this array: r[ds2.ravel() > threshold]. This being in the loop could result in some gain. Strangely, this approach is slower than flatnonzero(ds2 > threshold). Any ideas why?
plus1 for perseverance and hard work and a better answer than mine!

AGN Gazer · Accepted Answer · 2017-11-02 06:09:53Z

3

np.where(arr> 0.05) and (arr > 0.05).nonzero() are ~22-25% faster in my tests.

For example:

while exit_code == "No":
    index_of_pixels_with_excess_volume = numpy.where(ds2 > 0.05)

    if not index_of_pixels_with_excess_volume[0].size:
        exit_code = "Yes"
    else:
        for pixel in zip(*index_of_pixels_with_excess_volume):

However, I worry that any gain brought by where vs. argwhere will be lost in this last loop due to zip(*...). Just let me know if that is the case and I will happily delete this answer.

edited Nov 2, 2017 at 6:09

answered Nov 2, 2017 at 5:35

AGN Gazer

8,4272 gold badges31 silver badges49 bronze badges

12 Comments

Behzad Jamali Over a year ago

How np.flatnonzero() can be used in a 2D array and what if we are interested in values that are bigger than a number other than zero?

AGN Gazer Over a year ago

@BehzadJamali flatnonzero is not used directly on your data but rather on the result of the comparison (arr > 0.05). For 2D data use nonzero() which is pretty much the same as where().

AGN Gazer Over a year ago

@BehzadJamali Oh, I just re-read your question and I see that you mention you are dealing with 2D arrays. Editing my answer...

Behzad Jamali Over a year ago

I'm checking the solution.

AGN Gazer Over a year ago

@BehzadJamali I added an example that also show that you will need to use zip(*...) in your inner loop which could kill any performance gains.

|

Behzad Jamali · Accepted Answer · 2017-11-03 06:00:41Z

0

Another solution to my question for those who might be interested: I found "Code Vectorization" by using numpy tricks would significantly speed-up the run time by eliminating for or while loops and numpy.where(). I found these two websites very helpful in explaining code vectorization.

https://github.com/rougier/from-python-to-numpy/blob/master/04-code-vectorization.rst#id36

https://www.labri.fr/perso/nrougier/teaching/numpy/numpy.html

answered Nov 3, 2017 at 6:00

Behzad Jamali

9242 gold badges10 silver badges25 bronze badges

Collectives™ on Stack Overflow

alternative to numpy.argwhere to speed up for loop in python

3 Answers 3

3 Comments

12 Comments

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

3 Comments

12 Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related