Applying functions to multidimensional numpy arrays without loops

Question

I am working with raster data with numpy (after reading from GDAL), which represents elevation. My goal is calculate water flow direction for every pixel in the array using numpy, determined primarily from the difference in elevation between a given pixel and it's 8 neighbours.

I have already implemented a rolling window technique to generate a multidimensional array with each pixel and it's neighbours, which works as below:

def rolling_window(array, window_size):
    itemsize = array.itemsize
    shape = (array.shape[0] - window_size + 1,
             array.shape[1] - window_size + 1,
             window_size, window_size)
    strides = (array.shape[1] * itemsize, itemsize,
               array.shape[1] * itemsize, itemsize)
    return np.lib.stride_tricks.as_strided(array, shape=shape, strides=strides)

array = np.arange(100)
array = array.reshape(10, 10)
w = rolling_window(array, 3)

# produces array with shape (8, 8, 3, 3) - edge cases are not currently dealt with.

So, a series of 3 x 3 arrays, centred around the study pixel at 1,1, each within another dimension of the array for the raster "rows" e.g., from one pixel of the input, the array representing it could be as below, where the pixel valued 4 is the study pixel, and the other values are it's immediate neighbours.

array([[[[ 0,  1,  2],
         [ 3,  4,  5],
         [ 6,  7,  8]]]])

A simplified version of my current method for working with this multidimensional array is the following function:

def flow_dir(array):

    # Value to assign output based on element index.
    flow_idx_dict = {0: 32,
                     1: 64,
                     2: 128,
                     3: 16,
                     5: 1,
                     6: 8,
                     7: 4,
                     8: 2}

    # Generates the rolling window array as mentioned above.
    w = rolling_window(array, 3)

    # Iterate though each pixel array.
    for x, i in enumerate(w, 1):
        for y, j in enumerate(i, 1):
            j = j.flatten()

            # Centre pixel value after flattening.
            centre = j[4]

            # Some default values.
            idx = 4
            max_drop = 0

            # Iterate over pixel values in array.
            for count, px in enumerate(j):

                # Calculate difference between centre pixel and neighbour.
                drop = centre - px

                # Find the maximum difference pixel index.
                if count != 4:
                    if drop > max_drop:
                        max_drop = drop
                        idx = count

            # Assign a value from a dict, matching index to flow direction category.
            value = flow_idx_dict[idx]

            # Update each pixel in the input array with the flow direction.
            array[x, y] = value
    return array

Understandably, all these for loops and if statements are very slow. I know there must be a vectorized numpy way to do this, but I'm struggling to find the exact functions(s) I need, or perhaps have not understood how to implement them properly. I have tried np.apply_along_axis, np.where, np.nditer, and others, but to no avail so far. What I think I need is:

A way to apply a function to each of these pixel arrays produced by the rolling window without using for loops to access them.
Find the maximum drop index value, without using if statements and enumerate.
To be able to update the input array in a batch, not by individual element.

Could you share rolling_window function definition? Also, what's flow_idx_dict? Could you add sample inputs that could be used to run flow_dir? — Divakar
– Divakar, Commented Dec 17, 2015 at 13:57
I've added in the rolling_window and flow dict. The example of np.arange(100) reshaped to (10, 10) would suffice as an input to flow_dir, although in reality my arrays are much larger, and more variable in their values. — ssast
– ssast, Commented Dec 17, 2015 at 14:08
So, I would use arr = np.arange(90) and then flow_dir(arr)? That would throw error I think. — Divakar
– Divakar, Commented Dec 17, 2015 at 14:10
Figured as much. Take a look at np.pad to enable you to reflect the edge values to help take care of edge affects. So I assume, you only need to find the minimum difference (your window - the middle) to pull your value out of the dictionary, but it is not clear if you are using the cardinality solely or accounting for duplicate and perhaps opposing maximum drops. — user1121588
– user1121588, Commented Dec 17, 2015 at 16:27

B. M. · Accepted Answer · 2015-12-21 11:28:47Z

2

I think rolling windows can be avoided here; It is easier and more readable to vectorize on NxN array than NxNx3x3.

Consider this data :

array = np.array([[78, 72, 69, 71, 58, 49],
       [74, 67, 56, 49, 46, 50],
       [69, 53, 44, 37, 38, 48],
       [64, 58, 55, 22, 33, 24],
       [68, 61, 47, 21, 16, 19],
       [74, 53, 34, 12, 11, 12]])
N=6

First, compute the 8 gradients and codes this way :

gradient = np.empty((8,N-2,N-2),dtype=np.float)
code = np.empty(8,dtype=np.int)
for k in range(8):
    theta = -k*np.pi/4
    code[k] = 2**k
    j, i = np.int(1.5*np.cos(theta)),-np.int(1.5*np.sin(theta))
    d = np.linalg.norm([i,j])
    gradient[k] = (array[1+i: N-1+i,1+j: N-1+j]-array[1: N-1,1: N-1])/d

It is fast because there is few external loops (8). (-gradient).argmax(axis=0) give for each pixel the direction of the flow.

Finally, take the code :

direction = (-gradient).argmax(axis=0)
result = code.take(direction)

result :

array([[  2,   2,   4,   4],
       [  1,   2,   4,   8],
       [128,   1,   2,   4],
       [  2,   1,   4,   4]])

edited Dec 21, 2015 at 11:28

answered Dec 17, 2015 at 18:08

B. M.

18.7k2 gold badges40 silver badges56 bronze badges

Sign up to request clarification or add additional context in comments.

4 Comments

ssast Over a year ago

Definitely would be glad to avoid the rolling window, just how my mind works with a spatial analysis background, rather than maths or computer science. I think this is very close to what I need, only the coding is not as expected e.g. given a window of [3, 2, 0], [1, 6, 9], [4, 4, 4], the output for the central pixel (6) is 4 (i.e. South), but should be 128 (NE), as with the direction coding here resources.arcgis.com/en/help/main/10.2/index.html#/…. Possibly the cause is the gradient array seems to have half its data set to zero.

ssast Over a year ago

I needed to make a few edits to run, but it's still not quite there - the diagonals are still coming out with zero gradient difference. For those, j, i are coming out as 0, 0, rather than -1, -1 etc. I also found that argmax is actually the correct function to use, as it is the maximum drop that is sought after. Ignoring the diagonals, this is working correctly now, so I think it's just the lines allocating i & j that need further work.

ssast Over a year ago

Hold on, you must be on Python 3. The problem was 3/2, which with my Python 2.7 install returns 1. Changing this to 3/2.0 fixes the issue.

ssast Over a year ago

Yes, I believe so, thank you for your time. With a test array of 1000 x 1000 cells, your code is approximately 100 times faster than the iterative rolling window method, definitely meeting my requirements. Now, I'll need to look at the edge cases and sinks...

Collectives™ on Stack Overflow

Applying functions to multidimensional numpy arrays without loops

1 Answer 1

4 Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

4 Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related