How do I extract a 2D NumPy sub-array from a 2D NumPy array based on patterns?

Question

I have a 2D NumPy array which looks like this:


Array=
[
[0,0,0,0,0,0,0,2,2,2],
[0,0,0,0,0,0,0,2,2,2].
[0,0,1,1,1,0,0,2,2,2],
[0,0,1,1,1,0,0,2,2,2],
[0,0,1,1,1,0,0,1,1,1],
[0,0,0,0,0,0,0,1,1,1]
]

I need to display the arrays of non-zero elements as:

Array1:
[
[1,1,1],
[1,1,1],
[1,1,1]
]

Array2:
[
[2,2,2],
[2,2,2],
[2,2,2],
[2,2,2]
]

Array3:
[
[1,1,1],
[1,1,1]
]

Could someone please help me out with what logic I could use to achieve the following output? I can't use fixed indexes (like array[a:b, c:d]) since the logic i create should be able to work for any NumPy array with a similar pattern.

i don't think there is a built-in function for that... You have to write your own logic to loop over the numpy array — emsiiggy
– emsiiggy, Commented Nov 27, 2019 at 18:18
Yeah i figured. I have been trying to figure out the loops for hours now but I couldn't get it. Do you have any idea how it might be possible? — MajesticOverlord
– MajesticOverlord, Commented Nov 27, 2019 at 18:31
What output do you expect for np.array([[0,1,1,0], [1,1,1,1]])? Is that even a potential input? Or are all sub-arrays rectangular and well separated? — mcsoini
– mcsoini, Commented Nov 27, 2019 at 18:44
@mcsoini No that is not a potential input. All inputs are of the form np.array([[0,0,1,1,1,0,0], [0,0,1,1,1,0,0]) . The sub-array's are all rectangular non-zero whole numbers and well separated with the rest of array being 0. — MajesticOverlord
– MajesticOverlord, Commented Nov 27, 2019 at 18:58

mcsoini · Accepted Answer · 2019-11-27 19:19:36Z

This uses scipy.ndimage.label to recursively identify disconnected sub-arrays.

import numpy as np
from scipy.ndimage import label

array = np.array(
    [[0,0,0,0,0,0,0,2,2,2,3,3,3],
     [0,0,0,0,0,0,0,2,2,2,0,0,1],
     [0,0,1,1,1,0,0,2,2,2,0,2,1],
     [0,0,1,1,1,0,0,2,2,2,0,2,0],
     [0,0,1,1,1,0,0,1,1,1,0,0,0],
     [0,0,0,0,0,0,0,1,1,1,0,0,0]])

# initialize list to collect sub-arrays
arr_list = []

def append_subarrays(arr, val, val_0):
    '''
    arr : 2D array
    val : the value used for filtering
    val_0 : the original value, which we want to preserve
    '''

    # remove everything that's not the current val
    arr[arr != val] = 0
    if 0 in arr:  # <-- not a single rectangle yet
        # get relevant indices as well as their minima and maxima
        x_ind, y_ind = np.where(arr != 0)
        min_x, max_x, min_y, max_y = min(x_ind), max(x_ind) + 1, min(y_ind), max(y_ind) + 1
        # cut subarray (everything corresponding to val)
        arr = arr[min_x:max_x, min_y:max_y]
        # use the label function to assign different values to disconnected regions
        labeled_arr = label(arr)[0]
        # recursively apply append_subarrays to each disconnected region 
        for sub_val in np.unique(labeled_arr[labeled_arr != 0]):
            append_subarrays(labeled_arr.copy(), sub_val, val_0)

    else:  # <-- we only have a single rectangle left ==> append
        arr_list.append(arr * val_0)

for i in np.unique(array[array > 0]):
    append_subarrays(array.copy(), i, i)

for arr in arr_list:
    print(arr, end='\n'*2)

Output (note: modified example array):

[[1]
 [1]]

[[1 1 1]
 [1 1 1]
 [1 1 1]]

[[1 1 1]
 [1 1 1]]

[[2 2 2]
 [2 2 2]
 [2 2 2]
 [2 2 2]]

[[2]
 [2]]

[[3 3 3]]

I actually read your uncommented code (while on the trone...). Nice to see that you still commented it afterwards :) (and it helps (me the least))

Quang Hoang · Accepted Answer · 2019-11-27 19:20:40Z

2

This sounds like a floodfill problem, so skimage.measure.label is a good approach:

Array=np.array([[0,0,0,0,0,0,0,2,2,2],
                [0,0,0,0,0,0,0,2,2,2],
                [0,0,1,1,1,0,0,2,2,2],
                [0,0,1,1,1,0,0,2,2,2],
                [0,0,1,1,1,0,0,1,1,1],
                [0,0,0,0,0,0,0,1,1,1]
                ])

from skimage.measure import label
labels = label(Array, connectivity=1)

for label in range(1, labels.max()+1):
    xs, ys = np.where(labels==label)
    shape = (len(np.unique(xs)), len(np.unique(ys)))

    print(Array[xs, ys].reshape(shape))

Output:

[[2 2 2]
 [2 2 2]
 [2 2 2]
 [2 2 2]]
[[1 1 1]
 [1 1 1]
 [1 1 1]]
[[1 1 1]
 [1 1 1]]

answered Nov 27, 2019 at 19:20

Quang Hoang

151k11 gold badges64 silver badges86 bronze badges

1 Comment

mcsoini Over a year ago

shucks, I was looking for a labeling function which takes into account values. scipy.ndimage just does flat features. Then I wouldn't have needed a recursion. Well done!

emsiiggy · Accepted Answer · 2019-11-27 19:21:39Z

startRowIndex = 0 #indexes of sub-arrays
endRowIndex = 0
startColumnIndex = 0
endColumnIndex = 0

tmpI = 0 #for iterating inside the i,j loops
tmpJ = 0
value = 0 #which number we are looking for in array
for i in range(array.shape[0]): #array.shape[0] says how many rows, shape[1] says how many columns
    for j in range(array[i].size): #for all elements in a row
        if(array[i,j] != 0): #if the element is different than 0
            startRowIndex = i
            startColumnIndex = j
            tmpI = i
            tmpJ = j #you cannot change the looping indexes so create tmp indexes
            value = array[i,j] #save what number will be sub-array (for example 2)
            while(array[tmpI,tmpJ] != 0 and array[tmpI,tmpJ] == value ): #iterate over column numbers
                tmpJ+=1
                if tmpJ == array.shape[1]: #if you reached end of the array (that is end of the column)
                    break

            #if you left the array then it means you are on index which is not zero,
            #so the previous where zero, but displaying array like this a[start:stop]
            #will take the values from <start; stop) (stop is excluded)
            endColumnIndex = tmpJ 
            tmpI = i
            tmpJ = j

            while(array[tmpI,tmpJ] != 0 and array[tmpI,tmpJ] == value): #iterate over row numbers
                tmpI += 1
                if tmpI == array.shape[0]: #if you reached end of the array
                    break
            #if you left the array then it means you are on index which is not zero,
            #so the previous where zero 
            endRowIndex = tmpI
            print(array[startRowIndex:endRowIndex, startColumnIndex:endColumnIndex])
            #change array to zero with already used elements
            array[startRowIndex:endRowIndex, startColumnIndex:endColumnIndex] = 0

This one is kinda brute-force but works the way you want it. This approach doesn't use any external library other than numpy

kaya3 · Accepted Answer · 2020-01-19 16:04:06Z

Here's my pure Python (no NumPy) solution. I took advantage of the fact that the contiguous regions are always rectangular.

The algorithm scans from top-left to bottom-right; when it finds the corner of a region, it scans to find the top-right and bottom-left corners. The dictionary skip is populated so that later scans can skip horizontally past any rectangle which has already been found.

The time complexity is O(nm) for a grid with n rows and m columns, which is optimal for this problem.

def find_rectangles(grid):
    width, height = len(grid[0]), len(grid)

    skip = dict()

    for y in range(height):
        x = 0
        while x < width:
            if (x, y) in skip:
                x = skip[x, y]
            elif not grid[y][x]:
                x += 1
            else:
                v = grid[y][x]

                x2 = x + 1
                while x2 < width and grid[y][x2] == v:
                    x2 += 1

                y2 = y + 1
                while y2 < height and grid[y2][x] == v:
                    skip[x, y2] = x2
                    y2 += 1

                yield [ row[x:x2] for row in grid[y:y2] ]

                x = x2

Example:

>>> for r in find_rectangles(grid1): # example from the question
...     print(r)
...
[[2, 2, 2], [2, 2, 2], [2, 2, 2], [2, 2, 2]]
[[1, 1, 1], [1, 1, 1], [1, 1, 1]]
[[1, 1, 1], [1, 1, 1]]
>>> for r in find_rectangles(grid2): # example from mcsoini's answer
...     print(r)
...
[[2, 2, 2], [2, 2, 2], [2, 2, 2], [2, 2, 2]]
[[3, 3, 3]]
[[1], [1]]
[[1, 1, 1], [1, 1, 1], [1, 1, 1]]
[[2], [2]]
[[1, 1, 1], [1, 1, 1]]

Paul Panzer · Accepted Answer · 2019-11-27 23:35:59Z

1

We can do this using scipy.ndimage.label and scipy.ndimage.find_objects:

from scipy.ndimage import label,find_objects
Array = np.array(Array)
[Array[j][i] for j in find_objects(*label(Array)) for i in find_objects(Array[j])]
# [array([[1, 1, 1],
#        [1, 1, 1]]), array([[2, 2, 2],
#        [2, 2, 2],
#        [2, 2, 2],
#        [2, 2, 2]]), array([[1, 1, 1],
#        [1, 1, 1],
#        [1, 1, 1]])]

edited Nov 27, 2019 at 23:35

answered Nov 27, 2019 at 19:50

Paul Panzer

53.3k3 gold badges60 silver badges103 bronze badges

8 Comments

mcsoini Over a year ago

I think there is an issue with this... try with a more complex array, like the example from my answer

Paul Panzer Over a year ago

@mcsoini Whether or not OP is interested in exmaples such as yours is not clear to me. This code works assuming that we want connected nonzeros grouped by actual value and that the values form a range starting at 1.

mcsoini Over a year ago

sorry, didn't mean to criticize. just noticed that it breaks when more connected groups are added

Paul Panzer Over a year ago

@mcsoini no worries. I checked on your example. It does what it is supposed to do: cut out the smallest rectangle containing all occurrences of a given label in given connected component.

Paul Panzer Over a year ago

@mcsoini My point is as long as it is not clear whether OP expects this kind of pattern (multiple groups of a given label not separated by zeros but by other nonzeros) or what their expected outcome in such cases would be I prefer to keep it simple and cheap.

|

Collectives™ on Stack Overflow

How do I extract a 2D NumPy sub-array from a 2D NumPy array based on patterns?

5 Answers 5

1 Comment

1 Comment

Comments

Comments

8 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

5 Answers 5

1 Comment

1 Comment

Comments

Comments

8 Comments

Your Answer

Sign up or log in

Post as a guest

Related