2

I have a 2D NumPy array which looks like this:


Array=
[
[0,0,0,0,0,0,0,2,2,2],
[0,0,0,0,0,0,0,2,2,2].
[0,0,1,1,1,0,0,2,2,2],
[0,0,1,1,1,0,0,2,2,2],
[0,0,1,1,1,0,0,1,1,1],
[0,0,0,0,0,0,0,1,1,1]
]

I need to display the arrays of non-zero elements as:

Array1:
[
[1,1,1],
[1,1,1],
[1,1,1]
]

Array2:
[
[2,2,2],
[2,2,2],
[2,2,2],
[2,2,2]
]

Array3:
[
[1,1,1],
[1,1,1]
]

Could someone please help me out with what logic I could use to achieve the following output? I can't use fixed indexes (like array[a:b, c:d]) since the logic i create should be able to work for any NumPy array with a similar pattern.

4
  • 1
    i don't think there is a built-in function for that... You have to write your own logic to loop over the numpy array Commented Nov 27, 2019 at 18:18
  • Yeah i figured. I have been trying to figure out the loops for hours now but I couldn't get it. Do you have any idea how it might be possible? Commented Nov 27, 2019 at 18:31
  • What output do you expect for np.array([[0,1,1,0], [1,1,1,1]])? Is that even a potential input? Or are all sub-arrays rectangular and well separated? Commented Nov 27, 2019 at 18:44
  • @mcsoini No that is not a potential input. All inputs are of the form np.array([[0,0,1,1,1,0,0], [0,0,1,1,1,0,0]) . The sub-array's are all rectangular non-zero whole numbers and well separated with the rest of array being 0. Commented Nov 27, 2019 at 18:58

5 Answers 5

2

This uses scipy.ndimage.label to recursively identify disconnected sub-arrays.

import numpy as np
from scipy.ndimage import label

array = np.array(
    [[0,0,0,0,0,0,0,2,2,2,3,3,3],
     [0,0,0,0,0,0,0,2,2,2,0,0,1],
     [0,0,1,1,1,0,0,2,2,2,0,2,1],
     [0,0,1,1,1,0,0,2,2,2,0,2,0],
     [0,0,1,1,1,0,0,1,1,1,0,0,0],
     [0,0,0,0,0,0,0,1,1,1,0,0,0]])
# initialize list to collect sub-arrays
arr_list = []

def append_subarrays(arr, val, val_0):
    '''
    arr : 2D array
    val : the value used for filtering
    val_0 : the original value, which we want to preserve
    '''

    # remove everything that's not the current val
    arr[arr != val] = 0
    if 0 in arr:  # <-- not a single rectangle yet
        # get relevant indices as well as their minima and maxima
        x_ind, y_ind = np.where(arr != 0)
        min_x, max_x, min_y, max_y = min(x_ind), max(x_ind) + 1, min(y_ind), max(y_ind) + 1
        # cut subarray (everything corresponding to val)
        arr = arr[min_x:max_x, min_y:max_y]
        # use the label function to assign different values to disconnected regions
        labeled_arr = label(arr)[0]
        # recursively apply append_subarrays to each disconnected region 
        for sub_val in np.unique(labeled_arr[labeled_arr != 0]):
            append_subarrays(labeled_arr.copy(), sub_val, val_0)

    else:  # <-- we only have a single rectangle left ==> append
        arr_list.append(arr * val_0)

for i in np.unique(array[array > 0]):
    append_subarrays(array.copy(), i, i)

for arr in arr_list:
    print(arr, end='\n'*2)

Output (note: modified example array):

[[1]
 [1]]

[[1 1 1]
 [1 1 1]
 [1 1 1]]

[[1 1 1]
 [1 1 1]]

[[2 2 2]
 [2 2 2]
 [2 2 2]
 [2 2 2]]

[[2]
 [2]]

[[3 3 3]]

Sign up to request clarification or add additional context in comments.

1 Comment

I actually read your uncommented code (while on the trone...). Nice to see that you still commented it afterwards :) (and it helps (me the least))
2

This sounds like a floodfill problem, so skimage.measure.label is a good approach:

Array=np.array([[0,0,0,0,0,0,0,2,2,2],
                [0,0,0,0,0,0,0,2,2,2],
                [0,0,1,1,1,0,0,2,2,2],
                [0,0,1,1,1,0,0,2,2,2],
                [0,0,1,1,1,0,0,1,1,1],
                [0,0,0,0,0,0,0,1,1,1]
                ])

from skimage.measure import label
labels = label(Array, connectivity=1)

for label in range(1, labels.max()+1):
    xs, ys = np.where(labels==label)
    shape = (len(np.unique(xs)), len(np.unique(ys)))

    print(Array[xs, ys].reshape(shape))

Output:

[[2 2 2]
 [2 2 2]
 [2 2 2]
 [2 2 2]]
[[1 1 1]
 [1 1 1]
 [1 1 1]]
[[1 1 1]
 [1 1 1]]

1 Comment

shucks, I was looking for a labeling function which takes into account values. scipy.ndimage just does flat features. Then I wouldn't have needed a recursion. Well done!
2
startRowIndex = 0 #indexes of sub-arrays
endRowIndex = 0
startColumnIndex = 0
endColumnIndex = 0

tmpI = 0 #for iterating inside the i,j loops
tmpJ = 0
value = 0 #which number we are looking for in array
for i in range(array.shape[0]): #array.shape[0] says how many rows, shape[1] says how many columns
    for j in range(array[i].size): #for all elements in a row
        if(array[i,j] != 0): #if the element is different than 0
            startRowIndex = i
            startColumnIndex = j
            tmpI = i
            tmpJ = j #you cannot change the looping indexes so create tmp indexes
            value = array[i,j] #save what number will be sub-array (for example 2)
            while(array[tmpI,tmpJ] != 0 and array[tmpI,tmpJ] == value ): #iterate over column numbers
                tmpJ+=1
                if tmpJ == array.shape[1]: #if you reached end of the array (that is end of the column)
                    break

            #if you left the array then it means you are on index which is not zero,
            #so the previous where zero, but displaying array like this a[start:stop]
            #will take the values from <start; stop) (stop is excluded)
            endColumnIndex = tmpJ 
            tmpI = i
            tmpJ = j

            while(array[tmpI,tmpJ] != 0 and array[tmpI,tmpJ] == value): #iterate over row numbers
                tmpI += 1
                if tmpI == array.shape[0]: #if you reached end of the array
                    break
            #if you left the array then it means you are on index which is not zero,
            #so the previous where zero 
            endRowIndex = tmpI
            print(array[startRowIndex:endRowIndex, startColumnIndex:endColumnIndex])
            #change array to zero with already used elements
            array[startRowIndex:endRowIndex, startColumnIndex:endColumnIndex] = 0

This one is kinda brute-force but works the way you want it. This approach doesn't use any external library other than numpy

Comments

2

Here's my pure Python (no NumPy) solution. I took advantage of the fact that the contiguous regions are always rectangular.

The algorithm scans from top-left to bottom-right; when it finds the corner of a region, it scans to find the top-right and bottom-left corners. The dictionary skip is populated so that later scans can skip horizontally past any rectangle which has already been found.

The time complexity is O(nm) for a grid with n rows and m columns, which is optimal for this problem.

def find_rectangles(grid):
    width, height = len(grid[0]), len(grid)

    skip = dict()

    for y in range(height):
        x = 0
        while x < width:
            if (x, y) in skip:
                x = skip[x, y]
            elif not grid[y][x]:
                x += 1
            else:
                v = grid[y][x]

                x2 = x + 1
                while x2 < width and grid[y][x2] == v:
                    x2 += 1

                y2 = y + 1
                while y2 < height and grid[y2][x] == v:
                    skip[x, y2] = x2
                    y2 += 1

                yield [ row[x:x2] for row in grid[y:y2] ]

                x = x2

Example:

>>> for r in find_rectangles(grid1): # example from the question
...     print(r)
...
[[2, 2, 2], [2, 2, 2], [2, 2, 2], [2, 2, 2]]
[[1, 1, 1], [1, 1, 1], [1, 1, 1]]
[[1, 1, 1], [1, 1, 1]]
>>> for r in find_rectangles(grid2): # example from mcsoini's answer
...     print(r)
...
[[2, 2, 2], [2, 2, 2], [2, 2, 2], [2, 2, 2]]
[[3, 3, 3]]
[[1], [1]]
[[1, 1, 1], [1, 1, 1], [1, 1, 1]]
[[2], [2]]
[[1, 1, 1], [1, 1, 1]]

Comments

1

We can do this using scipy.ndimage.label and scipy.ndimage.find_objects:

from scipy.ndimage import label,find_objects
Array = np.array(Array)
[Array[j][i] for j in find_objects(*label(Array)) for i in find_objects(Array[j])]
# [array([[1, 1, 1],
#        [1, 1, 1]]), array([[2, 2, 2],
#        [2, 2, 2],
#        [2, 2, 2],
#        [2, 2, 2]]), array([[1, 1, 1],
#        [1, 1, 1],
#        [1, 1, 1]])]

8 Comments

I think there is an issue with this... try with a more complex array, like the example from my answer
@mcsoini Whether or not OP is interested in exmaples such as yours is not clear to me. This code works assuming that we want connected nonzeros grouped by actual value and that the values form a range starting at 1.
sorry, didn't mean to criticize. just noticed that it breaks when more connected groups are added
@mcsoini no worries. I checked on your example. It does what it is supposed to do: cut out the smallest rectangle containing all occurrences of a given label in given connected component.
@mcsoini My point is as long as it is not clear whether OP expects this kind of pattern (multiple groups of a given label not separated by zeros but by other nonzeros) or what their expected outcome in such cases would be I prefer to keep it simple and cheap.
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.