numpy get the blocks making an array in a matrix

Question

I have a numpy matrix such as the one below. What I'd like to do is get the arrays containing the blocks making each row/column here. How can this be efficiently done in numpy?

Examples

So if for example we have the array [1 1 1 1 0 1 1 1 1 0] (first row) then we would get [4 4] since we have 2 blocks of 4.

For the first column we would get [3 1] since we have at the start three 1-s, followed by a zero, then one 1 and then more zeros.

The mentioned matrix

[[1 1 1 1 0 1 1 1 1 0]
 [1 0 0 1 0 1 1 1 1 1]
 [1 0 1 0 1 0 0 1 0 1]
 [0 1 0 0 1 0 1 0 0 0]
 [1 1 1 1 1 1 0 1 0 1]
 [0 0 1 0 0 1 1 1 0 0]
 [0 0 0 0 1 1 0 1 1 0]
 [0 0 0 0 0 0 0 0 1 1]
 [0 1 0 1 0 1 0 0 0 0]
 [0 0 1 0 0 0 1 1 1 0]]

NOTE: rows are ordered from left to right, and columns are top to bottom.

What would row 2 be? [1, 1, 5]? And row 3: [1, 1, 1, 1, 1]? — s3dev
– s3dev, Commented Oct 4, 2020 at 20:05
@S3DEV and the same operation should be also applied to the columns (rows are ordered left to right, columns are top to bottom) — Ariel Yael
– Ariel Yael, Commented Oct 4, 2020 at 20:07
Ok. And what is the shape/data structure for the output, considering you’re looking for row and column results. An output for each, perhaps? — s3dev
– s3dev, Commented Oct 4, 2020 at 20:08
@S3DEV One array containing several arrays, each array is of the form described here (the arrays are ordered by their appearance in the matrix, meaning the array describing the first row is in position zero (first)). Same goes for the columns. — Ariel Yael
– Ariel Yael, Commented Oct 4, 2020 at 20:10

tenhjo · Accepted Answer · 2020-10-04 21:05:30Z

2

Here is some numpy magic:

a = np.array([[1, 1, 1, 1, 0, 1, 1, 1, 1, 0],
              [1, 0, 0, 1, 0, 1, 1, 1, 1, 1],
              [1, 0, 1, 0, 1, 0, 0, 1, 0, 1],
              [0, 1, 0, 0, 1, 0, 1, 0, 0, 0],
              [1, 1, 1, 1, 1, 1, 0, 1, 0, 1],
              [0, 0, 1, 0, 0, 1, 1, 1, 0, 0],
              [0, 0, 0, 0, 1, 1, 0, 1, 1, 0],
              [0, 0, 0, 0, 0, 0, 0, 0, 1, 1],
              [0, 1, 0, 1, 0, 1, 0, 0, 0, 0],
              [0, 0, 1, 0, 0, 0, 1, 1, 1, 0]])


a_pad = np.zeros((a.shape[0]+2, a.shape[1]+2))
a_pad[1:-1, 1:-1] = a

cols = [np.diff(np.nonzero(c)[0].reshape(-1, 2), axis=1)[:, 0]
        for c in np.diff(a_pad, axis=0).T[1:-1]]
# [array([3, 1]),  array([1, 2, 1]),  array([1, 1, 2, 1]), ...

rows = [np.diff(np.nonzero(r)[0].reshape(-1, 2), axis=1)[:, 0]
        for r in np.diff(a_pad, axis=1)[1:-1]]
# [array([4, 4]),  array([1, 1, 5]),  array([1, 1, 1, 1, 1]), ...

Now let's explore what happens on an example array (a[5, :]):

# a            [0, 0, 1,  0, 0, 0, 1, 1,  1, 0,]
# pad       [0, 0, 0, 1,  0, 0, 0, 1, 1,  1, 0, 0]
# diff()       [0, 0, 1, -1, 0, 0, 1, 0, 0, -1, 0]
#                     ^   ^        ^         ^
# nonzero()          [2,  3,       6,        9]
# reshape() [[2, 3],
#            [6, 9]]
# diff()     [1, 3]

The idea is that when padding the binary array with zeros at both ends one can find the start and end of each sequence of ones easily by applying np.diff() (1 where 0->1 and -1 where 1->0). Therefore np.nonzero(np.diff()) gives the indices of the start and end points of each sequence. Additionally we know that starts (+1) and ends (-1) must always alternate. So np.reshape(-1, 2) gives us the start points in the first column and the end points in the second. Applying np.diff() again on this array gives the length of each sequence.

edited Oct 4, 2020 at 21:05

answered Oct 4, 2020 at 20:46

tenhjo

4,5911 gold badge17 silver badges48 bronze badges

Sign up to request clarification or add additional context in comments.

2 Comments

Ariel Yael Over a year ago

Seems like this works great! Would you mind explaining the process here?

mathfux Over a year ago

Same method in my solution but different arrangement of similar methods ;)

mathfux · Accepted Answer · 2020-10-04 21:07:24Z

1

This is a way to do this row-wise:

def get_blocks(a):
    x = np.diff(a,prepend=0,append=0)
    groups, starts = np.nonzero(x*x+x)
    groups, ends = np.nonzero(x*x-x)
    values = ends - starts
    markers = np.diff(groups, prepend=0)
    marker_idx = np.nonzero(marker_idx)[0]
    return np.split(values, marker_idx)

Sample run:

>>> a = np.array([[1, 1, 1, 1, 0, 1, 1, 1, 1, 0],
   [1, 0, 0, 1, 0, 1, 1, 1, 1, 1],
   [1, 0, 1, 0, 1, 0, 0, 1, 0, 1],
   [0, 1, 0, 0, 1, 0, 1, 0, 0, 0],
   [1, 1, 1, 1, 1, 1, 0, 1, 0, 1],
   [0, 0, 1, 0, 0, 1, 1, 1, 0, 0],
   [0, 0, 0, 0, 1, 1, 0, 1, 1, 0],
   [0, 0, 0, 0, 0, 0, 0, 0, 1, 1],
   [0, 1, 0, 1, 0, 1, 0, 0, 0, 0],
   [0, 0, 1, 0, 0, 0, 1, 1, 1, 0]])
>>> get_blocks(a)
[array([4, 4], dtype=int32), 
array([1, 1, 5], dtype=int32), 
array([1, 1, 1, 1, 1], dtype=int32), 
array([1, 1, 1], dtype=int32), 
array([6, 1, 1], dtype=int32), 
array([1, 3], dtype=int32), 
array([2, 2], dtype=int32), 
array([2], dtype=int32), 
array([1, 1, 1], dtype=int32), 
array([1, 3], dtype=int32)]

It works for columns as well if you call it on A.T

edited Oct 4, 2020 at 21:07

answered Oct 4, 2020 at 20:52

mathfux

5,9792 gold badges20 silver badges38 bronze badges

2 Comments

tenhjo Over a year ago

np.diff() takes a perpend and an append argument?! Man I should have a look a the documentation more often.

mathfux Over a year ago

@sclaronomic Yes, it appears to be a new feature since Jan of 2019 when numpy 1.16.0 was released.

Collectives™ on Stack Overflow

numpy get the blocks making an array in a matrix

Examples

2 Answers 2

2 Comments

2 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

Examples

2 Answers 2

2 Comments

2 Comments

Your Answer

Sign up or log in

Post as a guest

Related