4

I have a numpy matrix such as the one below. What I'd like to do is get the arrays containing the blocks making each row/column here. How can this be efficiently done in numpy?

Examples

So if for example we have the array [1 1 1 1 0 1 1 1 1 0] (first row) then we would get [4 4] since we have 2 blocks of 4.

For the first column we would get [3 1] since we have at the start three 1-s, followed by a zero, then one 1 and then more zeros.

The mentioned matrix

[[1 1 1 1 0 1 1 1 1 0]
 [1 0 0 1 0 1 1 1 1 1]
 [1 0 1 0 1 0 0 1 0 1]
 [0 1 0 0 1 0 1 0 0 0]
 [1 1 1 1 1 1 0 1 0 1]
 [0 0 1 0 0 1 1 1 0 0]
 [0 0 0 0 1 1 0 1 1 0]
 [0 0 0 0 0 0 0 0 1 1]
 [0 1 0 1 0 1 0 0 0 0]
 [0 0 1 0 0 0 1 1 1 0]]

NOTE: rows are ordered from left to right, and columns are top to bottom.

5
  • What would row 2 be? [1, 1, 5]? And row 3: [1, 1, 1, 1, 1]? Commented Oct 4, 2020 at 20:05
  • @S3DEV yes, exactly Commented Oct 4, 2020 at 20:06
  • @S3DEV and the same operation should be also applied to the columns (rows are ordered left to right, columns are top to bottom) Commented Oct 4, 2020 at 20:07
  • Ok. And what is the shape/data structure for the output, considering you’re looking for row and column results. An output for each, perhaps? Commented Oct 4, 2020 at 20:08
  • @S3DEV One array containing several arrays, each array is of the form described here (the arrays are ordered by their appearance in the matrix, meaning the array describing the first row is in position zero (first)). Same goes for the columns. Commented Oct 4, 2020 at 20:10

2 Answers 2

2

Here is some numpy magic:

a = np.array([[1, 1, 1, 1, 0, 1, 1, 1, 1, 0],
              [1, 0, 0, 1, 0, 1, 1, 1, 1, 1],
              [1, 0, 1, 0, 1, 0, 0, 1, 0, 1],
              [0, 1, 0, 0, 1, 0, 1, 0, 0, 0],
              [1, 1, 1, 1, 1, 1, 0, 1, 0, 1],
              [0, 0, 1, 0, 0, 1, 1, 1, 0, 0],
              [0, 0, 0, 0, 1, 1, 0, 1, 1, 0],
              [0, 0, 0, 0, 0, 0, 0, 0, 1, 1],
              [0, 1, 0, 1, 0, 1, 0, 0, 0, 0],
              [0, 0, 1, 0, 0, 0, 1, 1, 1, 0]])


a_pad = np.zeros((a.shape[0]+2, a.shape[1]+2))
a_pad[1:-1, 1:-1] = a

cols = [np.diff(np.nonzero(c)[0].reshape(-1, 2), axis=1)[:, 0]
        for c in np.diff(a_pad, axis=0).T[1:-1]]
# [array([3, 1]),  array([1, 2, 1]),  array([1, 1, 2, 1]), ...

rows = [np.diff(np.nonzero(r)[0].reshape(-1, 2), axis=1)[:, 0]
        for r in np.diff(a_pad, axis=1)[1:-1]]
# [array([4, 4]),  array([1, 1, 5]),  array([1, 1, 1, 1, 1]), ...

Now let's explore what happens on an example array (a[5, :]):

# a            [0, 0, 1,  0, 0, 0, 1, 1,  1, 0,]
# pad       [0, 0, 0, 1,  0, 0, 0, 1, 1,  1, 0, 0]
# diff()       [0, 0, 1, -1, 0, 0, 1, 0, 0, -1, 0]
#                     ^   ^        ^         ^
# nonzero()          [2,  3,       6,        9]
# reshape() [[2, 3],
#            [6, 9]]
# diff()     [1, 3]

The idea is that when padding the binary array with zeros at both ends one can find the start and end of each sequence of ones easily by applying np.diff() (1 where 0->1 and -1 where 1->0). Therefore np.nonzero(np.diff()) gives the indices of the start and end points of each sequence. Additionally we know that starts (+1) and ends (-1) must always alternate. So np.reshape(-1, 2) gives us the start points in the first column and the end points in the second. Applying np.diff() again on this array gives the length of each sequence.

Sign up to request clarification or add additional context in comments.

2 Comments

Seems like this works great! Would you mind explaining the process here?
Same method in my solution but different arrangement of similar methods ;)
1

This is a way to do this row-wise:

def get_blocks(a):
    x = np.diff(a,prepend=0,append=0)
    groups, starts = np.nonzero(x*x+x)
    groups, ends = np.nonzero(x*x-x)
    values = ends - starts
    markers = np.diff(groups, prepend=0)
    marker_idx = np.nonzero(marker_idx)[0]
    return np.split(values, marker_idx)

Sample run:

>>> a = np.array([[1, 1, 1, 1, 0, 1, 1, 1, 1, 0],
   [1, 0, 0, 1, 0, 1, 1, 1, 1, 1],
   [1, 0, 1, 0, 1, 0, 0, 1, 0, 1],
   [0, 1, 0, 0, 1, 0, 1, 0, 0, 0],
   [1, 1, 1, 1, 1, 1, 0, 1, 0, 1],
   [0, 0, 1, 0, 0, 1, 1, 1, 0, 0],
   [0, 0, 0, 0, 1, 1, 0, 1, 1, 0],
   [0, 0, 0, 0, 0, 0, 0, 0, 1, 1],
   [0, 1, 0, 1, 0, 1, 0, 0, 0, 0],
   [0, 0, 1, 0, 0, 0, 1, 1, 1, 0]])
>>> get_blocks(a)
[array([4, 4], dtype=int32), 
array([1, 1, 5], dtype=int32), 
array([1, 1, 1, 1, 1], dtype=int32), 
array([1, 1, 1], dtype=int32), 
array([6, 1, 1], dtype=int32), 
array([1, 3], dtype=int32), 
array([2, 2], dtype=int32), 
array([2], dtype=int32), 
array([1, 1, 1], dtype=int32), 
array([1, 3], dtype=int32)]

It works for columns as well if you call it on A.T

2 Comments

np.diff() takes a perpend and an append argument?! Man I should have a look a the documentation more often.
@sclaronomic Yes, it appears to be a new feature since Jan of 2019 when numpy 1.16.0 was released.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.