Numpy - index groups of rows into higher dimensional array

Question

I have a 2D array and I'd like to transform it into a 3D array where each row of the new array contains multiple rows of the original 2D array.

This code replicates the functionality (each row of the output array contains 3 consecutive rows of the input array), but I'm wondering what the correct way to do this is. I suspect that a proper indexing approach would be faster for large datasets.

import numpy as np
input = np.arange(100) + np.arange(100)[:,None]
output = np.apply_along_axis(lambda x: input[x[0]:x[0]+3], 1, np.arange(100-2)[:,None])
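The apply_along_axis call above runs the lambda once per row of np.arange(98)[:, None], so it is effectively a Python-level loop over starting row indices. A minimal sketch of the equivalent explicit loop, for comparison (the names data, n, looped and applied are illustrative; data stands in for input to avoid shadowing the built-in):

```python
import numpy as np

data = np.arange(100) + np.arange(100)[:, None]  # same array as `input` above
n = 3  # rows per group

# Explicit Python loop: one slice per starting row, stacked into a 3D array.
# Every group is copied, so the result holds roughly n times as many elements as `data`.
looped = np.stack([data[i:i + n] for i in range(data.shape[0] - n + 1)])

# The apply_along_axis version from the question, with the window size parameterized.
applied = np.apply_along_axis(lambda x: data[x[0]:x[0] + n], 1,
                              np.arange(data.shape[0] - n + 1)[:, None])

print(looped.shape)                     # (98, 3, 100)
print(np.array_equal(looped, applied))  # True
```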

The input array looks like this:

array([[  0,   1,   2, ...,  97,  98,  99],
       [  1,   2,   3, ...,  98,  99, 100],
       [  2,   3,   4, ...,  99, 100, 101],
       ...,
       [ 97,  98,  99, ..., 194, 195, 196],
       [ 98,  99, 100, ..., 195, 196, 197],
       [ 99, 100, 101, ..., 196, 197, 198]])

And the output array looks like this:

array([[[  0,   1,   2, ...,  97,  98,  99],
        [  1,   2,   3, ...,  98,  99, 100],
        [  2,   3,   4, ...,  99, 100, 101]],

       [[  1,   2,   3, ...,  98,  99, 100],
        [  2,   3,   4, ...,  99, 100, 101],
        [  3,   4,   5, ..., 100, 101, 102]],

       [[  2,   3,   4, ...,  99, 100, 101],
        [  3,   4,   5, ..., 100, 101, 102],
        [  4,   5,   6, ..., 101, 102, 103]],

       ...,

       [[ 95,  96,  97, ..., 192, 193, 194],
        [ 96,  97,  98, ..., 193, 194, 195],
        [ 97,  98,  99, ..., 194, 195, 196]],

       [[ 96,  97,  98, ..., 193, 194, 195],
        [ 97,  98,  99, ..., 194, 195, 196],
        [ 98,  99, 100, ..., 195, 196, 197]],

       [[ 97,  98,  99, ..., 194, 195, 196],
        [ 98,  99, 100, ..., 195, 196, 197],
        [ 99, 100, 101, ..., 196, 197, 198]]])
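For reference, one pure-indexing way to build the same output (not taken from the thread; shown as an alternative) is integer-array indexing with a broadcast grid of row indices. It still returns a copy rather than a view, but it replaces the Python loop with a single vectorized operation. A sketch, with illustrative variable names:

```python
import numpy as np

data = np.arange(100) + np.arange(100)[:, None]
n = 3  # rows per group

# starts[:, None] + offsets broadcasts to a (98, 3) grid of row indices;
# row i of the grid is [i, i + 1, i + 2].
starts = np.arange(data.shape[0] - n + 1)
offsets = np.arange(n)
row_idx = starts[:, None] + offsets

# Indexing the first axis with a (98, 3) integer array yields shape (98, 3, 100).
grouped = data[row_idx]
print(grouped.shape)                              # (98, 3, 100)
print(np.array_equal(grouped[1], data[1:1 + n]))  # True
```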

input is np.arange(100) + np.arange(100)[:, None], right? And the output shape is (98, 3, 100)? – Mad Physicist, Jun 28, 2021 at 20:02

Wow, that is way cleaner (I'm rusty on my NumPy, so I updated the question). But yes, you're correct. – SuperCodeBrah, Jun 28, 2021 at 20:05

Mad Physicist · Accepted Answer · 2021-06-28 20:29:35Z


To start with, you can build exactly the same input array with simple broadcasting:

ainput = np.arange(100) + np.arange(100)[:, None]

A symmetric array never needs to be transposed, and you should avoid variable names such as input that shadow built-in functions.
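To see what the broadcasting does, here is the same construction on a small 4x4 case (the size is arbitrary). The row vector and the column vector broadcast to a grid where element [i, j] is i + j, which is why the result is already symmetric:

```python
import numpy as np

row = np.arange(4)           # shape (4,), broadcast as (1, 4)
col = np.arange(4)[:, None]  # shape (4, 1)
a = row + col                # broadcasts to (4, 4); a[i, j] == i + j
print(a)
# [[0 1 2 3]
#  [1 2 3 4]
#  [2 3 4 5]
#  [3 4 5 6]]
print(np.array_equal(a, a.T))  # True: i + j == j + i, so transposing changes nothing
```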

np.apply_along_axis is a Python-level loop under the hood. You can get the same output without any loop, and without copying data, using np.lib.stride_tricks.as_strided:

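n = 3  # number of input rows per group
# Strides: axis 0 (next window) and axis 1 (next row in window) both step one row; axis 2 steps one element.
aoutput = np.lib.stride_tricks.as_strided(ainput, shape=(ainput.shape[0] - n + 1, n, ainput.shape[1]),
                                          strides=(ainput.strides[0], *ainput.strides))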

This tells NumPy to view ainput as an array of the required shape, in which the new leading dimension steps forward by one row of the original array. Consecutive windows therefore overlap in memory: a row of ainput can appear in up to n windows, so if you decide to write to this array, a single change may show up in three places at once.
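A small sketch of that view on a 5x4 stand-in for ainput (sizes are arbitrary). It checks the strides, confirms that no data is copied, and shows one write to the base array appearing in three windows. It also passes writeable=False, an existing as_strided parameter, which is one way to prevent accidental writes through the overlapping view:

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

a = np.arange(4) + np.arange(5)[:, None]  # small (5, 4) stand-in for ainput
n = 3
windows = as_strided(
    a,
    shape=(a.shape[0] - n + 1, n, a.shape[1]),
    strides=(a.strides[0], *a.strides),
    writeable=False,  # read-only view: overlapping writes through it are impossible
)

# Axis 0 (window start) and axis 1 (row within window) both step one full row of a.
print(windows.shape)                       # (3, 3, 4)
print(windows.strides[0] == a.strides[0])  # True
print(windows.strides[1] == a.strides[0])  # True
print(np.shares_memory(windows, a))        # True: no data was copied

# Row 2 of a belongs to all three windows, so one write to a shows up three times.
a[2, 0] = -1
print(windows[0, 2, 0], windows[1, 1, 0], windows[2, 0, 0])  # -1 -1 -1
```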

As of NumPy 1.20, there is a thin wrapper that does the same thing: np.lib.stride_tricks.sliding_window_view. It lets you work in terms of the window shape and axes without computing the shape and strides manually, and it returns a read-only view by default:

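# The call alone returns shape (98, 1, 3, 100); [:, 0] drops the singleton axis left by the full-width window.
aoutput = np.lib.stride_tricks.sliding_window_view(ainput, (3, 100), axis=(0, 1))[:, 0]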
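The window and axis arguments of sliding_window_view determine where the window dimensions end up, which is easy to trip over. A sketch (requires NumPy 1.20 or later) comparing two spellings against the as_strided version; the variable names are illustrative:

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided, sliding_window_view

ainput = np.arange(100) + np.arange(100)[:, None]
n = 3

# Windowing over both axes keeps a singleton axis for the full-width dimension.
full = sliding_window_view(ainput, (n, ainput.shape[1]), axis=(0, 1))
print(full.shape)  # (98, 1, 3, 100)

# Windowing over axis 0 only appends the window axis last.
rows = sliding_window_view(ainput, n, axis=0)
print(rows.shape)  # (98, 100, 3)

# Either can be brought to the (98, 3, 100) layout without copying.
a1 = full[:, 0]
a2 = rows.swapaxes(1, 2)

strided = as_strided(ainput, shape=(ainput.shape[0] - n + 1, n, ainput.shape[1]),
                     strides=(ainput.strides[0], *ainput.strides))
print(np.array_equal(a1, strided), np.array_equal(a2, strided))  # True True
print(a1.flags.writeable)  # False: sliding_window_view returns a read-only view by default
```

The swapaxes spelling avoids hard-coding the row width, so it keeps working if the number of columns changes.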

Answered Jun 28, 2021 at 20:07 by Mad Physicist; edited Jun 28, 2021 at 20:29.


4 Comments

SuperCodeBrah

So I'm not looking to jump rows. aoutput has rows 3-5 of ainput in its second row, but I want rows 1-3 in the second row. I'm not exactly sure what the shape passed as strides above is doing.

Mad Physicist

@SuperCodeBrah. Updated. Strides are the number of bytes to step between consecutive elements along each dimension.

SuperCodeBrah

Beautiful. Points awarded.

Mad Physicist

@SuperCodeBrah. I do like points. I updated the answer with an easier-to-use alternative.
