I have a 2D array and I'd like to transform it into a 3D array where each row of the new array contains multiple rows of the original 2D array.

This code produces the result I want (each row of the output array contains 3 rows of the input array), but I'm wondering what the correct way to do this is. I suspect a more direct way of indexing would also be faster for large datasets.

import numpy as np

input = np.arange(100) + np.arange(100)[:,None]
output = np.apply_along_axis(lambda x: input[x[0]:x[0]+3], 1, np.arange(100-2)[:,None])

The input array looks like this:

array([[  0,   1,   2, ...,  97,  98,  99],
       [  1,   2,   3, ...,  98,  99, 100],
       [  2,   3,   4, ...,  99, 100, 101],
       ...,
       [ 97,  98,  99, ..., 194, 195, 196],
       [ 98,  99, 100, ..., 195, 196, 197],
       [ 99, 100, 101, ..., 196, 197, 198]])

And the output array looks like this:

array([[[  0,   1,   2, ...,  97,  98,  99],
        [  1,   2,   3, ...,  98,  99, 100],
        [  2,   3,   4, ...,  99, 100, 101]],

       [[  1,   2,   3, ...,  98,  99, 100],
        [  2,   3,   4, ...,  99, 100, 101],
        [  3,   4,   5, ..., 100, 101, 102]],

       [[  2,   3,   4, ...,  99, 100, 101],
        [  3,   4,   5, ..., 100, 101, 102],
        [  4,   5,   6, ..., 101, 102, 103]],

       ...,

       [[ 95,  96,  97, ..., 192, 193, 194],
        [ 96,  97,  98, ..., 193, 194, 195],
        [ 97,  98,  99, ..., 194, 195, 196]],

       [[ 96,  97,  98, ..., 193, 194, 195],
        [ 97,  98,  99, ..., 194, 195, 196],
        [ 98,  99, 100, ..., 195, 196, 197]],

       [[ 97,  98,  99, ..., 194, 195, 196],
        [ 98,  99, 100, ..., 195, 196, 197],
        [ 99, 100, 101, ..., 196, 197, 198]]])
  • input is np.arange(100) + np.arange(100)[:, None], right? And output shape is (98, 3, 100)? Commented Jun 28, 2021 at 20:02
  • Wow, that is way cleaner (I'm rusty on my numpy so I updated the question). But yes, you're correct. Commented Jun 28, 2021 at 20:05

1 Answer

To start with, you can initialize input identically to what you have with just simple broadcasting:

ainput = np.arange(100) + np.arange(100)[:, None]

You should never have to transpose a symmetric array, and it is best not to give variables names that shadow built-in functions such as input, which is why the array is called ainput here.

You can get the output without loops (which is what np.apply_along_axis does under the hood) using np.lib.stride_tricks.as_strided:

n = 3
aoutput = np.lib.stride_tricks.as_strided(ainput, shape=(ainput.shape[0] - n + 1, n, ainput.shape[1]), strides=(ainput.strides[0], *ainput.strides))

This views ainput as an array of the required shape, where one step along the new leading dimension advances by one row of the original array. No data is copied, so the memory of the different layers overlaps: if you write to this array, the change can show up in up to three places at once (n places in general). Call .copy() on the result if you need independent data.
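To make the overlap concrete, here is a small sketch that compares the strided view against a loop-built reference and then writes through the view. The names expected and independent are only for illustration and are not part of the original code.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

ainput = np.arange(100) + np.arange(100)[:, None]
n = 3  # number of input rows per output "row"

aoutput = as_strided(
    ainput,
    shape=(ainput.shape[0] - n + 1, n, ainput.shape[1]),
    strides=(ainput.strides[0], *ainput.strides),
)

# Reference result built with an explicit loop, for comparison only.
expected = np.stack([ainput[i:i + n] for i in range(ainput.shape[0] - n + 1)])
assert aoutput.shape == (98, 3, 100)
assert np.array_equal(aoutput, expected)

# The first two strides are equal: moving to the next window and moving to
# the next row inside a window both advance by one row of ainput.
assert aoutput.strides[0] == aoutput.strides[1] == ainput.strides[0]

# Take an independent copy before writing, to contrast with the view.
independent = aoutput.copy()

# Writing one element through the view changes ainput[2, 0], which is also
# visible in the two other windows that contain input row 2.
aoutput[1, 1, 0] = -1
assert ainput[2, 0] == -1
assert aoutput[0, 2, 0] == -1 and aoutput[2, 0, 0] == -1

# The copy made earlier is unaffected.
assert independent[1, 1, 0] == 2
```

as_strided performs no bounds checking, so a wrong shape or strides argument can read memory outside the array. Checking against a slow reference like this on a small input is a cheap safeguard.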

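As of NumPy version 1.20, there is a thin wrapper that does the same thing: np.lib.stride_tricks.sliding_window_view. It allows you to work in terms of the window size and axes, without having to compute the shape and strides manually. Windowing over both axes yields shape (98, 1, 3, 100), because there is exactly one window position along the columns, so index that length-1 axis away to get (98, 3, 100). The result is a read-only view by default:

aoutput = np.lib.stride_tricks.sliding_window_view(ainput, (3, 100), axis=(0, 1))[:, 0]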
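As a quick shape check for sliding_window_view, the sketch below shows two ways to arrive at (98, 3, 100) and compares both with a loop-built reference. It assumes NumPy 1.20 or later. The names via_both, via_rows, and expected are only for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

ainput = np.arange(100) + np.arange(100)[:, None]
n = 3  # number of input rows per output "row"

# Window over both axes: one output axis per windowed input axis, followed
# by the window axes, giving (98, 1, 3, 100). The length-1 axis is the
# single window position along the columns, so it is indexed away.
both_axes = sliding_window_view(ainput, (n, ainput.shape[1]), axis=(0, 1))
assert both_axes.shape == (98, 1, 3, 100)
via_both = both_axes[:, 0]

# Window over the rows only: the window axis is appended last, giving
# (98, 100, 3), so it is moved into the middle position.
rows_only = sliding_window_view(ainput, n, axis=0)
assert rows_only.shape == (98, 100, 3)
via_rows = rows_only.transpose(0, 2, 1)

# Reference result built with an explicit loop, for comparison only.
expected = np.stack([ainput[i:i + n] for i in range(ainput.shape[0] - n + 1)])
assert np.array_equal(via_both, expected)
assert np.array_equal(via_rows, expected)

# Both results are views on ainput and are read-only by default.
assert np.shares_memory(via_both, ainput)
assert not via_both.flags.writeable
```

Because the view is read-only unless writeable=True is passed, accidental writes into overlapping memory raise an error instead of silently changing several windows at once.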
4 Comments

So I'm not looking to jump rows. aoutput has rows 3-5 of ainput on its second row but I want rows 1-4 on the second row. I'm not exactly sure what the shape passed as strides above is doing.
@SuperCodeBrah. Updated. Strides are the number of bytes between consecutive elements along a given dimension.
Beautiful. Points awarded
@SuperCodeBrah. I do like points. I updated with an easier-to-use alternative
