How to split numpy array keeping a few elements from previous split?

Question

I have a numpy array which I wish to split across a certain dimension. While splitting the array, I need to prepend (to the beginning of each element) a trailing part of the previous element. For instance,

Let my array be [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]. Let my split_size = 2 and pad_length = 1. split_size will always be a divisor of array length. My resultant splits would look like,

[random, 0, 1], [1, 2, 3], [3, 4, 5], [5, 6, 7], [7, 8, 9]. My splits were all prepended by the last value of the previous element.

Needless to say, my arrays are multidimensional and I need an efficient, vectorized way to do this along a certain dimension.

Here, I can provide the value of random.

Won't we need padding on the trailing side too, e.g. for the given input with split_size = 5, pad_length = 2? So, I am guessing the last row would be: [7 8 9 random random].
– Divakar, Commented Dec 1, 2016 at 9:42
Why? For those parameters, I should get this --> [random, random, 0, 1, 2, 3, 4], [3, 4, 5, 6, 7, 8, 9]. If the question is not clear, I'll be happy to improve it as you direct!
– martianwars, Commented Dec 1, 2016 at 9:46
Ah, I got the params wrong. I meant if split_size = 3, pad_length = 2?
– Divakar, Commented Dec 1, 2016 at 9:47
Oh, in this case split_size is always a divisor of array length.
– martianwars, Commented Dec 1, 2016 at 9:48

Oliver W. · Accepted Answer · 2016-12-01 09:24:58Z

2

Sounds like a job for as_strided.

as_strided returns a memory-efficient view on an array and can be used to retrieve a moving window over it. The numpy documentation on it is sparse, but there are a number of decent blog posts, online slide decks, and SO questions that explain it in more detail.

>>> import numpy as np
>>> from numpy.lib.stride_tricks import as_strided
>>> a = np.arange(10)
>>> split_size = 2
>>> pad_length = 1
>>> random = -9
>>> # prepend the desired constant value
>>> b = np.pad(a, (pad_length, 0), mode='constant', constant_values=random)
>>> # return a memory efficient view on the array
>>> as_strided(b,
...     shape=(a.size//split_size, split_size + pad_length),
...     strides=(b.strides[0]*split_size, b.strides[0]))
...
array([[-9,  0,  1],
       [ 1,  2,  3],
       [ 3,  4,  5],
       [ 5,  6,  7],
       [ 7,  8,  9]])

Be aware that as_strided does not check bounds: if the requested shape and strides reach past the end of the buffer, the extra entries at the end of the result will contain whatever happens to be in adjacent memory.
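For multidimensional input, or when the out-of-bounds risk is a concern, one possible sketch uses `sliding_window_view` (available in NumPy 1.20 and later) in place of raw strides, since it validates the window against the array shape. The function name `split_with_overlap` and its parameters are hypothetical and chosen only for illustration; this is not the code from the answer above.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def split_with_overlap(a, split_size, pad_length, fill_value, axis=0):
    """Split `a` along `axis`, prepending the tail of the previous chunk.

    Hypothetical helper: assumes a.shape[axis] is a multiple of split_size.
    """
    a = np.asarray(a)
    if not -a.ndim <= axis < a.ndim:
        raise ValueError("axis out of range")
    axis %= a.ndim  # normalise negative axes

    # Prepend pad_length copies of fill_value along `axis` only.
    pad_width = [(0, 0)] * a.ndim
    pad_width[axis] = (pad_length, 0)
    b = np.pad(a, pad_width, mode="constant", constant_values=fill_value)

    # Every window of length split_size + pad_length along `axis`;
    # the window contents end up in a new trailing dimension.
    windows = sliding_window_view(b, split_size + pad_length, axis=axis)

    # Keep only the windows that start at multiples of split_size.
    index = [slice(None)] * windows.ndim
    index[axis] = slice(None, None, split_size)
    return windows[tuple(index)]


print(split_with_overlap(np.arange(10), 2, 1, -9))
# [[-9  0  1]
#  [ 1  2  3]
#  [ 3  4  5]
#  [ 5  6  7]
#  [ 7  8  9]]

# 2-D input, splitting along the second dimension
print(split_with_overlap(np.arange(20).reshape(2, 10), 2, 1, -9, axis=1).shape)
# (2, 5, 3)
```

The result is a read-only view of the padded copy, so call `.copy()` on it if the chunks need to be modified.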

edited Dec 1, 2016 at 9:24

answered Dec 1, 2016 at 8:52

Oliver W.


2 Comments

martianwars Over a year ago

I think you meant split_size + pad_length. Great answer! :D

martianwars Over a year ago

Works like a charm. Thank you so much!

Divakar · Accepted Answer · 2016-12-01 10:51:33Z

Here is another stride-based approach, which could be considered a cheat: we stride backwards from the beginning of the input array, past the memory allocated for it, so that the padded version exists only implicitly, and then assign the pad values directly into that out-of-bounds region. This means the function reads and writes memory that does not belong to the input array.

Here's how it would look:

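import numpy as np

def padded_sliding_windows(a, split_size, pad_length, padnum):
    n = a.strides[0]                  # byte step between consecutive elements
    L = split_size + pad_length       # window length
    S = L - pad_length                # hop between window starts (= split_size)
    nrows = ((a.size + pad_length -L)//split_size)+1
    strided = np.lib.stride_tricks.as_strided
    # Start at the last element of the first window and walk backwards
    # (negative stride), then flip each row back into ascending order.
    out = strided(a[split_size - 1:], shape=(nrows,L), strides=(S*n,-n))[:,::-1]
    # The first pad_length entries of row 0 lie before a's buffer.
    out[0,:pad_length] = padnum
    return out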

A few sample runs:

In [271]: a = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])

In [272]: padded_sliding_windows(a, split_size = 2, pad_length = 1, padnum = 100)
Out[272]: 
array([[100,   0,   1],
       [  1,   2,   3],
       [  3,   4,   5],
       [  5,   6,   7],
       [  7,   8,   9],
       [  9,  10,  11]])

In [273]: padded_sliding_windows(a, split_size = 3, pad_length = 2, padnum = 100)
Out[273]: 
array([[100, 100,   0,   1,   2],
       [  1,   2,   3,   4,   5],
       [  4,   5,   6,   7,   8],
       [  7,   8,   9,  10,  11]])

In [274]: padded_sliding_windows(a, split_size = 4, pad_length = 2, padnum = 100)
Out[274]: 
array([[100, 100,   0,   1,   2,   3],
       [  2,   3,   4,   5,   6,   7],
       [  6,   7,   8,   9,  10,  11]])
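Since `padded_sliding_windows` writes into memory in front of the input buffer, a copy-based variant may be preferable where safety matters more than avoiding an allocation. The sketch below swaps the stride trick for plain integer (fancy) indexing into an explicitly padded array; the name `padded_windows_copy` is hypothetical, and it assumes a 1-D input whose length is a multiple of `split_size`.

```python
import numpy as np


def padded_windows_copy(a, split_size, pad_length, padnum):
    """Copy-based sketch: same output as the strided version for 1-D input."""
    a = np.asarray(a)
    L = split_size + pad_length  # window length
    # Build the padded array explicitly instead of reaching before a's buffer.
    b = np.concatenate([np.full(pad_length, padnum, dtype=a.dtype), a])
    # Start offset of each window inside the padded array.
    starts = np.arange(0, a.size - split_size + 1, split_size)
    # Broadcasting gives an (nrows, L) index grid; indexing with it copies.
    idx = starts[:, None] + np.arange(L)
    return b[idx]


a = np.arange(12)
print(padded_windows_copy(a, split_size=3, pad_length=2, padnum=100))
# [[100 100   0   1   2]
#  [  1   2   3   4   5]
#  [  4   5   6   7   8]
#  [  7   8   9  10  11]]
```

The trade-off is memory: the result is an independent array of `nrows * L` elements rather than a view, but nothing outside `a` is read or written.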

acdr · Accepted Answer · 2016-12-01 08:50:20Z

0

The following comes close:

import numpy as np

arr = np.array([0,1,2,3,4,5,6,7,8,9])
[arr[max(0, idx-1):idx+2] for idx in range(0, len(arr), 2)]

The only difference is that the first chunk does not have a leading random value, as you put it.

answered Dec 1, 2016 at 8:50

acdr


2 Comments

martianwars Over a year ago

Would this be efficient for larger arrays?

acdr Over a year ago

Probably it wouldn't be too bad, considering it's just slicing, which produces views of the data rather than copies. Just make sure that you add a : for each additional dimension.
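As a rough illustration of that last point, here is the same list comprehension applied along the second dimension of a 2-D array. The variable names and the parametrised `split_size` and `pad_length` are assumptions added for the example; as in the answer, the first chunk is simply shorter rather than padded.

```python
import numpy as np

arr = np.arange(20).reshape(2, 10)
split_size, pad_length = 2, 1

# The leading ":" keeps every row; only the second dimension is sliced.
chunks = [
    arr[:, max(0, idx - pad_length): idx + split_size]
    for idx in range(0, arr.shape[1], split_size)
]

print(chunks[0].shape)  # (2, 2): no leading pad column
print(chunks[1])
# [[ 1  2  3]
#  [11 12 13]]
print(np.shares_memory(chunks[1], arr))  # True: basic slices are views
```

Because the chunks have different shapes, they cannot be stacked into one array without padding the first one.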
