I have a numpy array that I wish to split along a certain dimension. While splitting it, I need to prepend to each split the trailing part of the previous split. For instance:

Let my array be [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], with split_size = 2 and pad_length = 1. split_size will always be a divisor of the array length. My resulting splits would look like:

[random, 0, 1], [1, 2, 3], [3, 4, 5], [5, 6, 7], [7, 8, 9]. Each split is prepended with the last value of the previous split.

Needless to say, my arrays are multidimensional and I need an efficient, vectorized way to do this along a certain dimension.

Here, I can provide the value of random myself.

  • Won't we need padding on the trailing side too, e.g. for the given input with split_size = 5, pad_length = 2? I am guessing the last row would then be [7 8 9 random random]. Commented Dec 1, 2016 at 9:42
  • Why? For those parameters, I should get [random, random, 0, 1, 2, 3, 4], [3, 4, 5, 6, 7, 8, 9]. If the question is not clear, I'll be happy to improve it as you direct! Commented Dec 1, 2016 at 9:46
  • Ah, I got the params wrong. I meant: what if split_size = 3, pad_length = 2? Commented Dec 1, 2016 at 9:47
  • Oh, in this case split_size is always a divisor of the array length. Commented Dec 1, 2016 at 9:48

3 Answers

Sounds like a job for as_strided.

as_strided returns a memory-efficient view on an array and can be used to retrieve a moving window over it. The numpy documentation on it is sparse, but there are a number of decent blog posts, online slide decks, and SO questions that explain it in more detail.

>>> import numpy as np
>>> from numpy.lib.stride_tricks import as_strided
>>> a = np.arange(10)
>>> split_size = 2
>>> pad_length = 1
>>> random = -9
>>> # prepend the desired constant value
>>> b = np.pad(a, (pad_length, 0), mode='constant', constant_values=random)
>>> # return a memory efficient view on the array
>>> # one row per split, so the row count comes from a.size, not b.size
>>> as_strided(b,
...     shape=(a.size//split_size, split_size + pad_length),
...     strides=(b.strides[0]*split_size, b.strides[0]))
array([[-9,  0,  1],
       [ 1,  2,  3],
       [ 3,  4,  5],
       [ 5,  6,  7],
       [ 7,  8,  9]])

Be aware that as_strided does no bounds checking: if the requested shape and strides reach past the end of the buffer, the out-of-range entries at the end of the result will show whatever happens to be in adjacent memory.
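Since the question asks for this along an arbitrary dimension, here is a minimal sketch of one way to generalise it. It assumes NumPy 1.20 or later and swaps as_strided for sliding_window_view, which checks bounds and returns a read-only view. The helper name split_with_overlap and its parameters are made up for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def split_with_overlap(a, split_size, pad_length, pad_value, axis=0):
    """Hypothetical helper: overlapping splits of `a` along `axis`."""
    a = np.asarray(a)
    axis = axis % a.ndim  # normalise negative axes
    # Prepend pad_length copies of pad_value along the chosen axis only.
    pad_width = [(0, 0)] * a.ndim
    pad_width[axis] = (pad_length, 0)
    b = np.pad(a, pad_width, mode='constant', constant_values=pad_value)
    # Every window of length split_size + pad_length; the window axis is appended last.
    windows = sliding_window_view(b, split_size + pad_length, axis=axis)
    # Keep every split_size-th window along the split axis.
    index = [slice(None)] * windows.ndim
    index[axis] = slice(None, None, split_size)
    return windows[tuple(index)]

out = split_with_overlap(np.arange(20).reshape(2, 10), 2, 1, -9, axis=1)
print(out.shape)  # (2, 5, 3)
```

The split axis keeps its position and the window contents end up on a new last axis, so each out[i, j] is one padded split of row i.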

2 Comments

I think you meant split_size + pad_length. Great answer! :D
Works like a charm. Thank you so much!

Here is another strides-based approach, which could be considered something of a cheat: we stride backwards from the beginning of the input array, beyond the memory allocated for it, so that a padded version exists only implicitly, and then assign the pad value into that to-be-padded region at the end.

Here's how it would look:

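import numpy as np

def padded_sliding_windows(a, split_size, pad_length, padnum):
    n = a.strides[0]              # byte step between consecutive elements
    L = split_size + pad_length   # length of each window
    S = L - pad_length            # step between window starts (equals split_size)
    nrows = ((a.size + pad_length -L)//split_size)+1
    strided = np.lib.stride_tricks.as_strided
    # Start at the last element of the first split, walk backwards, then flip each row
    out = strided(a[split_size - 1:], shape=(nrows,L), strides=(S*n,-n))[:,::-1]
    # The first pad_length slots of row 0 lie in memory before the start of a's buffer
    out[0,:pad_length] = padnum
    return out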

A few sample runs:

In [271]: a = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])

In [272]: padded_sliding_windows(a, split_size = 2, pad_length = 1, padnum = 100)
Out[272]: 
array([[100,   0,   1],
       [  1,   2,   3],
       [  3,   4,   5],
       [  5,   6,   7],
       [  7,   8,   9],
       [  9,  10,  11]])

In [273]: padded_sliding_windows(a, split_size = 3, pad_length = 2, padnum = 100)
Out[273]: 
array([[100, 100,   0,   1,   2],
       [  1,   2,   3,   4,   5],
       [  4,   5,   6,   7,   8],
       [  7,   8,   9,  10,  11]])

In [274]: padded_sliding_windows(a, split_size = 4, pad_length = 2, padnum = 100)
Out[274]: 
array([[100, 100,   0,   1,   2,   3],
       [  2,   3,   4,   5,   6,   7],
       [  6,   7,   8,   9,  10,  11]])

The following comes close:

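import numpy as np

arr = np.array([0,1,2,3,4,5,6,7,8,9])
# each piece starts one element before its split; max() clamps the first one at 0
[arr[max(0, idx-1):idx+2] for idx in range(0, len(arr), 2)]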

The only difference is that the first piece does not have a leading random, as you put it.
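If the leading pad is needed, one option is to prepend it before slicing. This is only a sketch: split_by_slicing and its parameter names are made up for illustration, and it handles the 1-D case only.

```python
import numpy as np

def split_by_slicing(arr, split_size, pad_length, pad_value):
    """Hypothetical helper: list of overlapping slices with a leading pad."""
    # Prepend pad_length copies of pad_value (this step copies the data once).
    padded = np.concatenate([np.full(pad_length, pad_value, dtype=arr.dtype), arr])
    width = split_size + pad_length
    # Each piece is a plain slice, i.e. a view of the padded array.
    return [padded[start:start + width] for start in range(0, len(arr), split_size)]

pieces = split_by_slicing(np.arange(10), split_size=2, pad_length=1, pad_value=-9)
print(pieces[0])   # [-9  0  1]
print(pieces[-1])  # [7 8 9]
```

Because every piece now has the same length, the list can be passed to np.stack if a single 2-D array is preferred.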

2 Comments

Would this be efficient for larger arrays?
Probably it wouldn't be too bad, considering it's just slicing, which produces views of the data, rather than copies. Just make sure that you add :s for the additional dimensions.
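A small sketch of what that looks like for a 2-D array split along its second axis, assuming the same split_size = 2 and pad_length = 1 as in the question. The array here is made up for illustration.

```python
import numpy as np

arr = np.arange(20).reshape(2, 10)
# The leading ':' keeps every row; only axis 1 is split.
pieces = [arr[:, max(0, idx - 1):idx + 2] for idx in range(0, arr.shape[1], 2)]

print(pieces[1])
# Basic slices are views, so no data is copied.
print(np.shares_memory(pieces[1], arr))
```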
