Skip every nth index of numpy array

Question

To do K-fold cross-validation, I would like to slice a numpy array so that I get a view of the original array with every nth element removed.

For example:

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

If n = 4 then the result would be

[1, 2, 4, 5, 6, 8, 9]

Note: the numpy requirement is due to this being used for a machine learning assignment where the dependencies are fixed.

For the use case of cross-validation, this approach looks scary: it makes hidden assumptions about the order of the data. In general I would prefer a shuffle/random_permutation-based approach, but I would also stick to the functions available in scikit-learn, as there is even more powerful stuff like stratified sampling (if needed). Side note: clean up your tags, as fold (functional-programming) and k (programming-language) are just wrong.
– sascha, Commented Dec 2, 2016 at 10:17
I agree with sascha. In particular, take a look at the cross-validation iterators. scikit-learn.org/stable/modules/…
– juanpa.arrivillaga, Commented Dec 2, 2016 at 10:19
@sascha I agree that using an existing library would be better. However, I should have mentioned that I can only use numpy as a dependency, as this is for a machine learning assignment. Sorry! To achieve randomness I am shuffling the rows using np.random.shuffle.
– Ben Hazelwood, Commented Dec 2, 2016 at 14:47
I understand. But after shuffling, it does not matter whether you take every 4th value or the first N/4 values. The latter might be easier to implement.
– sascha, Commented Dec 2, 2016 at 14:55
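
The suggestion in the comments above (shuffle once, then take contiguous chunks as folds) could be sketched with numpy alone roughly as follows. This is only an illustration: the helper name `kfold_indices`, the `seed` parameter, and the use of `np.random.default_rng` (which needs a reasonably recent numpy) are assumptions, not something from the question.

```python
import numpy as np

def kfold_indices(n_samples, k, seed=0):
    """Yield (train_idx, val_idx) index arrays for k folds (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_samples)      # shuffled indices 0..n_samples-1
    folds = np.array_split(perm, k)        # list of k nearly equal chunks
    for i in range(k):
        val_idx = folds[i]
        # Training indices are all chunks except the i-th one
        train_idx = np.concatenate(folds[:i] + folds[i + 1:])
        yield train_idx, val_idx

data = np.arange(10)
for train_idx, val_idx in kfold_indices(len(data), k=4):
    train, val = data[train_idx], data[val_idx]   # fancy indexing: copies
```

Because the indices are shuffled first, each fold is a random subset, and no assumption about the original row order is needed.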

Divakar · Accepted Answer · 2016-12-02 11:03:47Z

13

Approach #1 with modulus

a[np.mod(np.arange(a.size),4)!=0]

Sample run -

In [255]: a
Out[255]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [256]: a[np.mod(np.arange(a.size),4)!=0]
Out[256]: array([1, 2, 3, 5, 6, 7, 9])
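
For a general n, the same modulus idea might be wrapped in a small function. The name `drop_every_nth` and the `offset` parameter are made up for this sketch; `offset` is assumed to lie in the range 0..n-1. Note that the result is a copy, so this approach does not meet the view requirement.

```python
import numpy as np

def drop_every_nth(a, n, offset=0):
    """Return a copy of 1-D `a` without indices offset, offset+n, offset+2n, ..."""
    keep = np.arange(a.size) % n != offset   # boolean mask of elements to keep
    return a[keep]

a = np.arange(10)
out = drop_every_nth(a, 4)
print(out)                        # [1 2 3 5 6 7 9]
print(np.shares_memory(a, out))   # False: boolean indexing returns a copy
```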

Approach #2 with masking : Requirement as a view

Considering the view requirement: if the idea is to save memory, we could store the equivalent boolean mask instead, which takes 1 byte per element versus 8 bytes for a 64-bit integer array (the default integer dtype on a 64-bit Linux system), i.e. 8 times less memory. Such a mask-based approach would look like so -

# Create mask
mask = np.ones(a.size, dtype=bool)
mask[::4] = 0

Here's the memory requirement stat -

In [311]: mask.itemsize
Out[311]: 1

In [312]: a.itemsize
Out[312]: 8

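Then, we could assign through boolean indexing, which writes into the original array in place (note that reading a[mask] returns a copy, not a view) -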

In [313]: a
Out[313]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [314]: a[mask] = 10

In [315]: a
Out[315]: array([ 0, 10, 10, 10,  4, 10, 10, 10,  8, 10])
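
To make the read/write distinction concrete, here is a short sketch under the same setup. The names `train` and `val` are only illustrative labels for the two parts of a fold.

```python
import numpy as np

a = np.arange(10)
mask = np.ones(a.size, dtype=bool)
mask[::4] = False          # drop indices 0, 4, 8

train = a[mask]            # reading with a mask gives a copy
val = a[~mask]             # the complement gives the held-out elements

train[:] = -1              # modifies the copy only; `a` is unchanged
a[mask] *= 10              # assignment through the mask writes into `a`

print(val)                 # [0 4 8]
print(a)                   # [ 0 10 20 30  4 50 60 70  8 90]
```

So the mask itself is the cheap, reusable object; the selected data is still copied whenever it is read out.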

Approach #3 with NumPy array strides : Requirement as a view

You can use np.lib.stride_tricks.as_strided to create such a view, provided the length of the input array is a multiple of n. If it's not a multiple, it would still run, but it is not safe practice, as the view would reach beyond the memory allocated for the input array. Please note that the view thus created would be 2D.

Thus, an implementation to get such a view would be -

def skipped_view(a, n):
    s = a.strides[0]
    strided = np.lib.stride_tricks.as_strided
    return strided(a,shape=((a.size+n-1)//n,n),strides=(n*s,s))[:,1:]

Sample run -

In [50]: a = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]) # Input array

In [51]: a_out = skipped_view(a, 4)

In [52]: a_out
Out[52]: 
array([[ 1,  2,  3],
       [ 5,  6,  7],
       [ 9, 10, 11]])

In [53]: a_out[:] = 100 # Let's prove output is a view indeed

In [54]: a
Out[54]: array([  0, 100, 100, 100,   4, 100, 100, 100,   8, 100, 100, 100])
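
If the length is not a multiple of n, one possible workaround (not part of the approach above) is to trim the array to the last full block and use reshape instead of as_strided. This is a sketch that assumes a contiguous 1-D input, for which reshape returns a view; the function name `skipped_view_safe` is made up, and the trailing partial block is dropped entirely.

```python
import numpy as np

def skipped_view_safe(a, n):
    """2-D view of contiguous 1-D `a` without indices 0, n, 2n, ...

    Elements past the last full block of n are left out of the view.
    """
    full = (a.size // n) * n                 # length of the largest multiple of n
    return a[:full].reshape(-1, n)[:, 1:]    # slice + reshape + slice: all views

a = np.arange(10)
v = skipped_view_safe(a, 4)
print(v)
# [[1 2 3]
#  [5 6 7]]
print(np.shares_memory(a, v))   # True: no data was copied
```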

edited Dec 2, 2016 at 11:03

answered Dec 2, 2016 at 10:12

Divakar


2 Comments

Ben Hazelwood Over a year ago

Great answer, thank you @Divakar. #2 looks like the best solution for me.

Divakar Over a year ago

@BenHazelwood I would agree, that works as a generic solution.

Chr · Answer · 2016-12-02 10:13:32Z

2

Use numpy.delete:

In [18]: arr = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [19]: arr = np.delete(arr, np.arange(0, arr.size, 4))

In [20]: arr
Out[20]: array([1, 2, 3, 5, 6, 7, 9])
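
For K-fold use, the starting index passed to np.arange could be varied so that each fold holds out a different set of elements. A minimal sketch, where the variable names are illustrative; np.delete returns a new array each time, so every fold is a copy.

```python
import numpy as np

arr = np.arange(10)
n = 4

folds = []
for k in range(n):
    held_out = np.arange(k, arr.size, n)     # indices k, k+n, k+2n, ...
    train = np.delete(arr, held_out)         # copy of arr without those indices
    folds.append((train, arr[held_out]))

train0, val0 = folds[0]
print(train0)   # [1 2 3 5 6 7 9]
print(val0)     # [0 4 8]
```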

answered Dec 2, 2016 at 10:13

Chr

2 Comments

sascha Over a year ago

That does not look like a view.

Ben Hazelwood Over a year ago

I agree with @sascha: if a more memory-efficient approach exists, it would be better.

L.Lauenburg · Answer · 2022-02-26 19:41:59Z

0

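The slickest answer that I found is using del with a slice on a Python list a, with i being the nth index which you want to skip (this works on lists only; NumPy arrays do not support deleting elements with del):

del a[i-1::i]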

Example:

In [1]: a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
In [2]: del a[4-1::4]
In [3]: a
Out[3]: [0, 1, 2, 4, 5, 6, 8, 9]

If you also want to skip the first value, use a[1:].
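
For a numpy array, a rough equivalent of the list-based del could use np.delete with a slice object built via np.s_. This is a sketch, not part of the answer above, and like any np.delete call it returns a copy rather than a view.

```python
import numpy as np

arr = np.arange(10)
i = 4

# np.s_[i - 1::i] is just slice(i - 1, None, i); np.delete accepts slices
out = np.delete(arr, np.s_[i - 1::i])
print(out)       # [0 1 2 4 5 6 8 9]
print(out[1:])   # [1 2 4 5 6 8 9]  (first value skipped as well)
```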

answered Feb 26, 2022 at 19:41

L.Lauenburg
