7

In order to do K-fold validation I would like to slice a numpy array such that a view of the original array is made, but with every nth element removed.

For example:

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

If n = 4 then the result would be

[1, 2, 4, 5, 6, 8, 9]

Note: the numpy requirement is due to this being used for a machine learning assignment where the dependencies are fixed.

4
  • For the use-case of cross-validation this approach looks scary. There are some hidden assumptions about the order of the data. I would prefer some shuffle/random_permutation based approach in general, but would also stick to the functions available in scikit-learn, as there is even more powerful stuff like stratified sampling (if needed). Side-note: clean up your tags, as fold (functional-programming) and k (programming-language) are just wrong. Commented Dec 2, 2016 at 10:17
  • I agree with sascha. In particular, take a look at the cross-validation iterators. scikit-learn.org/stable/modules/… Commented Dec 2, 2016 at 10:19
  • @sascha I agree that using an existing library would be better. However, I should have mentioned that I can only use numpy as a dependency, as this is for a machine learning assignment, sorry! In order to achieve randomness I am shuffling the rows using np.random.shuffle. Commented Dec 2, 2016 at 14:47
  • I understand. But after shuffling it does not matter if you take every 4th value or the first N/4 values. The latter might be easier to implement. Commented Dec 2, 2016 at 14:55
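The suggestion in the comments above (shuffle once, then take contiguous chunks) can be sketched with numpy alone. This is only an illustration, not something posted in the thread: the function name kfold_indices and the seed parameter are made up for this example, and it assumes a numpy version that provides np.random.default_rng (1.17 or later).

```python
import numpy as np

def kfold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) index arrays for k shuffled folds.

    Hypothetical helper for illustration; assumes k >= 2.
    """
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_samples)   # shuffle the row indices once
    folds = np.array_split(perm, k)     # list of k nearly equal chunks
    for i in range(k):
        test_idx = folds[i]
        # Every chunk except the i-th one forms the training set.
        train_idx = np.concatenate(folds[:i] + folds[i + 1:])
        yield train_idx, test_idx
```

Usage would be along the lines of `for train, test in kfold_indices(len(X), 4): X[train], X[test]`. Note that indexing with integer arrays produces copies, not views.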

3 Answers

13

Approach #1 with modulus

a[np.mod(np.arange(a.size),4)!=0]

Sample run -

In [255]: a
Out[255]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [256]: a[np.mod(np.arange(a.size),4)!=0]
Out[256]: array([1, 2, 3, 5, 6, 7, 9])

Approach #2 with masking : Requirement as a view

Considering the view requirement: if the idea is to save memory, we could store an equivalent boolean mask, which takes 1 byte per element versus 8 bytes per element for the array here (64-bit integers, the default on a Linux system), i.e. 8 times less memory. Such a mask-based approach would look like this -

# Create mask
mask = np.ones(a.size, dtype=bool)
mask[::4] = 0

Here's the memory requirement stat -

In [311]: mask.itemsize
Out[311]: 1

In [312]: a.itemsize
Out[312]: 8

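Then, we could assign through the boolean mask to modify the original array in place, much like writing to a view (note that reading a[mask] returns a copy) -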

In [313]: a
Out[313]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [314]: a[mask] = 10

In [315]: a
Out[315]: array([ 0, 10, 10, 10,  4, 10, 10, 10,  8, 10])
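To make the difference between reading and writing with the mask explicit, here is a small check using np.shares_memory. It is a sketch that reuses the a and mask names from above; the other variable names are introduced only for this example.

```python
import numpy as np

a = np.arange(10)
mask = np.ones(a.size, dtype=bool)
mask[::4] = False                  # drop indices 0, 4, 8

selected = a[mask]                 # reading with a boolean mask builds a new array
is_shared = np.shares_memory(a, selected)

selected[:] = -1                   # writing to the copy leaves a untouched
a_after_copy_write = a.copy()

a[mask] = 10                       # assigning through the mask writes into a itself
```

So the mask behaves like a view only for assignment. Anything that reads a[mask] pays for a copy of the selected elements.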

Approach #3 with NumPy array strides : Requirement as a view

You can use np.lib.stride_tricks.as_strided to create such a view, provided the length of the input array is a multiple of n. If it's not a multiple, it would still work, but it isn't safe practice, as we would be going beyond the memory allocated for the input array. Please note that the view thus created would be 2D.

Thus, an implementation to get such a view would be -

def skipped_view(a, n):
    s = a.strides[0]
    strided = np.lib.stride_tricks.as_strided
    return strided(a,shape=((a.size+n-1)//n,n),strides=(n*s,s))[:,1:]

Sample run -

In [50]: a = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]) # Input array

In [51]: a_out = skipped_view(a, 4)

In [52]: a_out
Out[52]: 
array([[ 1,  2,  3],
       [ 5,  6,  7],
       [ 9, 10, 11]])

In [53]: a_out[:] = 100 # Let's prove output is a view indeed

In [54]: a
Out[54]: array([  0, 100, 100, 100,   4, 100, 100, 100,   8, 100, 100, 100])
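If the length is not a multiple of n and reading past the buffer is a concern, one alternative (not from the answer above, just a sketch) is to trim the array to a multiple of n and reshape, which also yields a 2D view without as_strided. The name skipped_view_trimmed is made up for this example, and it assumes a 1D input. The trade-off is that the trailing elements that do not fill a complete row are left out.

```python
import numpy as np

def skipped_view_trimmed(a, n):
    """2D view of 1D array a without the first element of each block of n.

    Hypothetical helper: trailing elements that do not fill a whole
    block are dropped instead of reading past the end of the buffer.
    """
    m = (a.size // n) * n              # largest multiple of n that fits
    return a[:m].reshape(-1, n)[:, 1:]
```

For a = np.arange(10) and n = 4 this gives the rows [1, 2, 3] and [5, 6, 7]; the elements 8 and 9 are not part of the view.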

2 Comments

Great answer, thank you @Divakar. #2 looks like the best solution for me.
@BenHazelwood I would agree, that works as a generic solution.
2

numpy.delete :

In [18]: arr = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [19]: arr = np.delete(arr, np.arange(0, arr.size, 4))

In [20]: arr
Out[20]: array([1, 2, 3, 5, 6, 7, 9])

2 Comments

That does not look like a view.
I agree with @sascha: if a more memory-efficient approach exists, it would be better.
0

The slickest answer that I found is using del on a Python list (this does not work on a NumPy array), with i being the position of every nth element which you want to skip:

del a[i-1::i]

Example:

In [1]: a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
In [2]: del a[4-1::4]
In [3]: a
Out[3]: [0, 1, 2, 4, 5, 6, 8, 9]

If you also want to skip the first value, use a[1:].
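Since del only works on lists, here is a rough NumPy equivalent of the same selection using a boolean mask. It is a sketch, not part of the answer above; the variable names are chosen for this example.

```python
import numpy as np

a = np.arange(10)
mask = np.ones(a.size, dtype=bool)
mask[4 - 1::4] = False     # same positions as del a[4-1::4]: indices 3 and 7
mask[0] = False            # also skip the first value, like a[1:]
result = a[mask]           # boolean indexing returns a copy
```

For this input, result holds 1, 2, 4, 5, 6, 8, 9, which is the output shown in the question.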
