9

I have a 2D NumPy array that I want to slice in an unusual way: I want a constant-width slice starting at a different position on every row. I would like to do this in a vectorised way if possible.

e.g. I have the array A=np.array([range(5), range(5)]) which looks like

array([[0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4]])

I would like to slice this as follows: 2 elements from each row, starting at positions 0 and 3. The starting positions are stored in b = np.array([0, 3]). The desired output is thus np.array([[0, 1], [3, 4]]), i.e.

array([[0, 1],
       [3, 4]])

The obvious thing I tried to get this result was A[:,b:b+2] but that doesn't work, and I can't find anything that will.

Speed is important as this will operate on a largish array in a loop, and I don't want to bottleneck other parts of my code.

2
  • There's something in numpy.lib.stride_tricks... not to mention a dupe somewhere... Commented Sep 7, 2017 at 8:09
  • Please provide a Minimal, Complete, and Verifiable example to make it easier for us to answer your question without having to do a lot of extra work ourselves :) Commented Sep 7, 2017 at 8:10

3 Answers

4

You can use np.take():

In [21]: slices = np.dstack([b, b+1])

In [22]: np.take(A, slices)
Out[22]: 
array([[[0, 1],
        [3, 4]]])

8 Comments

Would this be slow for larger slices and arrays? I'm interested in slicing about 200 elements per row from a matrix of size approximately 2000x4000
@Shakespeare Size 2000x4000 is not that large. But still there might be some ways to enhance the performance, like using broadcasting and direct slicing instead of using take.
Ok I will look into it, it's important for it to run quickly since it feeds data to another part of my program in a loop
I think you need an axis = 1 keyword if the rows of A are not equal
And you'll end up with an extra dimension you'll have to deal with in that case.
3

Approach #1: Use broadcasting to build all the column indices, then advanced indexing to extract those elements -

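import numpy as np
def take_per_row(A, indx, num_elem=2):
    # Column indices for every row, shape (m, num_elem)
    all_indx = indx[:,None] + np.arange(num_elem)
    # Row indices as a column vector, broadcast against all_indx
    return A[np.arange(all_indx.shape[0])[:,None], all_indx]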

Sample run -

In [340]: A
Out[340]: 
array([[0, 5, 2, 6, 3, 7, 0, 0],
       [3, 2, 3, 1, 3, 1, 3, 7],
       [1, 7, 4, 0, 5, 1, 5, 4],
       [0, 8, 8, 6, 8, 6, 3, 1],
       [2, 5, 2, 5, 6, 7, 4, 3]])

In [341]: indx = np.array([0,3,1,5,2])

In [342]: take_per_row(A, indx)
Out[342]: 
array([[0, 5],
       [1, 3],
       [7, 4],
       [6, 3],
       [2, 5]])
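
For comparison, the same broadcast column indices can be handed to np.take_along_axis instead of being paired with explicit row indices. This is a minimal sketch on the question's A and b, not part of the original answer; it assumes NumPy 1.15 or later, and the names cols and width are only illustrative.

```python
import numpy as np

A = np.array([range(5), range(5)])
b = np.array([0, 3])   # starting column for each row (from the question)
width = 2              # illustrative name for the slice width

# Column indices per row via broadcasting: (2, 1) + (2,) -> (2, 2)
cols = b[:, None] + np.arange(width)

# Gather along axis 1: row i takes the columns listed in cols[i]
out = np.take_along_axis(A, cols, axis=1)

# Equivalent advanced-indexing form used by take_per_row
same = A[np.arange(A.shape[0])[:, None], cols]
print(out)
```

Both forms build the full index array, so memory use grows with the number of rows times the slice width.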

Approach #2 : Using np.lib.stride_tricks.as_strided -

from numpy.lib.stride_tricks import as_strided

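def take_per_row_strided(A, indx, num_elem=2):
    m,n = A.shape
    # Flatten in place; this only works when no copy is needed
    # (e.g. a C-contiguous array). The shape is restored below.
    A.shape = (-1)
    s0 = A.strides[0]
    # Flat index of the first element of each row's slice
    l_indx = indx + n*np.arange(len(indx))
    # View of all overlapping windows of length num_elem, then one window per row
    out = as_strided(A, (len(A)-num_elem+1, num_elem), (s0,s0))[l_indx]
    A.shape = m,n
    return out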
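
A variant that avoids reshaping the input in place is sketched below. It swaps as_strided for sliding_window_view, which is not what the answer uses and assumes NumPy 1.20 or later; the function name take_per_row_windows is hypothetical. No timings are claimed for it.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def take_per_row_windows(A, indx, num_elem=2):
    # Read-only view of shape (m, n - num_elem + 1, num_elem):
    # windows[i, j] is A[i, j:j + num_elem]
    windows = sliding_window_view(A, num_elem, axis=1)
    # For row i, pick the window that starts at column indx[i]
    return windows[np.arange(A.shape[0]), indx]

A = np.array([[0, 5, 2, 6, 3, 7, 0, 0],
              [3, 2, 3, 1, 3, 1, 3, 7]])
indx = np.array([0, 3])
out = take_per_row_windows(A, indx)
print(out)
```

Because the windows are a view, only the selected rows are copied, and A itself is never modified.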

Runtime test for taking 200 per row from a 2000x4000 matrix

In [447]: A = np.random.randint(0,9,(2000,4000))

In [448]: indx = np.random.randint(0,4000-200,(2000))

In [449]: out1 = take_per_row(A, indx, 200)

In [450]: out2 = take_per_row_strided(A, indx, 200)

In [451]: np.allclose(out1, out2)
Out[451]: True

In [452]: %timeit take_per_row(A, indx, 200)
100 loops, best of 3: 2.14 ms per loop

In [453]: %timeit take_per_row_strided(A, indx, 200)
1000 loops, best of 3: 435 µs per loop

17 Comments

Would I be correct in assuming this should be the fastest method?
That's usually a good assumption. It is @Divakar after all. That said, this is just my answer wrapped in a function and made a bit more general
I got 20 nanoseconds for taking 200 per row from a 2000x4000 matrix, so I'm gonna stick with this. Thanks
@Divakar 1.5ms for approach 2!
milliseconds, i.e. it's faster
1

You can set up a fancy indexing method to find the correct elements:

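A = np.arange(10).reshape(2,-1)
b = np.array([0, 3])  # starting positions from the question

x = np.stack([np.arange(A.shape[0])]* 2).T  # row index for each output element
y = np.stack([b, b+1]).T                    # column index for each output element
A[x, y]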

array([[0, 1],
       [8, 9]])

Compare to @Kasramvd's np.take answer:

slices = np.dstack([b, b+1])
np.take(A, slices)

array([[[0, 1],
        [3, 4]]])

np.take by default takes from the flattened array, not row-wise. With an axis = 1 parameter you get all the slices of all the rows:

np.take(A, slices, axis = 1)

array([[[[0, 1],
         [3, 4]]],


       [[[5, 6],
         [8, 9]]]])

Which would need more processing.
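
One possible form of that extra processing is sketched below: the axis = 1 result holds every slice for every row, so the wanted rows sit on its "diagonal". The names full and rows are illustrative, and this is only a sketch to show the shapes involved, since it gathers far more data than is kept.

```python
import numpy as np

A = np.arange(10).reshape(2, -1)
b = np.array([0, 3])

slices = np.dstack([b, b + 1])        # shape (1, 2, 2)
full = np.take(A, slices, axis=1)     # shape (2, 1, 2, 2): full[i, 0, j] is slice j of row i

# Keep slice i for row i only
rows = np.arange(A.shape[0])
out = full[rows, 0, rows]
print(out)
```

The intermediate array grows with the square of the number of rows, which is why the direct fancy-indexing form is the better fit for large inputs.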

1 Comment

Thanks for this answer, went with Divakar's as it's a bit more polished
