7

Is there an easy way to index a numpy multidimensional array along the last dimension, using an array of indices? For example, take an array a of shape (10, 10, 20). Let's assume I have an array of indices b, of shape (10, 10) so that the result would be c[i, j] = a[i, j, b[i, j]].

I've tried the following example:

a = np.ones((10, 10, 20))
b = np.tile(np.arange(10) + 10, (10, 1))
c = a[b]

However, this doesn't work because it then tries to index like a[b[i, j], b[i, j]], which is not the same as a[i, j, b[i, j]]. And so on. Is there an easy way to do this without resorting to a loop?

4
  • Just to make sure I understand properly, you want c[i, j] = a[i, j, b[i, j]] Commented Dec 3, 2014 at 17:00
  • @mgilson yes. Should've made that clearer. Commented Dec 3, 2014 at 18:19
  • This is normally done like c = a[np.arange(b.shape[0]), np.arange(b.shape[1]), b] but I'm hoping there's a better way. Commented Dec 3, 2014 at 20:31
  • @moarningsun That fails my correctness test. I think you must convert one of the two aranges to a column vector or so. Commented Dec 3, 2014 at 21:22

1 Answer 1

6

There are several ways to do this. Let's first generate some test data:

In [1]: a = np.random.rand(10, 10, 20)

In [2]: b = np.random.randint(20, size=(10,10))  # random integers in range 0..19

One way to solve the question would be to create two index vectors, where one is a row vector and the other a column vector of 0..9 using meshgrid:

In [3]: i1, i0 = np.meshgrid(range(10), range(10), sparse=True)

In [4]: c = a[i0, i1, b]

This works because i0, i1 and b will all be broadcasted to 10x10 matrices. Quick test for correctness:

In [5]: all(c[i, j] == a[i, j, b[i, j]] for i in range(10) for j in range(10))
Out[5]: True

Another way would be to use choose and rollaxis:

# choose needs a sequence of length 20, so move last axis to front
In [22]: aa = np.rollaxis(a, -1)  

In [23]: c = np.choose(b, aa)

In [24]: all(c[i, j] == a[i, j, b[i, j]] for i in range(10) for j in range(10))
Out[24]: True
Sign up to request clarification or add additional context in comments.

4 Comments

Thank you. I like your first option better. The choose option looks nice, but it is not general enough as choose does not work for indices larger than 32 (a current bug in numpy, see discussion on increasing NPY_MAXARGS). According to my tests, it is also twice as slow as the meshgrid option.
In that case, you should probably look into functions like mgrid and ogrid, which are referenced from the meshgrid documentation. But do you work with matrices with more than 32 dimensions? I feel sorry for you, my head already explodes when trying to understand an array with more than 2 dimensions :).
Ah, I see your error now when going from 10x10x20 to 10x10x40. I guess choose unpacks the 3D matrix into a list of 2D matrices along the first dimension.
@BasSwinckels I tried this with unequal row and column sizes and it didn't work (IndexError: shape mismatch: indexing arrays could not be broadcast together). I'm not familiar with meshgrid so could you update it?

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.