>>> idx = np.random.randint(2, size=(9, 31))
>>> a = np.random.random((9, 31, 2))
>>> a[idx].shape
(9, 31, 31, 2)

Why is the above not resulting in at least a shape of (9, 31, 1), or even better (9, 31)? How can I get it to return a selection based on the values in idx?

Update

This is perhaps a more concrete and hopefully analogous example. Assume this array:

a = np.asarray([[1, 2], [3, 4], [5, 6], [7, 8]])

How would I go about selecting the array [1, 4, 5, 8] (i.e. the 0th, 1st, 0th, 1st element of the respective rows)?

2 Answers

4

I think this is what you want:

>>> a[np.arange(9)[:, None], np.arange(31), idx].shape
(9, 31)

For your second example you would do:

>>> a[np.arange(4), [0, 1, 0, 1]]
array([1, 4, 5, 8])

Read the docs on fancy indexing, especially the part on what happens when you don't have an index array for each dimension: those extra np.arange arrays are placed there to avoid that behavior.

Note also how they are reshaped (indexing with [:, None] is equivalent to .reshape(-1, 1)) so that their broadcast shape has the shape of the desired output array.
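As a minimal sketch of the point above, the two np.arange arrays broadcast against idx to shape (9, 31), and the result can be checked against an explicit loop. The seeded generator and the names rows, cols, picked, expected and alt are illustrative and not part of the question. np.take_along_axis (available in NumPy 1.15 and later) is shown as an alternative that should give the same selection:

```python
import numpy as np

rng = np.random.default_rng(0)          # seeded only to make the sketch repeatable
idx = rng.integers(2, size=(9, 31))     # 0 or 1 for every (row, column) position
a = rng.random((9, 31, 2))

rows = np.arange(9)[:, None]            # shape (9, 1)
cols = np.arange(31)                    # shape (31,)
picked = a[rows, cols, idx]             # the three index arrays broadcast to (9, 31)

# Reference result built with plain loops
expected = np.empty((9, 31))
for i in range(9):
    for j in range(31):
        expected[i, j] = a[i, j, idx[i, j]]

# Alternative: let NumPy generate the row/column indices itself
alt = np.take_along_axis(a, idx[..., None], axis=-1)[..., 0]

print(picked.shape)                      # (9, 31)
print(np.array_equal(picked, expected))  # True
print(np.array_equal(picked, alt))       # True
```

Whether the explicit np.arange form or np.take_along_axis reads better is largely a matter of taste; both build the result as a new array.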


5 Comments

So in other words, the np.arange() is making sure each row is selected. But why is a[:, [0, 1, 0, 1]] not the same (applying Python indexing logic (which may be wrong for numpy))? I'd really like to avoid creating a new array (np.arange) for the selection process.
If you applied Python indexing logic to a[:, [0, 1, 0, 1]], it would raise a TypeError: indices must be integers, not tuple... ;-) What a[:, [0, 1, 0, 1]] means in numpy-speak is "for every row of a, create a four item array, that holds the first, second, first again and second again elements of that row." So there is really no alternative to creating the np.arange arrays. It is still ridiculously fast, so you really needn't worry about it.
Indeed, it seems very different to Python indexing. Thanks for the numpy translation. What does a[np.arange(4), [0, 1, 0, 1]] translate to? Intuitively it is still the same as [:,] to me...
Because np.arange(4) and [0, 1, 0, 1] have the same shape, and there is one array per dimension, the return will be an array of that same shape, (4,), where each array is used to index the dimension it occupies, i.e. you would get an array with [a[0, 0], a[1, 1], a[2, 0], a[3, 1]]. It is tricky and even unintuitive to get the hang of it, but once you do it is very, very powerful.
Thanks heaps for your insightful explanations. I'm slowly getting there... ;-)
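
The difference discussed in these comments can be seen side by side in a short sketch (the names cols, per_row and paired are illustrative only). A slice combined with a list applies the whole list to every row, whereas two index arrays of the same shape are paired element by element:

```python
import numpy as np

a = np.asarray([[1, 2], [3, 4], [5, 6], [7, 8]])
cols = [0, 1, 0, 1]

# Slice + list: every row is indexed with the whole list -> shape (4, 4)
per_row = a[:, cols]

# Two index arrays of equal shape are paired element-wise -> shape (4,)
paired = a[np.arange(4), cols]

print(per_row.shape)     # (4, 4)
print(paired.tolist())   # [1, 4, 5, 8]

# The paired selection is the diagonal of the per-row selection
print(np.array_equal(paired, np.diagonal(per_row)))   # True
```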
-1

You're doing advanced indexing on the ndarray http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html#advanced-indexing.

Advanced indexes always are broadcast and iterated as one.

The shape you see arises because idx is a single index array, so it only indexes the first axis of a. Each element of idx therefore selects a whole subarray a[i] of shape (31, 2) rather than a single element, and the result has shape idx.shape + a.shape[1:], i.e. (9, 31) + (31, 2) = (9, 31, 31, 2).
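
A short check of where the (9, 31, 31, 2) shape comes from, using the arrays from the question (the name result and the sample position (3, 5) are arbitrary choices for illustration):

```python
import numpy as np

idx = np.random.randint(2, size=(9, 31))
a = np.random.random((9, 31, 2))

result = a[idx]

# A single index array addresses axis 0 only; the remaining axes are kept whole
print(result.shape)                              # (9, 31, 31, 2)
print(result.shape == idx.shape + a.shape[1:])   # True

# Every entry of the result is a full (31, 2) block a[idx[i, j]]
print(np.array_equal(result[3, 5], a[idx[3, 5]]))  # True
```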

UPDATE:

>>> list(map(lambda idx: a[idx[0], idx[1]], [[0, 0], [1, 1], [2, 0], [3, 1]]))

This will return:

[1, 4, 5, 8]

2 Comments

Hmm... that's not what I wanted. I'd much rather have it return a view on the array. But how can I use an array to determine the indices being selected?
I'm more interested in a numpy way to do the selection. Creating another list is not an option as the array is quite large.
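
On the question of views: as far as NumPy's documented behavior goes, advanced indexing returns a copy of the data, not a view, although assigning through the same index expression does write into the original array. A minimal sketch, with the name sel being illustrative only:

```python
import numpy as np

a = np.asarray([[1, 2], [3, 4], [5, 6], [7, 8]])
sel = a[np.arange(4), [0, 1, 0, 1]]

print(np.shares_memory(a, sel))   # False: the selection is a copy
sel[0] = 99                       # modifies the copy only
print(a[0, 0])                    # 1

# Assigning through the index expression writes into a itself
a[np.arange(4), [0, 1, 0, 1]] = 0
print(a.tolist())                 # [[0, 2], [3, 0], [0, 6], [7, 0]]
```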
