
This is an indirect indexing problem.

It can be solved with a list comprehension.

The question is whether, and if so how, it can be solved entirely within NumPy.

Suppose data.shape is (T,N) and c.shape is (T,K),

and each element of c is an int between 0 and N-1 inclusive; that is, each element of c refers to a column number of data.

The goal is to obtain out, where

out.shape = (T,K)

and, for each i in 0..(T-1),

the row out[i] = [ data[i, c[i,0]] , ... , data[i, c[i,K-1]] ]

Concrete example:

import numpy as np

data = np.array([
       [ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11],
       [12, 13, 14]])

c = np.array([
      [0, 2],
      [1, 2],
      [0, 0],
      [1, 1],
      [2, 2]])

out should be [[0, 2], [4, 5], [6, 6], [10, 10], [14, 14]]

The first row of out is [0,2] because row 0 of c selects columns 0 and 2, and data[0] holds 0 and 2 in those columns.

The second row of out is [4,5] because row 1 of c selects columns 1 and 2, and data[1] holds 4 and 5 in those columns.

NumPy fancy indexing doesn't seem to solve this in an obvious way, because indexing data with c directly (e.g. data[c] or np.take(data, c, axis=1)) always produces a 3-dimensional array.
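To make the shape problem concrete, here is a small sketch that checks what each attempt returns for the example arrays. data[c] treats the entries of c as row indices, giving shape (T, K, N), while np.take(data, c, axis=1) indexes every row of data with all of c, giving shape (T, T, K). The variable names by_rows, by_cols and diag are illustrative only.

```python
import numpy as np

data = np.arange(15).reshape(5, 3)  # same values as the example, shape (T, N) = (5, 3)
c = np.array([[0, 2], [1, 2], [0, 0], [1, 1], [2, 2]])  # shape (T, K) = (5, 2)

by_rows = data[c]                   # entries of c are used as ROW indices into data
by_cols = np.take(data, c, axis=1)  # every row of data is indexed with all of c

print(by_rows.shape)  # (5, 2, 3)
print(by_cols.shape)  # (5, 5, 2)

# The wanted values are present in by_cols, but only where the row of data
# and the row of c coincide, i.e. on the "diagonal" of the first two axes.
rows = np.arange(data.shape[0])
diag = by_cols[rows, rows]
print(diag)
```

Extracting that diagonal recovers the desired (T, K) result, but it builds a T-times-larger intermediate array first, so it is more of a diagnostic than a solution.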

A list comprehension can solve it:

out = [ [data[rowidx,i1],data[rowidx,i2]] for (rowidx, (i1,i2)) in enumerate(c) ]

If K is 2, I suppose this is marginally OK. If K is variable, this is not so good.

As written, the list comprehension has to be rewritten for each value of K, because it unrolls the columns picked out of data by each row of c. It also violates DRY.
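As a side note, the unrolling itself can be avoided by nesting a second comprehension over each row of c, so K never appears explicitly. This is only a sketch of that idea; it still loops in Python rather than in NumPy, so it does not answer the question of a NumPy-only solution.

```python
import numpy as np

data = np.arange(15).reshape(5, 3)  # same values as the example
c = np.array([[0, 2], [1, 2], [0, 0], [1, 1], [2, 2]])

# The inner comprehension walks one row of c, whatever its length K is.
out = np.array([[data[i, j] for j in row] for i, row in enumerate(c)])
print(out)
```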

Is there a solution based entirely in numpy?

2 Answers

You can avoid loops with np.choose:

In [1]: %cpaste
Pasting code; enter '--' alone on the line to stop or use Ctrl-D.

data = np.array([\
       [ 0,  1,  2],\
       [ 3,  4,  5],\
       [ 6,  7,  8],\
       [ 9, 10, 11],\
       [12, 13, 14]])

c = np.array([
      [0, 2],\
      [1, 2],\
      [0, 0],\
      [1, 1],\
      [2, 2]])
--

In [2]: np.choose(c, data.T[:,:,np.newaxis])
Out[2]: 
array([[ 0,  2],
       [ 4,  5],
       [ 6,  6],
       [10, 10],
       [14, 14]])
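One way to read this call: data.T[:, :, np.newaxis] has shape (N, T, 1), so np.choose sees N choice arrays, one per column of data, each of shape (T, 1). They broadcast against c, and the entry c[i, k] decides which column supplies out[i, k]. The sketch below checks that reading against an explicit loop on random arrays; the sizes, the seed, and the helper name pick_columns_choose are illustrative assumptions. Note that np.choose accepts only a limited number of choice arrays (32 in NumPy 1.x releases, as far as I know), so this approach may not work for large N.

```python
import numpy as np

def pick_columns_choose(data, c):
    """Return out with out[i, k] == data[i, c[i, k]], using np.choose."""
    # Shape (N, T, 1): np.choose treats the first axis as the list of choices.
    choices = data.T[:, :, np.newaxis]
    return np.choose(c, choices)

rng = np.random.default_rng(0)
T, N, K = 6, 4, 3                      # arbitrary small sizes for the check
data = rng.integers(0, 100, size=(T, N))
c = rng.integers(0, N, size=(T, K))    # valid column numbers, 0..N-1

out = pick_columns_choose(data, c)

# Explicit-loop reference built directly from the problem statement.
expected = np.array([[data[i, c[i, k]] for k in range(K)] for i in range(T)])
print(np.array_equal(out, expected))
```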

2 Comments

Nice! I didn't think to use choose.
Yup, it takes a while to wrap your head around its possible uses.

Here's one possible route to a general solution...

Create masks for data to select the values for each column of out. For example, the first mask could be achieved by writing:

>>> np.arange(3) == np.vstack(c[:,0])
array([[ True, False, False],
       [False,  True, False],
       [ True, False, False],
       [False,  True, False],
       [False, False,  True]], dtype=bool)

>>> data[_]
array([ 0,  4,  6, 10, 14])

The mask for the second column of out is built the same way: np.arange(3) == np.vstack(c[:,1]).

So, to get the out array...

>>> mask0 = np.arange(3) == np.vstack(c[:,0])
>>> mask1 = np.arange(3) == np.vstack(c[:,1])
>>> np.vstack((data[mask0], data[mask1])).T
array([[ 0,  2],
       [ 4,  5],
       [ 6,  6],
       [10, 10],
       [14, 14]])

Edit: Given arbitrary array widths K and N you could use a loop to create the masks, so the general construction of the out array might simply look like this:

np.vstack([data[np.arange(N) == np.vstack(c[:,i])] for i in range(K)]).T

Edit 2: A slightly neater solution (though still relying on a loop) is:

np.vstack([data[i][c[i]] for i in range(T)])
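For comparison, the same row-wise selection can probably also be written without any Python loop, using two techniques that are not part of this answer: pairing c with a column of row indices in integer-array indexing, or calling np.take_along_axis (available in NumPy 1.15 and later). A minimal sketch, checked against the loop-based expression from Edit 2; the variable names are illustrative.

```python
import numpy as np

data = np.arange(15).reshape(5, 3)  # same values as the example
c = np.array([[0, 2], [1, 2], [0, 0], [1, 1], [2, 2]])
T = data.shape[0]

# Loop-based version from Edit 2, used here as the reference.
loop_based = np.vstack([data[i][c[i]] for i in range(T)])

# Row indices of shape (T, 1) broadcast against c of shape (T, K),
# so element [i, k] of the result is data[i, c[i, k]].
rows = np.arange(T)[:, np.newaxis]
indexed = data[rows, c]

# take_along_axis applies the indices in c along axis 1, row by row.
along_axis = np.take_along_axis(data, c, axis=1)

print(np.array_equal(indexed, loop_based), np.array_equal(along_axis, loop_based))
```

Neither variant depends on K or N, and neither builds an intermediate array larger than the (T, K) result.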

2 Comments

This is interesting, and I'll have to look up vstack and see what it does.... But it also unfortunately seems to depend on K. K might not always be 2.
I see... I've edited my answer to adapt to the more general case where K might be large. I'll see if I can think of any other way to avoid loops completely...
