pick TxK numpy array from TxN numpy array using TxK column index array

Question

This is an indirect indexing problem.

It can be solved with a list comprehension.

The question is whether, and how, it can be solved within numpy.

Suppose data.shape is (T,N) and c.shape is (T,K),

and each element of c is an int between 0 and N-1 inclusive; that is, each element of c refers to a column number of data.

The goal is to obtain out where

out.shape = (T,K)

And for each i in 0..(T-1)

the row out[i] = [ data[i, c[i,0]] , ... , data[i, c[i,K-1]] ]

Concrete example:

data = np.array([
       [ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11],
       [12, 13, 14]])

c = np.array([
      [0, 2],
      [1, 2],
      [0, 0],
      [1, 1],
      [2, 2]])

The expected result is out = [[0, 2], [4, 5], [6, 6], [10, 10], [14, 14]]

The first row of out is [0,2] because the columns chosen are given by c's row 0, they are 0 and 2, and data[0] at columns 0 and 2 are 0 and 2.

The second row of out is [4,5] because the columns chosen are given by c's row 1, they are 1 and 2, and data[1] at columns 1 and 2 is 4 and 5.

Numpy fancy indexing doesn't seem to solve this in an obvious way because indexing data with c (e.g. data[c], np.take(data,c,axis=1) ) always produces a 3 dimensional array.
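To make the shape problem concrete, here is a quick check with the example arrays (a minimal sketch; shape_fancy and shape_take are names chosen only for this illustration):

```python
import numpy as np

# Same values as the example: data has shape (T, N) = (5, 3), c has shape (T, K) = (5, 2)
data = np.arange(15).reshape(5, 3)
c = np.array([[0, 2], [1, 2], [0, 0], [1, 1], [2, 2]])

# data[c] treats the entries of c as ROW numbers, so each entry pulls in a whole row of length N
shape_fancy = data[c].shape

# np.take along axis 1 applies ALL of c to every row of data, not just the matching row of c
shape_take = np.take(data, c, axis=1).shape

print(shape_fancy)  # (5, 2, 3)
print(shape_take)   # (5, 5, 2)
```

Neither call pairs row i of data with row i of c, which is what the problem requires.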

A list comprehension can solve it:

out = [ [data[rowidx,i1],data[rowidx,i2]] for (rowidx, (i1,i2)) in enumerate(c) ]

If K is 2, I suppose this is marginally OK. If K is variable, it is not so good.

The list comprehension has to be rewritten for each value K, because it unrolls the columns picked out of data by each row of c. It also violates DRY.
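For reference, a nested comprehension avoids the unrolling and works for any K, although it is still a Python-level loop rather than a numpy solution. A minimal sketch using the example arrays:

```python
import numpy as np

data = np.arange(15).reshape(5, 3)                       # shape (T, N)
c = np.array([[0, 2], [1, 2], [0, 0], [1, 1], [2, 2]])   # shape (T, K)

# The inner loop walks the K column indices in row i of c,
# so nothing has to be rewritten when K changes.
out = np.array([[data[i, j] for j in row] for i, row in enumerate(c)])

print(out.shape)  # (5, 2)
```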

Is there a solution based entirely in numpy?

Paul · Accepted Answer · 2014-10-07 00:35:58Z

2

You can avoid loops with np.choose:

In [1]: %cpaste
Pasting code; enter '--' alone on the line to stop or use Ctrl-D.

data = np.array([\
       [ 0,  1,  2],\
       [ 3,  4,  5],\
       [ 6,  7,  8],\
       [ 9, 10, 11],\
       [12, 13, 14]])

c = np.array([
      [0, 2],\
      [1, 2],\
      [0, 0],\
      [1, 1],\
      [2, 2]])
--

In [2]: np.choose(c, data.T[:,:,np.newaxis])
Out[2]: 
array([[ 0,  2],
       [ 4,  5],
       [ 6,  6],
       [10, 10],
       [14, 14]])
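A note on why this works, plus two alternatives that are not part of the original answer. np.choose reads data.T[:, :, np.newaxis] (shape (N, T, 1)) as N choice arrays of shape (T, 1) and broadcasts each against c, so position (i, j) of the result is taken from choice number c[i, j] at row i, which is data[i, c[i, j]]. As far as I know, np.choose only accepts a limited number of choice arrays (32 in NumPy 1.x), so it may fail for large N. The sketch below assumes np.take_along_axis is available (NumPy 1.15 or later); the out_* names are just for illustration.

```python
import numpy as np

data = np.arange(15).reshape(5, 3)                       # shape (T, N)
c = np.array([[0, 2], [1, 2], [0, 0], [1, 1], [2, 2]])   # shape (T, K)
T = data.shape[0]

# The approach from this answer: N choice arrays of shape (T, 1), selected by c
out_choose = np.choose(c, data.T[:, :, np.newaxis])

# Integer fancy indexing: a (T, 1) column of row numbers broadcasts against c (T, K)
out_fancy = data[np.arange(T)[:, np.newaxis], c]

# take_along_axis performs the same per-row lookup along axis 1
out_along = np.take_along_axis(data, c, axis=1)

print(np.array_equal(out_choose, out_fancy))  # True
print(np.array_equal(out_choose, out_along))  # True
```

The two alternatives do not depend on N being small, which may matter if data has many columns.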

edited Oct 7, 2014 at 0:35

Paul


answered Oct 6, 2014 at 21:46

immerrr


2 Comments

Alex Riley Over a year ago

Nice! I didn't think to use choose.

immerrr Over a year ago

Yup, it takes a while to wrap your head around its possible uses.

Alex Riley · Accepted Answer · 2014-10-06 20:37:12Z

1

Here's one possible route to a general solution...

Create masks for data to select the values for each column of out. For example, the first mask could be achieved by writing:

>>> np.arange(3) == np.vstack(c[:,0])
array([[ True, False, False],
       [False,  True, False],
       [ True, False, False],
       [False,  True, False],
       [False, False,  True]], dtype=bool)

>>> data[_]
array([ 0,  4,  6, 10, 14])

The mask to get the values for the second column of out: np.arange(3) == np.vstack(c[:,1]).

So, to get the out array...

>>> mask0 = np.arange(3) == np.vstack(c[:,0])
>>> mask1 = np.arange(3) == np.vstack(c[:,1])
>>> np.vstack((data[mask0], data[mask1])).T
array([[ 0,  2],
       [ 4,  5],
       [ 6,  6],
       [10, 10],
       [14, 14]])

Edit: Given arbitrary array widths K and N you could use a loop to create the masks, so the general construction of the out array might simply look like this:

np.vstack([data[np.arange(N) == np.vstack(c[:,i])] for i in range(K)]).T

Edit 2: A slightly neater solution (though still relying on a loop) is:

np.vstack([data[i][c[i]] for i in range(T)])
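The mask idea can also be pushed to three dimensions to drop the loop entirely. This is only a sketch and not necessarily the best option: it builds a boolean array of shape (T, K, N), so memory grows with T*K*N. The names mask and expanded are chosen for this illustration.

```python
import numpy as np

data = np.arange(15).reshape(5, 3)                       # shape (T, N)
c = np.array([[0, 2], [1, 2], [0, 0], [1, 1], [2, 2]])   # shape (T, K)
T, N = data.shape
K = c.shape[1]

# mask[i, j, n] is True exactly when n == c[i, j], so there is one True per (i, j)
mask = c[:, :, np.newaxis] == np.arange(N)               # shape (T, K, N)

# View data as (T, K, N) without copying, repeating each row K times
expanded = np.broadcast_to(data[:, np.newaxis, :], mask.shape)

# Boolean indexing returns the selected values in row-major (i, j) order
out = expanded[mask].reshape(T, K)

print(out.shape)  # (5, 2)
```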

edited Oct 6, 2014 at 20:37

answered Oct 6, 2014 at 19:27

Alex Riley


2 Comments

Paul Over a year ago

This is interesting, and I'll have to look up vstack and see what it does.... But it also unfortunately seems to depend on K. K might not always be 2.

Alex Riley Over a year ago

I see... I've edited my answer to adapt to the more general case where K might be large. I'll see if I can think of any other way to avoid loops completely...
