Vectorize numpy indexing and apply a function to build a matrix

Question

I have a matrix X of size (d,N). In other words, there are N vectors with d dimensions each. For example,

X = [[1,2,3,4],[5,6,7,8]]

there are N=4 vectors of d=2 dimensions.

I also have a ragged array (a list of lists) whose entries are column indices into X. For example,

I = [ [0,1], [1,2,3] ]

The element I[0]=[0,1] indexes columns 0 and 1 of X. Similarly, I[1] indexes columns 1, 2 and 3. Note that the elements of I are lists of different lengths!
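To make the column selection concrete, here is a small sketch. It assumes X is converted to a NumPy array (the question writes it as a plain list of lists, on which X[:, idx] would fail). Each element of I picks out a sub-matrix with as many columns as that element has indices:

```python
import numpy as np

# Example data from the question: d=2 rows, N=4 columns
X = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
# Ragged list of column indices; the sublists have different lengths
I = [[0, 1], [1, 2, 3]]

# Fancy-index the columns named by each sublist
blocks = [X[:, idx] for idx in I]
print(blocks[0])  # columns 0 and 1 -> shape (2, 2)
print(blocks[1])  # columns 1, 2 and 3 -> shape (2, 3)
```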

What I would like to do is index the columns of X using each element of I, sum the selected vectors to get a single vector, and repeat this for every element of I to build a new matrix Y. Y should have as many d-dimensional vectors (columns) as there are elements in I. In my example, Y will have 2 vectors of 2 dimensions.

In my example, the element I[0] says to take columns 0 and 1 of X, sum these two 2-dimensional vectors and store the result in column 0 of Y. Then, I[1] says to sum columns 1, 2 and 3 of X and store that vector in column 1 of Y.
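For this example the expected result can be worked out by hand: column 0 of Y is [1+2, 5+6] = [3, 11] and column 1 is [2+3+4, 6+7+8] = [9, 21]. A minimal reference sketch (a plain comprehension, meant only as a correctness check for faster versions) that builds Y column by column:

```python
import numpy as np

X = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
I = [[0, 1], [1, 2, 3]]

# Sum the selected columns for each group, then stack the sums as columns of Y
Y = np.column_stack([X[:, idx].sum(axis=1) for idx in I])
print(Y)
# [[ 3  9]
#  [11 21]]
```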

I can do this easily using a loop, but I would like to vectorize the operation if possible. My matrix X has hundreds of thousands of columns, and the index array I has tens of thousands of elements (each element is a short list of indices).

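My loopy code:

import numpy as np
X = np.asarray(X)  # X[:, idx] needs a NumPy array, not a list of lists
d = X.shape[0]
Y = np.zeros((d, len(I)))
for i, idx in enumerate(I):
    Y[:, i] = np.sum(X[:, idx], axis=1)  # sum the columns listed in idx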

Could you share your loopy code if you have implemented it? – Divakar, Jan 23, 2017 at 7:28

@Divakar added my loopy code – Vladislavs Dovgalecs, Jan 23, 2017 at 7:34

Divakar · Accepted Answer · 2017-01-23 07:46:29Z

Here's an approach -

import numpy as np

# Get a flattened version of indices
idx0 = np.concatenate(I)

# Get indices at which we need to do "intervaled-summation" along axis=1
# (list comprehension instead of map(len, I), which is a lazy iterator on Python 3)
cut_idx = np.append(0, [len(idx) for idx in I])[:-1].cumsum()

# Finally index into cols of array with flattened indices & perform summation
out = np.add.reduceat(X[:, idx0], cut_idx, axis=1)
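The same three steps can be wrapped in a small reusable function. This is a sketch, not Divakar's exact code: the name sum_ragged_columns is hypothetical, and it computes the cut indices from the sublist lengths with np.cumsum. It assumes every sublist of I is non-empty, because np.add.reduceat does not return a zero sum for an empty interval, so the sketch rejects that case explicitly.

```python
import numpy as np

def sum_ragged_columns(X, I):
    """Hypothetical helper: column j of the result is X[:, I[j]].sum(axis=1)."""
    X = np.asarray(X)
    lens = np.array([len(idx) for idx in I])
    if np.any(lens == 0):
        # reduceat does not give 0 for an empty interval, so refuse this input
        raise ValueError("every sublist of I must be non-empty")
    idx0 = np.concatenate(I)                                # flattened column indices
    cut_idx = np.concatenate(([0], np.cumsum(lens)[:-1]))  # start of each group
    return np.add.reduceat(X[:, idx0], cut_idx, axis=1)

X = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
I = [[0, 1], [1, 2, 3]]
print(sum_ragged_columns(X, I))
```

On the question's example this yields [[3, 9], [11, 21]], matching the loop.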

Step-by-step run -

In [67]: X
Out[67]: 
array([[ 1,  2,  3,  4],
       [15,  6, 17,  8]])

In [68]: I
Out[68]: array([[0, 2, 3, 1], [2, 3, 1], [2, 3]], dtype=object)

In [69]: idx0 = np.concatenate(I)

In [70]: idx0 # Flattened indices
Out[70]: array([0, 2, 3, 1, 2, 3, 1, 2, 3])

In [71]: cut_idx = np.append(0, [len(idx) for idx in I])[:-1].cumsum()

In [72]: cut_idx # We need to do addition in intervals limited by these indices
Out[72]: array([0, 4, 7])

In [74]: X[:,idx0]  # Select all of the indexed columns
Out[74]: 
array([[ 1,  3,  4,  2,  3,  4,  2,  3,  4],
       [15, 17,  8,  6, 17,  8,  6, 17,  8]])

In [75]: np.add.reduceat(X[:,idx0], cut_idx,axis=1)
Out[75]: 
array([[10,  9,  7],
       [46, 31, 25]])
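If some sublists of I can be empty, one alternative (a different technique from the reduceat answer, and possibly slower than it) is unbuffered scatter-addition with np.add.at: label every flattened index with the position of its sublist via np.repeat, then add the selected columns into Y at those labels. Empty groups receive nothing and stay zero. A sketch with the hypothetical helper name sum_ragged_columns_at, checked against the question's loop on small random data (the sizes are arbitrary):

```python
import numpy as np

def sum_ragged_columns_at(X, I):
    """Hypothetical helper: like the reduceat version, but empty sublists give zero columns."""
    X = np.asarray(X)
    lens = np.array([len(idx) for idx in I], dtype=np.intp)
    # Flattened column indices; an integer dtype is forced so empty sublists are safe
    idx0 = np.concatenate([np.asarray(idx, dtype=np.intp) for idx in I])
    groups = np.repeat(np.arange(len(I)), lens)  # group label of each flattened index
    Yt = np.zeros((len(I), X.shape[0]), dtype=X.dtype)
    # Yt[groups[k]] += X[:, idx0[k]] for every k, accumulating repeated labels
    np.add.at(Yt, groups, X[:, idx0].T)
    return Yt.T

# Compare with the question's loop; sublists 1 and 4 are empty
rng = np.random.default_rng(0)
d, N = 3, 50
X = rng.integers(0, 10, size=(d, N))
I = [list(rng.integers(0, N, size=k)) for k in (2, 0, 5, 1, 0, 3)]

Y_loop = np.zeros((d, len(I)), dtype=X.dtype)
for i, idx in enumerate(I):
    Y_loop[:, i] = np.sum(X[:, idx], axis=1)

Y_at = sum_ragged_columns_at(X, I)
print(np.array_equal(Y_loop, Y_at))  # True
```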

edited Jan 23, 2017 at 7:46

answered Jan 23, 2017 at 7:43

Divakar

2 Comments

Vladislavs Dovgalecs

Thanks! Do you mind briefly explaining what each line does (before I go look up the functions)?

Hamza Zubair

@Divakar, if I wanted to return just the values in the last step instead of summing, what function would I use instead of np.add.reduceat? So with the OP's X and I examples I would want the output: [[1, 2], [6, 7, 8]]
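For the follow-up about getting the selected values rather than their sums, one possibility (a suggestion, not a reply from the thread) is to keep the first two steps of the answer and replace np.add.reduceat with np.split, cutting the gathered columns at the interior group boundaries. Because the groups have different lengths, the result is a list of arrays, one (d, len(I[j])) block per sublist containing all rows of the selected columns; the exact nested output requested in the comment would need further slicing of these blocks.

```python
import numpy as np

X = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
I = [[0, 1], [1, 2, 3]]

idx0 = np.concatenate(I)                          # flattened column indices
bounds = np.cumsum([len(idx) for idx in I])[:-1]  # interior split points
# Split the gathered columns back into one block per sublist of I
blocks = np.split(X[:, idx0], bounds, axis=1)
for b in blocks:
    print(b.tolist())
# [[1, 2], [5, 6]]
# [[2, 3, 4], [6, 7, 8]]
```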