I have a matrix X of size (d,N). In other words, there are N vectors with d dimensions each. For example,

X = [[1,2,3,4],[5,6,7,8]]

there are N=4 vectors of d=2 dimensions.

I also have a ragged array I (a list of lists) whose entries index columns of the matrix X. For example,

I = [ [0,1], [1,2,3] ]

The element I[0]=[0,1] indexes columns 0 and 1 of matrix X. Similarly, the element I[1] indexes columns 1, 2 and 3. Notice that the elements of I are lists that are not all the same length!

For each element of I, I would like to select the corresponding columns of X and sum them into a single d-dimensional vector. Repeating this for every element of I builds a new matrix Y with as many d-dimensional columns as I has elements. In my example, Y will have 2 vectors of 2 dimensions.

In my example, I[0] says to take columns 0 and 1 of X, sum these two 2-dimensional vectors, and put the result in column 0 of Y. Then I[1] says to sum columns 1, 2 and 3 of X and put that vector in column 1 of Y.

I can do this easily with a loop, but I would like to vectorize the operation if possible. My matrix X has hundreds of thousands of columns, and I has tens of thousands of elements (each element is a short list of indices).

My loopy code:

Y = np.zeros( (d,len(I)) )
for i,idx in enumerate(I):
    Y[:,i] = np.sum( X[:,idx], axis=1 )
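
For reference, here is a self-contained version of the loop run on the example data above. It is only a sketch that assumes NumPy is imported as np and that X is stored as a NumPy array; the expected result in the comment was worked out by hand from the example.

```python
import numpy as np

# Example data from the question: N=4 column vectors of d=2 dimensions
X = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8]])
I = [[0, 1], [1, 2, 3]]  # ragged list of column indices

d = X.shape[0]
Y = np.zeros((d, len(I)))
for i, idx in enumerate(I):
    # Sum the columns of X selected by this index list
    Y[:, i] = np.sum(X[:, idx], axis=1)

# Y holds the values [[3, 9], [11, 21]] (stored as floats)
```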
  • Could you share your loopy code if you have implemented one? Commented Jan 23, 2017 at 7:28
  • @Divakar added my loopy code Commented Jan 23, 2017 at 7:34

1 Answer

Here's an approach -

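import numpy as np

# Flatten the ragged index lists into a single 1D index array
idx0 = np.concatenate(I)

# Start offset of each group within the flattened indices, i.e. where each
# "intervaled summation" along axis=1 begins. A list comprehension is used
# because map() returns an iterator on Python 3, which np.append cannot consume.
cut_idx = np.append(0, [len(i) for i in I])[:-1].cumsum()

# Finally, index into the columns of X with the flattened indices and sum each group
out = np.add.reduceat(X[:, idx0], cut_idx, axis=1)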
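
The lines above can be wrapped into a small function and checked against the loop from the question. This is only a sketch: the function name ragged_column_sums is made up for illustration, and it assumes every list in I is non-empty.

```python
import numpy as np

def ragged_column_sums(X, I):
    """Hypothetical helper: sum the columns of X selected by each index list in I."""
    X = np.asarray(X)
    # Flatten the ragged index lists into one 1D index array
    idx0 = np.concatenate(I)
    # Offsets where each group starts inside idx0
    lengths = [len(idx) for idx in I]
    cut_idx = np.cumsum([0] + lengths[:-1])
    # Sum each group of selected columns along axis=1
    return np.add.reduceat(X[:, idx0], cut_idx, axis=1)

# Example data from the question
X = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8]])
I = [[0, 1], [1, 2, 3]]

out = ragged_column_sums(X, I)

# Reference result computed with the loop from the question
expected = np.stack([X[:, idx].sum(axis=1) for idx in I], axis=1)
assert np.array_equal(out, expected)
```

The result has shape (d, len(I)), matching the Y matrix built by the loop.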

Step-by-step run -

In [67]: X
Out[67]: 
array([[ 1,  2,  3,  4],
       [15,  6, 17,  8]])

In [68]: I
Out[68]: array([[0, 2, 3, 1], [2, 3, 1], [2, 3]], dtype=object)

In [69]: idx0 = np.concatenate(I)

In [70]: idx0 # Flattened indices
Out[70]: array([0, 2, 3, 1, 2, 3, 1, 2, 3])

In [71]: cut_idx = np.append(0,list(map(len,I)))[:-1].cumsum()

In [72]: cut_idx # We need to do addition in intervals limited by these indices
Out[72]: array([0, 4, 7])

In [74]: X[:,idx0]  # Select all of the indexed columns
Out[74]: 
array([[ 1,  3,  4,  2,  3,  4,  2,  3,  4],
       [15, 17,  8,  6, 17,  8,  6, 17,  8]])

In [75]: np.add.reduceat(X[:,idx0], cut_idx,axis=1)
Out[75]: 
array([[10,  9,  7],
       [46, 31, 25]])
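
One caveat worth noting: np.add.reduceat does not return zero for an empty segment. If an index list in I is empty, two consecutive offsets coincide and reduceat returns the single column at that offset instead, and a trailing empty list produces an out-of-range offset. A possible workaround, sketched here under the assumption that an empty list should yield a zero column, is to run reduceat over the non-empty lists only. The name ragged_column_sums_safe is hypothetical.

```python
import numpy as np

def ragged_column_sums_safe(X, I):
    """Hypothetical variant that returns a zero column for every empty list in I."""
    X = np.asarray(X)
    lengths = np.array([len(idx) for idx in I], dtype=int)
    out = np.zeros((X.shape[0], len(I)), dtype=X.dtype)
    nonempty = lengths > 0
    if nonempty.any():
        # Flatten the indices; empty lists contribute nothing
        idx0 = np.concatenate([np.asarray(idx, dtype=int) for idx in I])
        # Start offset of every list inside idx0
        starts = np.cumsum(lengths) - lengths
        # Reduce only over the non-empty lists, whose offsets are strictly increasing
        out[:, nonempty] = np.add.reduceat(X[:, idx0], starts[nonempty], axis=1)
    return out

X = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8]])
I = [[0, 1], [], [2], []]

out = ragged_column_sums_safe(X, I)
# Columns 1 and 3 correspond to the empty lists and stay zero
```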

2 Comments

Thanks! Do you mind briefly explaining what each line does (before I go look up the functions)?
@Divakar, if I wanted to return just the values in the last step instead of summing them, which function would I use instead of np.add.reduceat? With the OP's X and I examples, I would want the output: [[1, 2], [6, 7, 8]]
