I have a matrix.
import numpy as np

mat = np.array([
    [ 0,  1,  2,  3],
    [ 4,  5,  6,  7],
    [ 8,  9, 10, 11]
])
I'd like to get the sum of the rows at certain indices, e.g.
ixs = np.array([0,2,0,0,0,1,1])
I know I can compute the answer as:
mat[ixs].sum(axis=0)
> array([16, 23, 30, 37])
The problem is ixs may be very long, and I don't want to use all the memory to create the intermediate product mat[ixs], only to reduce it again with the sum.
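For a sense of scale (hypothetical sizes, not my real data): with 10 million indices into a 500-column float64 matrix, the intermediate mat[ixs] alone would take roughly 10**7 * 500 * 8 bytes ≈ 40 GB, only to be collapsed back into a single row of 500 values.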
I also know I could simply count up the indices and use multiplication instead.
np.bincount(ixs, minlength=mat.shape[0]).dot(mat)
> array([16, 23, 30, 37])
But that will be expensive if my ixs are sparse.
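To illustrate the cost (again with assumed sizes): the weight vector from bincount is dense over all rows, so the dot product touches every row of mat even when only a couple of rows are actually indexed.

import numpy as np

tall_n_rows = 1000000                    # assumed row count, for illustration
few_ixs = np.array([3, 3, 7])            # only two distinct rows are used
weights = np.bincount(few_ixs, minlength=tall_n_rows)
print(weights.shape, np.count_nonzero(weights))   # (1000000,) 2
# weights.dot(mat) would still multiply all 1,000,000 rows of mat.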
I know about scipy's sparse matrices, and I suppose I could use them, but I'd prefer a pure numpy solution as sparse matrices are limited in various ways (such as only being 2-d).
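For what it's worth, one way the scipy.sparse route could look (just a sketch of the idea, not necessarily the same as hpaulj's solutions timed below) is to build a 1 x n_rows selection row whose duplicate entries get summed on construction, and multiply it into mat:

import numpy as np
from scipy import sparse

mat = np.array([[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]])
ixs = np.array([0, 2, 0, 0, 0, 1, 1])

# Duplicate (row, col) pairs are summed when the COO data is converted to CSR,
# so this single row ends up holding the count of each index in ixs.
sel = sparse.csr_matrix(
    (np.ones(len(ixs), dtype=int), (np.zeros(len(ixs), dtype=int), ixs)),
    shape=(1, mat.shape[0]))
print(sel.dot(mat))   # [[16 23 30 37]]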
So, is there a pure numpy way to merge the indexing and sum-reduction in this case?
Conclusions:
Thank you Divakar and hpaulj for your very thorough responses. By "sparse" I meant that most of the values in range(mat.shape[0]) are not in ixs. Using that new definition (and with a more realistic data size), I re-ran Divakar's tests, with some new functions added:
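For reference, the two baselines in the timings below presumably just wrap the two approaches from the question (the names come from Divakar's answer; the bodies here are my assumption):

import numpy as np

def org_indexing_app(mat, ixs):
    # plain fancy indexing followed by a sum, as in the question
    return mat[ixs].sum(axis=0)

def org_bincount_app(mat, ixs):
    # bincount weights dotted with the matrix, as in the question
    return np.bincount(ixs, minlength=mat.shape[0]).dot(mat)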
rng = np.random.RandomState(1234)
mat = rng.randn(1000, 500)
ixs = rng.choice(rng.randint(mat.shape[0], size=mat.shape[0] // 10), size=1000)
# Divakar's solutions
In[42]: %timeit org_indexing_app(mat, ixs)
1000 loops, best of 3: 1.82 ms per loop
In[43]: %timeit org_bincount_app(mat, ixs)
The slowest run took 4.07 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 177 µs per loop
In[44]: %timeit indexing_modified_app(mat, ixs)
1000 loops, best of 3: 1.81 ms per loop
In[45]: %timeit bincount_modified_app(mat, ixs)
1000 loops, best of 3: 258 µs per loop
In[46]: %timeit simply_indexing_app(mat, ixs)
1000 loops, best of 3: 1.86 ms per loop
In[47]: %timeit take_app(mat, ixs)
1000 loops, best of 3: 1.82 ms per loop
In[48]: %timeit unq_mask_einsum_app(mat, ixs)
10 loops, best of 3: 58.2 ms per loop
# hpaulj's solutions
In[53]: %timeit hpauljs_sparse_solution(mat, ixs)
The slowest run took 9.34 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 524 µs per loop
%timeit hpauljs_second_sparse_solution(mat, ixs)
100 loops, best of 3: 9.91 ms per loop
# Sparse version of original bincount solution (see below):
In[60]: %timeit sparse_bincount(mat, ixs)
10000 loops, best of 3: 71.7 µs per loop
The winner in this case is the sparse version of the bincount solution.
def sparse_bincount(mat, ixs):
    # count only the indices that actually occur, then weight those rows
    x = np.bincount(ixs)
    nonzeros, = np.nonzero(x)
    return x[nonzeros].dot(mat[nonzeros])
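On the small example from the top of the question, it agrees with the direct indexing result:

mat = np.array([[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]])
ixs = np.array([0, 2, 0, 0, 0, 1, 1])
print(sparse_bincount(mat, ixs))   # [16 23 30 37]
print(mat[ixs].sum(axis=0))        # [16 23 30 37]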