I have a matrix.
import numpy as np

mat = np.array([
    [ 0,  1,  2,  3],
    [ 4,  5,  6,  7],
    [ 8,  9, 10, 11]
])
I'd like to get the sum of the rows at certain indices, e.g.
ixs = np.array([0,2,0,0,0,1,1])
I know I can compute the answer as:
mat[ixs].sum(axis=0)
> array([16, 23, 30, 37])
The problem is ixs may be very long, and I don't want to use all the memory to create the intermediate product mat[ixs], only to reduce it again with the sum.
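For a sense of scale (hypothetical sizes, not my real data): with 10 million indices into a 500-column float64 matrix, the intermediate mat[ixs] alone would take roughly 10**7 * 500 * 8 bytes ≈ 40 GB, only to be collapsed back into a single row of 500 values.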
I also know I could simply count up the indices and use multiplication instead.
np.bincount(ixs, minlength=mat.shape[0]).dot(mat)
> array([16, 23, 30, 37])
But that will be expensive if my ixs are sparse.
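To illustrate the cost (again with assumed sizes): the weight vector from bincount is dense over all rows, so the dot product touches every row of mat even when only a couple of rows are actually indexed.

import numpy as np

tall_n_rows = 1000000                    # assumed row count, for illustration
few_ixs = np.array([3, 3, 7])            # only two distinct rows are used
weights = np.bincount(few_ixs, minlength=tall_n_rows)
print(weights.shape, np.count_nonzero(weights))   # (1000000,) 2
# weights.dot(mat) would still multiply all 1,000,000 rows of mat.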
I know about scipy's sparse matrices, and I suppose I could use them, but I'd prefer a pure numpy solution as sparse matrices are limited in various ways (such as only being 2-d).
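For what it's worth, one way the scipy.sparse route could look (just a sketch of the idea, not necessarily the same as hpaulj's solutions timed below) is to build a 1 x n_rows selection row whose duplicate entries get summed on construction, and multiply it into mat:

import numpy as np
from scipy import sparse

mat = np.array([[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]])
ixs = np.array([0, 2, 0, 0, 0, 1, 1])

# Duplicate (row, col) pairs are summed when the COO data is converted to CSR,
# so this single row ends up holding the count of each index in ixs.
sel = sparse.csr_matrix(
    (np.ones(len(ixs), dtype=int), (np.zeros(len(ixs), dtype=int), ixs)),
    shape=(1, mat.shape[0]))
print(sel.dot(mat))   # [[16 23 30 37]]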
So, is there a pure numpy way to merge the indexing and sum-reduction in this case?
Conclusions:
Thank you Divakar and hpaulj for your very thorough responses. By "sparse" I meant that most of the values in range(mat.shape[0]) are not in ixs. Using that new definition (and with a more realistic data size), I re-ran Divakar's tests, with some new functions added:
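For reference, the two baselines in the timings below presumably just wrap the two approaches from the question (the names come from Divakar's answer; the bodies here are my assumption):

import numpy as np

def org_indexing_app(mat, ixs):
    # plain fancy indexing followed by a sum, as in the question
    return mat[ixs].sum(axis=0)

def org_bincount_app(mat, ixs):
    # bincount weights dotted with the matrix, as in the question
    return np.bincount(ixs, minlength=mat.shape[0]).dot(mat)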
rng = np.random.RandomState(1234)
mat = rng.randn(1000, 500)
ixs = rng.choice(rng.randint(mat.shape[0], size=mat.shape[0] // 10), size=1000)
# Divakar's solutions
In[42]: %timeit org_indexing_app(mat, ixs)
1000 loops, best of 3: 1.82 ms per loop
In[43]: %timeit org_bincount_app(mat, ixs)
The slowest run took 4.07 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 177 µs per loop
In[44]: %timeit indexing_modified_app(mat, ixs)
1000 loops, best of 3: 1.81 ms per loop
In[45]: %timeit bincount_modified_app(mat, ixs)
1000 loops, best of 3: 258 µs per loop
In[46]: %timeit simply_indexing_app(mat, ixs)
1000 loops, best of 3: 1.86 ms per loop
In[47]: %timeit take_app(mat, ixs)
1000 loops, best of 3: 1.82 ms per loop
In[48]: %timeit unq_mask_einsum_app(mat, ixs)
10 loops, best of 3: 58.2 ms per loop
# hpaulj's solutions
In[53]: %timeit hpauljs_sparse_solution(mat, ixs)
The slowest run took 9.34 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 524 µs per loop
%timeit hpauljs_second_sparse_solution(mat, ixs)
100 loops, best of 3: 9.91 ms per loop
# Sparse version of original bincount solution (see below):
In[60]: %timeit sparse_bincount(mat, ixs)
10000 loops, best of 3: 71.7 µs per loop
The winner in this case is the sparse version of the bincount solution.
def sparse_bincount(mat, ixs):
    # count only the indices that actually occur, then weight those rows
    x = np.bincount(ixs)
    nonzeros, = np.nonzero(x)
    return x[nonzeros].dot(mat[nonzeros])
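On the small example from the top of the question, it agrees with the direct indexing result:

mat = np.array([[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]])
ixs = np.array([0, 2, 0, 0, 0, 1, 1])
print(sparse_bincount(mat, ixs))   # [16 23 30 37]
print(mat[ixs].sum(axis=0))        # [16 23 30 37]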