1

So I have a (seemingly) simple problem, which I am currently doing now via a for loop.

Basically, I want to increment specific cells in a numpy matrix, but I want to do it without a for-loop if possible.

To give more details: I have 100 x 100 numpy matrix, X. I also have a 2x1000 numpy matrix P. P just stores indices into X, so for example, each column of P, has the row-column index of the cell, that I want to increment in X.

What I do right now is this:

for p in range(P.shape[1]):
  X[P[0,p], P[1,p]] += 1

My question is, is there a way to do this without a for-loop?

Thanks!

2
  • 1
    for p in range(P.shape[1]): probably? Commented Sep 13, 2017 at 16:58
  • @Divakar yes sorry, corrected the typo! Commented Sep 13, 2017 at 21:25

2 Answers 2

3

Use the at method of the add ufunc with advanced indexing:

numpy.add.at(X, (P[0], P[1]), 1)

or just advanced indexing if P is guaranteed to never select the same cell of X twice:

X[P[0], P[1]] += 1
Sign up to request clarification or add additional context in comments.

8 Comments

P[0], P[1] can be expressed more generally as (*P,). So numpy.add.at(X, (*P,), 1) and X[(P,)] += 1. This is nice because it does not hard code in the number of dimensions.
@MadPhysicist: Or just tuple(P), but I picked P[0], P[1] because it felt clearer to me for this case.
Hi, it is very possible to have repeated indices. Does this mean that I should use the first option? Second question is, why is the first option better than a for loop? Thanks!
@TheGrapeBeyond. Python for loops are not great for speed. Functionally, the loop is no different than using add.at, but it will be much slower and require more text to type.
@TheGrapeBeyond: That's what this is, so assuming you write a similar wrapper and similar C-level code, then yes.
|
2

Using linear-indices and bincount -

lidx = np.ravel_multi_index(P, X.shape)
X += np.bincount(lidx, minlength=X.size).reshape(X.shape)

Benchmarking

For the case when indices are not repeated, advanced indexing based approach as suggested in @user2357112's post seems to be very efficient.

For repeated ones case, we have np.add.at and np.bincount and the performance numbers seem to be dependent on the size of indices array relative to the size of input array.

Approaches -

 def app0(X,P): # @user2357112's soln1
     np.add.at(X, (P[0], P[1]), 1)

 def app1(X, P): # Proposed in this ppst
     lidx = np.ravel_multi_index(P, X.shape)
     X += np.bincount(lidx, minlength=X.size).reshape(X.shape)

Here's few timing tests to suggest that -

Case #1 :

 In [141]: X = np.random.randint(0,9,(100,100))
      ...: P = np.random.randint(0,100,(2,1000))
      ...: 

 In [142]: %timeit app0(X, P)
      ...: %timeit app1(X, P)
      ...: 
 10000 loops, best of 3: 68.9 µs per loop
 100000 loops, best of 3: 15.1 µs per loop

Case #2 :

 In [143]: X = np.random.randint(0,9,(1000,1000))
      ...: P = np.random.randint(0,1000,(2,10000))
      ...: 

 In [144]: %timeit app0(X, P)
      ...: %timeit app1(X, P)
      ...: 
 1000 loops, best of 3: 687 µs per loop
 1000 loops, best of 3: 1.48 ms per loop

Case #3 :

 In [145]: X = np.random.randint(0,9,(1000,1000))
      ...: P = np.random.randint(0,1000,(2,100000))
      ...: 

 In [146]: %timeit app0(X, P)
      ...: %timeit app1(X, P)
      ...: 
 100 loops, best of 3: 11.3 ms per loop
 100 loops, best of 3: 2.51 ms per loop

1 Comment

Took me a while to realize why you were doing that. Repeated indices are a pain.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.