
This may have been asked before, but I couldn't find it. Sometimes I have an index array I, and I want to accumulate values from another array into a NumPy array according to that index. For example:

A = np.array([1,2,3])
B = np.array([10,20,30])
I = np.array([0,1,1])
for i in range(len(I)):
    A[I[i]] += B[i]
print(A)

prints the expected (correct) value:

[11 52  3]

while

A[I] += B
print(A)

results in the unexpected (wrong) answer

[11 32  3]

Is there any way to do what I want in a vectorized way, without the loop? If not, which is the fastest way to do this?

  • It would be a bit churlish to close this question as a dupe when there are two good answers below, so here are a couple of places where this question has been asked before in case they're a useful reference: one two Commented Feb 27, 2018 at 13:06
  • @AlexRiley Found a better dup target. Commented Feb 27, 2018 at 13:24

2 Answers


Use numpy.add.at:

>>> import numpy as np
>>> A = np.array([1,2,3])
>>> B = np.array([10,20,30])
>>> I = np.array([0,1,1])
>>> 
>>> np.add.at(A, I, B)
>>> A
array([11, 52,  3])

Alternatively, np.bincount:

>>> A = np.array([1,2,3])
>>> B = np.array([10,20,30])
>>> I = np.array([0,1,1])
>>> 
>>> A += np.bincount(I, B, minlength=A.size).astype(int)
>>> A
array([11, 52,  3])
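The astype(int) cast above is needed because np.bincount returns a float array whenever weights are passed, even if the weights themselves are integers. A quick check (values taken from the example above):

```python
import numpy as np

I = np.array([0, 1, 1])
B = np.array([10, 20, 30])

# bincount with weights always yields float64, regardless of the weights' dtype
counts = np.bincount(I, weights=B, minlength=3)
print(counts.dtype)  # float64
print(counts)        # [10. 50.  0.]
```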

Which is faster?

It depends. In this concrete example, add.at seems marginally faster, presumably because of the type conversion needed in the bincount solution.

If, on the other hand, A and B were of float dtype, then bincount would be faster.
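If you want to check this on your own data, a rough timing sketch along these lines works (array sizes here are arbitrary and the numbers will vary by machine; the copy cost is included on both sides, so only the relative difference is meaningful):

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
A = np.zeros(100)                     # small float destination
B = rng.random(n)                     # many float values to accumulate
I = rng.integers(0, A.size, size=n)   # many repeated indices

# Time both approaches; A.copy() keeps each run independent
t_at = timeit.timeit(lambda: np.add.at(A.copy(), I, B), number=20)
t_bc = timeit.timeit(lambda: A.copy() + np.bincount(I, B, minlength=A.size),
                     number=20)
print(f"add.at:   {t_at:.3f}s")
print(f"bincount: {t_bc:.3f}s")
```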


7 Comments

Why would bincount be faster when I have floats? How much faster?
@Johan As I said it depends. For some reason bincount returns dtype float even if the input weights are int, so we have to cast back to int to be able to add to A which makes it slower. If A and B are float anyway, this is not necessary. add.at is normally only faster if the number of things to add is quite a bit smaller than the number of elements in the destination array. With your example converted to float the difference is not large, on my computer something like 10%.
I really liked the idea with bincount, but in my case I actually have numpy arrays (matrices) and I would like to do the adding of columns. This seems to work with numpy.add.at. However this is quite slow. But for bincount it gives the error: object too deep for desired array. Do you know if there is a multidimensional version?
@Johan Not aware of one, I'm afraid.
@Johan I think one can make it work, but that should be a separate question, not hidden under all that other stuff here.

You need to use np.add.at:

A = np.array([1,2,3])
B = np.array([10,20,30])
I = np.array([0,1,1])

np.add.at(A, I, B)
print(A)

prints

[11 52  3]

This is noted in the doc:

ufunc.at(a, indices, b=None)

Performs unbuffered in place operation on operand ‘a’ for elements specified by ‘indices’. For addition ufunc, this method is equivalent to a[indices] += b, except that results are accumulated for elements that are indexed more than once. For example, a[[0,0]] += 1 will only increment the first element once because of buffering, whereas add.at(a, [0,0], 1) will increment the first element twice.
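The doc's own example of buffered vs. unbuffered behavior can be seen directly:

```python
import numpy as np

a = np.zeros(3, dtype=int)
a[[0, 0]] += 1           # buffered: index 0 is incremented only once
print(a)                 # [1 0 0]

b = np.zeros(3, dtype=int)
np.add.at(b, [0, 0], 1)  # unbuffered: index 0 is incremented twice
print(b)                 # [2 0 0]
```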
