7

I have unsorted array of indexes:

i = np.array([1,5,2,6,4,3,6,7,4,3,2])

I also have an array of values of the same length:

v = np.array([2,5,2,3,4,1,2,1,6,4,2])

I have array with zeros of desired values:

d = np.zeros(10)

Now I want to add to elements in d values of v based on it's index in i.

If I do it in plain python I would do it like this:

for index,value in enumerate(v):
    idx = i[index]
    d[idx] += v[index]

It is ugly and inefficient. How can I change it?

2 Answers 2

7
np.add.at(d, i, v)

You'd think d[i] += v would work, but if you try to do multiple additions to the same cell that way, one of them overrides the others. The ufunc.at method avoids those problems.

Sign up to request clarification or add additional context in comments.

3 Comments

Thanks. It seems like it works. Is there a way to do it in 2d? when my d is 2d array and instead of i I have x and y array?
@MihailKondratyev: np.add.at(d, (x, y), v).
This version works nicely with values that are not Real numbers. The np.bincount(_, weights) variant will fail when weights has dtype(complex).
6

We can use np.bincount which is supposedly pretty efficient for such accumulative weighted counting, so here's one with that -

counts = np.bincount(i,v)
d[:counts.size] = counts

Alternatively, using minlength input argument and for a generic case when d could be any array and we want to add into it -

d += np.bincount(i,v,minlength=d.size).astype(d.dtype, copy=False)

Runtime tests

This section compares np.add.at based approach listed in the other post with the np.bincount based one listed earlier in this post.

In [61]: def bincount_based(d,i,v):
    ...:     counts = np.bincount(i,v)
    ...:     d[:counts.size] = counts
    ...: 
    ...: def add_at_based(d,i,v):
    ...:     np.add.at(d, i, v)
    ...:     

In [62]: # Inputs (random numbers)
    ...: N = 10000
    ...: i = np.random.randint(0,1000,(N))
    ...: v = np.random.randint(0,1000,(N))
    ...: 
    ...: # Setup output arrays for two approaches
    ...: M = 12000
    ...: d1 = np.zeros(M)
    ...: d2 = np.zeros(M)
    ...: 

In [63]: bincount_based(d1,i,v) # Run approaches
    ...: add_at_based(d2,i,v)
    ...: 

In [64]: np.allclose(d1,d2)  # Verify outputs
Out[64]: True

In [67]: # Setup output arrays for two approaches again for timing
    ...: M = 12000
    ...: d1 = np.zeros(M)
    ...: d2 = np.zeros(M)
    ...: 

In [68]: %timeit add_at_based(d2,i,v)
1000 loops, best of 3: 1.83 ms per loop

In [69]: %timeit bincount_based(d1,i,v)
10000 loops, best of 3: 52.7 µs per loop

3 Comments

It seems very efficient. Thank you. Your anwer is even better.
Can you show me an example with two dimensions. Where instead of "i' I have "x" and "y", and "d" is two-dimensional array?
@MihailKondratyev I would suggest posting that as a separate question as that's different from what you have in the question. Good luck!

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.