Numpy sum elements in array based on its value

Question

I have unsorted array of indexes:

i = np.array([1,5,2,6,4,3,6,7,4,3,2])

I also have an array of values of the same length:

v = np.array([2,5,2,3,4,1,2,1,6,4,2])

I have array with zeros of desired values:

d = np.zeros(10)

Now I want to add to elements in d values of v based on it's index in i.

If I do it in plain python I would do it like this:

for index,value in enumerate(v):
    idx = i[index]
    d[idx] += v[index]

It is ugly and inefficient. How can I change it?

user2357112 · Accepted Answer · 2015-11-20 02:47:49Z

7

np.add.at(d, i, v)

You'd think d[i] += v would work, but if you try to do multiple additions to the same cell that way, one of them overrides the others. The ufunc.at method avoids those problems.

answered Nov 20, 2015 at 2:47

user2357112

286k32 gold badges490 silver badges570 bronze badges

Sign up to request clarification or add additional context in comments.

3 Comments

Mihail Kondratyev Over a year ago

Thanks. It seems like it works. Is there a way to do it in 2d? when my d is 2d array and instead of i I have x and y array?

user2357112 Over a year ago

@MihailKondratyev: np.add.at(d, (x, y), v).

Matt Eding Over a year ago

This version works nicely with values that are not Real numbers. The np.bincount(_, weights) variant will fail when weights has dtype(complex).

Divakar · Accepted Answer · 2017-11-22 16:13:48Z

6

We can use np.bincount which is supposedly pretty efficient for such accumulative weighted counting, so here's one with that -

counts = np.bincount(i,v)
d[:counts.size] = counts

Alternatively, using minlength input argument and for a generic case when d could be any array and we want to add into it -

d += np.bincount(i,v,minlength=d.size).astype(d.dtype, copy=False)

Runtime tests

This section compares np.add.at based approach listed in the other post with the np.bincount based one listed earlier in this post.

In [61]: def bincount_based(d,i,v):
    ...:     counts = np.bincount(i,v)
    ...:     d[:counts.size] = counts
    ...: 
    ...: def add_at_based(d,i,v):
    ...:     np.add.at(d, i, v)
    ...:     

In [62]: # Inputs (random numbers)
    ...: N = 10000
    ...: i = np.random.randint(0,1000,(N))
    ...: v = np.random.randint(0,1000,(N))
    ...: 
    ...: # Setup output arrays for two approaches
    ...: M = 12000
    ...: d1 = np.zeros(M)
    ...: d2 = np.zeros(M)
    ...: 

In [63]: bincount_based(d1,i,v) # Run approaches
    ...: add_at_based(d2,i,v)
    ...: 

In [64]: np.allclose(d1,d2)  # Verify outputs
Out[64]: True

In [67]: # Setup output arrays for two approaches again for timing
    ...: M = 12000
    ...: d1 = np.zeros(M)
    ...: d2 = np.zeros(M)
    ...: 

In [68]: %timeit add_at_based(d2,i,v)
1000 loops, best of 3: 1.83 ms per loop

In [69]: %timeit bincount_based(d1,i,v)
10000 loops, best of 3: 52.7 µs per loop

edited Nov 22, 2017 at 16:13

answered Nov 20, 2015 at 7:25

Divakar

222k19 gold badges273 silver badges374 bronze badges

3 Comments

Mihail Kondratyev Over a year ago

It seems very efficient. Thank you. Your anwer is even better.

Mihail Kondratyev Over a year ago

Can you show me an example with two dimensions. Where instead of "i' I have "x" and "y", and "d" is two-dimensional array?

Divakar Over a year ago

@MihailKondratyev I would suggest posting that as a separate question as that's different from what you have in the question. Good luck!

Collectives™ on Stack Overflow

Numpy sum elements in array based on its value

2 Answers 2

3 Comments

3 Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

3 Comments

3 Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related