How to replace a numpy loop with a faster broadcast operation

Question

I am trying to use broadcasting to speed up my numpy code. The real code has much larger arrays and loops through them multiple times, but I think this snippet illustrates the issue.

import numpy as np
row    = np.array([0,0,1,1,4])
dl_ddk = np.array([0,8,29,112,11])
change1 = np.zeros(5)
change2 = np.zeros(5)
for k in range(0, row.shape[0]):
   i          = row[k]
   change1[i] += dl_ddk[k]
change2[row] += dl_ddk
print(change1)
print(change2)

change1 = [8, 141, 0, 0, 11]
change2 = [8, 112, 0, 0, 11]

I thought these two change arrays would be equal. However, it seems that the broadcast += operation is overwriting rather than adding values. Is there a way to vectorize a loop in numpy with array indexing like this that gives the same result as change1?

np.add.at is designed to get around this buffering issue.

– hpaulj, Commented Jul 18, 2020 at 19:48

Mark · Accepted Answer · 2020-07-18 16:11:38Z

You can use np.bincount() and use dl_ddk as the weights:

import numpy as np

row    = np.array([0,0,1,1,4])
dl_ddk = np.array([0,8,29,112,11])

change1 = np.bincount(row, weights=dl_ddk)
print(change1)
# [  8. 141.   0.   0.  11.]

The docs describe using it in a way that is almost exactly your problem:

If weights is specified the input array is weighted by it, i.e. if a value n is found at position i, out[n] += weight[i] instead of out[n] += 1.
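One thing to watch for: np.bincount returns an array of length row.max() + 1, which only happens to match np.zeros(5) in the example above. If the real target array is longer than the largest index, the minlength argument pads the result with zeros. A minimal sketch, where n_out is a made-up target length used purely for illustration:

```python
import numpy as np

row = np.array([0, 0, 1, 1, 4])
dl_ddk = np.array([0, 8, 29, 112, 11])

# Assumption for illustration: the real target array has 7 slots,
# more than row.max() + 1.
n_out = 7

# Without minlength the result has only row.max() + 1 = 5 entries.
short = np.bincount(row, weights=dl_ddk)

# minlength pads the result with zeros up to n_out entries.
padded = np.bincount(row, weights=dl_ddk, minlength=n_out)

print(short.shape)   # (5,)
print(padded.shape)  # (7,)
print(padded)
```

Since bincount returns a new array, adding into an existing accumulator would be something like existing += np.bincount(row, weights=dl_ddk, minlength=existing.shape[0]).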

hpaulj · Accepted Answer · 2020-07-18 21:26:18Z

In [1]: row    = np.array([0,0,1,1,4]) 
   ...: dl_ddk = np.array([0,8,29,112,11]) 
   ...: change1 = np.zeros(5) 
   ...: change2 = np.zeros(5) 
   ...: for k in range(0, row.shape[0]): 
   ...:    i          = row[k] 
   ...:    change1[i] += dl_ddk[k] 
   ...: change2[row] += dl_ddk

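change2 does not match because of buffering: change2[row] += dl_ddk reads change2[row] once, adds dl_ddk, and writes the result back, so for a repeated index only the last value survives. ufuncs have an at method to address this: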

Performs unbuffered in place operation on operand 'a' for elements specified by 'indices'.

In [3]: change3 = np.zeros(5)
In [4]: np.add.at(change3, row, dl_ddk)
In [5]: change1
Out[5]: array([  8., 141.,   0.,   0.,  11.])
In [6]: change2
Out[6]: array([  8., 112.,   0.,   0.,  11.])
In [7]: change3
Out[7]: array([  8., 141.,   0.,   0.,  11.])
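To make the buffering a bit more concrete, here is a rough sketch that spells out what the += line effectively does, next to the unbuffered np.add.at call. The tmp variable is only there for illustration; it is not something numpy exposes.

```python
import numpy as np

row = np.array([0, 0, 1, 1, 4])
dl_ddk = np.array([0, 8, 29, 112, 11])

# Buffered: change2[row] += dl_ddk behaves like a read, an add,
# and then a single indexed assignment.
change2 = np.zeros(5)
tmp = change2[row] + dl_ddk  # reads five zeros, so tmp equals dl_ddk
change2[row] = tmp           # repeated indices: the last write wins

# Unbuffered: every (index, value) pair is accumulated in turn.
change3 = np.zeros(5)
np.add.at(change3, row, dl_ddk)

print(change2)  # index 1 keeps only 112
print(change3)  # index 1 holds 29 + 112 = 141
```

Index 0 looks right in both cases only because its first contribution is 0; in general every repeated index loses all but its last value under +=.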
