For each element in a randomized array of 2D indices (with possible duplicates), I want to increment ("+=1") the corresponding cell of a 2D zero array. However, I don't know how to optimize the computation. Using the standard for loop, as shown here,

import numpy as np

def interadd():
    U = 100
    pts = np.random.random(size=(5000, 2)) * U   # random 2D points in [0, U)
    idx = np.floor(pts).astype(int)              # integer (row, col) indices; np.int is deprecated

    grids = np.zeros((U, U))
    for i in range(len(idx)):
        grids[idx[i, 0], idx[i, 1]] += 1
    return grids

The runtime can be quite significant:

>>> from timeit import timeit
>>> timeit(interadd, number=5000)
43.69953393936157

Is there a way to vectorize this iterative process?

3 Answers


You could speed it up a little by using np.add.at, which correctly handles the case of duplicate indices:

def interadd(U, idx):
    # Original loop-based approach, for reference
    grids = np.zeros((U, U))
    for i in range(len(idx)):
        grids[idx[i, 0], idx[i, 1]] += 1
    return grids

def interadd2(U, idx):
    # np.add.at performs an unbuffered add, so repeated indices
    # are each counted
    grids = np.zeros((U, U))
    np.add.at(grids, idx.T.tolist(), 1)
    return grids

def interadd3(U, idx):
    # YXD's suggestion: index with a tuple of arrays instead of a list
    grids = np.zeros((U, U))
    np.add.at(grids, (idx[:, 0], idx[:, 1]), 1)
    return grids

which gives

>>> U = 100
>>> idx = np.floor(np.random.random(size=(5000,2))*U).astype(int)
>>> (interadd(U, idx) == interadd2(U, idx)).all()
True
>>> %timeit interadd(U, idx)
100 loops, best of 3: 8.48 ms per loop
>>> %timeit interadd2(U, idx)
100 loops, best of 3: 2.62 ms per loop

And YXD's suggestion:

>>> (interadd(U, idx) == interadd3(U, idx)).all()
True
>>> %timeit interadd3(U, idx)
1000 loops, best of 3: 1.09 ms per loop
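
To see why np.add.at is needed at all: a plain fancy-indexed grids[rows, cols] += 1 is buffered, so duplicate indices are only counted once, whereas np.add.at accumulates every occurrence. A minimal sketch of that pitfall (an illustration, separate from the benchmarks above):

rows = np.array([0, 0, 0])   # three hits on the same cell (0, 1)
cols = np.array([1, 1, 1])

grids = np.zeros((3, 3))
grids[rows, cols] += 1             # buffered: the duplicates collapse
print(grids[0, 1])                 # 1.0, not 3.0

grids = np.zeros((3, 3))
np.add.at(grids, (rows, cols), 1)  # unbuffered: every duplicate counts
print(grids[0, 1])                 # 3.0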

1 Comment

Was typing almost the same thing. You can change idx.T.tolist() to (idx[:, 0], idx[:, 1]); it should be faster still.

Divakar's answer led me to try the following, which looks to be the fastest method yet:

lin_idx = idx[:,0]*U + idx[:,1]
grids = np.bincount(lin_idx, minlength=U**2).reshape(U, U)
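
Wrapped as a function with the same signature as the others, for benchmarking (the name bincount_grid is just illustrative):

def bincount_grid(U, idx):
    # Flatten (row, col) pairs to linear indices, count all of them in
    # one pass with bincount, then reshape the counts to a U x U grid.
    lin_idx = idx[:, 0] * U + idx[:, 1]
    return np.bincount(lin_idx, minlength=U**2).reshape(U, U)

Note that bincount returns integer counts, while the other versions return a float array; the values still compare equal.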

Timings:

In [184]: U = 100
     ...: pts = np.random.random(size=(5000,2)) * U
     ...: idx = np.floor(pts).astype(int)

In [185]: %timeit interadd3(U, idx)  # By DSM / YXD
1000 loops, best of 3: 1.68 ms per loop

In [186]: %timeit unique_counts(U, idx)  # By Divakar
1000 loops, best of 3: 676 µs per loop

In [187]: %%timeit
     ...: lin_idx = idx[:,0]*U + idx[:,1]
     ...: grids = np.bincount(lin_idx, minlength=U*U).reshape(U, U)
     ...: 
10000 loops, best of 3: 97.5 µs per loop

1 Comment

Huge improvement it seems!

You can convert the R,C indices from idx to linear indices, then find the unique ones along with their counts, and finally store those counts in the output grids. Here's the implementation -

# Calculate linear indices corresponding to idx
lin_idx = idx[:,0]*U + idx[:,1]

# Get unique linear indices and their counts
unq_lin_idx,idx_counts = np.unique(lin_idx,return_counts=True)

# Setup output array and store index counts in raveled/flattened version
grids = np.zeros((U,U))  
grids.ravel()[unq_lin_idx] = idx_counts
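
As a quick sanity check on the linear-index arithmetic: with U = 100, the pair (r, c) = (3, 7) maps to 3*100 + 7 = 307 in the raveled (row-major) array, and np.unravel_index recovers the original pair:

r, c, U = 3, 7, 100
lin = r * U + c                                 # 307
assert np.unravel_index(lin, (U, U)) == (r, c)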

Runtime tests -

Here are the runtime tests covering all the approaches (including @DSM's) and using the same definitions as listed in that solution -

In [63]: U = 100
    ...: idx = np.floor(np.random.random(size=(5000,2))*U).astype(int)
    ...: 

In [64]: %timeit interadd(U, idx)
100 loops, best of 3: 7.57 ms per loop

In [65]: %timeit interadd2(U, idx)
100 loops, best of 3: 2.59 ms per loop

In [66]: %timeit interadd3(U, idx)
1000 loops, best of 3: 1.24 ms per loop

In [67]: def unique_counts(U, idx):
    ...:     lin_idx = idx[:,0]*U + idx[:,1]
    ...:     unq_lin_idx,idx_counts = np.unique(lin_idx,return_counts=True)
    ...:     grids = np.zeros((U,U))  
    ...:     grids.ravel()[unq_lin_idx] = idx_counts
    ...:     return grids
    ...: 

In [68]: %timeit unique_counts(U, idx)
1000 loops, best of 3: 595 µs per loop

Runtimes suggest that the proposed np.unique based approach is roughly twice as fast as the second fastest approach (interadd3): 595 µs versus 1.24 ms.
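
As a sanity check that the approaches agree, one can mirror the equality checks in DSM's answer (unique_counts and interadd are the functions defined above):

In [69]: (unique_counts(U, idx) == interadd(U, idx)).all()
Out[69]: True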

4 Comments

np.unique uses sorting under the hood, so has worse time complexity than np.add.at, but on the other hand your approach has a faster memory access pattern for the grids array.
@moarningsun Yeah, I think it uses sorting and differencing under the hood, which would explain the runtimes. Would be interesting to find out what's going on underneath with add.at.
This made me think of an interesting approach: grids = np.bincount(lin_idx, minlength=U**2).reshape(U, U)
@moarningsun bincount is surely another way to look at add.at, and it might well be faster. Post a solution with it and see if the timings are better?
