
Assume we have a numpy array A with shape (N,), a matrix D with shape (M, 3) that holds data, and another matrix I with shape (M, 3) that holds the index in A of each corresponding element of D. How can we construct A from D and I such that data elements with repeated indices are added together?

Example:

############# A[I] := D ###################################  
A = [0.5, 0.6]                         # Final Reduced Data Vector
D = [[0.1, 0.1, 0.2], [0.2, 0.4, 0.1]] # Data
I = [[0, 1, 0], [0, 1, 1]]             # Indices

For example:

A[0] = D[0][0] + D[0][2] + D[1][0]     # 0.5 = 0.1 + 0.2 + 0.2

since in the index matrix we have:

I[0][0] = I[0][2] = I[1][0] = 0

The target is to avoid looping over all elements, so that the construction stays efficient for large N and M (10^6-10^9).
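For reference, the straightforward double loop the question wants to avoid would look like this (a sketch; the helper name make_A_naive is mine):

import numpy as np

def make_A_naive(D, I, N):
    # Scatter-add every D[i][j] into A at position I[i][j]
    A = np.zeros(N)
    for i in range(len(D)):
        for j in range(len(D[i])):
            A[I[i][j]] += D[i][j]
    return A

print(make_A_naive([[0.1, 0.1, 0.2], [0.2, 0.4, 0.1]],
                   [[0, 1, 0], [0, 1, 1]], 2))  # [0.5 0.6]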

  • Question needs more clarification. What do you mean by "repeated element indexes are added"? Are you getting 0.5 by doing 0.1 + 0.4? If so, how are you getting 0.6? Commented Dec 2, 2020 at 6:36
  • Iterate over I and add A[I[x][y]] += D[x][y]. Commented Dec 2, 2020 at 7:49
  • Are all indices guaranteed to appear at least once in I? Commented Dec 2, 2020 at 7:56
  • On second thought, it doesn't matter. Commented Dec 2, 2020 at 8:27
  • M does not matter though - what MIGHT matter is the number of integers in I. Commented Dec 6, 2020 at 21:13

3 Answers


I doubt you can get much faster than np.bincount; notice how the official documentation covers this exact use case:

# Your example
A = [0.5, 0.6]
D = [[0.1, 0.1, 0.2], [0.2, 0.4, 0.1]]
I = [[0, 1, 0], [0, 1, 1]]

# Solution
import numpy as np
D, I = np.array(D).flatten(), np.array(I).flatten()
print(np.bincount(I, D))  # [0.5 0.6]

1 Comment

On a second look, using ravel instead of flatten might give us a slight boost in performance here, but I don't think that's very relevant relative to the actual binning and counting phase.
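For completeness, the ravel variant mentioned in the comment would look like this (same result; ravel returns a view instead of a copy when the input is already a contiguous array):

import numpy as np

D = np.array([[0.1, 0.1, 0.2], [0.2, 0.4, 0.1]])
I = np.array([[0, 1, 0], [0, 1, 1]])
print(np.bincount(I.ravel(), weights=D.ravel()))  # [0.5 0.6]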

The shapes of I and D don't matter: you can clearly ravel the arrays without changing the outcome:

index = np.ravel(I)
data = np.ravel(D)

Now you can sort both arrays according to I:

sorter = np.argsort(index)
index = index[sorter]
data = data[sorter]

This is helpful because now index looks like this:

0, 0, 0, 1, 1, 1

And data is this:

0.1, 0.2, 0.2, 0.1, 0.4, 0.1

Adding together runs of consecutive numbers should be easier than processing random locations. Let's start by finding the indices where the runs start:

runs = np.r_[0, np.flatnonzero(np.diff(index)) + 1]

Now you can use the fact that ufuncs like np.add have a partial reduce operation called reduceat. This allows you to sum regions of an array:

a = np.add.reduceat(data, runs)
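With the question's example, the intermediates from the two steps above come out as follows (a quick trace, not part of the original answer):

import numpy as np

index = np.array([0, 0, 0, 1, 1, 1])                 # after ravel + sort
data = np.array([0.1, 0.2, 0.2, 0.1, 0.4, 0.1])
runs = np.r_[0, np.flatnonzero(np.diff(index)) + 1]
print(runs)                         # [0 3]: one run per distinct index
print(np.add.reduceat(data, runs))  # [0.5 0.6]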

If I is guaranteed to contain all indices in [0, A.size) at least once, you're done: just assign to A instead of a. If not, you can make the mapping using the fact that the start of each run in index is the target index:

A = np.zeros(M)
A[index[runs]] = a
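A minimal sketch of that mapping, with hypothetical data where M = 3 and index 1 never appears in I:

import numpy as np

M = 3
index = np.array([0, 0, 2, 2])          # already sorted; index 1 is missing
data = np.array([0.1, 0.2, 0.3, 0.4])
runs = np.r_[0, np.flatnonzero(np.diff(index)) + 1]   # [0 2]
a = np.add.reduceat(data, runs)                       # [0.3 0.7]
A = np.zeros(M)
A[index[runs]] = a                      # run starts give the target indices
print(A)                                # [0.3 0.  0.7]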

Algorithmic complexity analysis (writing MN for the total number of elements in D and I, and M for the size of A, as in the function below):

  • ravel is O(1) in time and space if the data is in an array. If it's a list, this is O(MN) in time and space
  • argsort is O(MN log MN) in time and O(MN) in space
  • Indexing by sorter is O(MN) in time and space
  • Computing runs is O(MN) in time and O(MN + M) = O(MN) in space
  • reduceat is a single pass: O(MN) in time, O(M) in space
  • Reassigning A is O(M) in time and space

Total: O(MN log MN) time, O(MN) space

TL;DR

import numpy as np

def make_A(D, I, M):
    index = np.ravel(I)
    data = np.ravel(D)
    sorter = np.argsort(index)
    index = index[sorter]

    if index[0] < 0 or index[-1] >= M:
        raise ValueError('Bad indices')

    data = data[sorter]
    runs = np.r_[0, np.flatnonzero(np.diff(index)) + 1]
    a = np.add.reduceat(data, runs)
    if a.size == M:
        return a
    A = np.zeros(M)
    A[index[runs]] = a
    return A
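As a quick check, calling make_A on the question's example (all indices appear, so the early return path is taken):

D = [[0.1, 0.1, 0.2], [0.2, 0.4, 0.1]]
I = [[0, 1, 0], [0, 1, 1]]
print(make_A(D, I, 2))  # [0.5 0.6]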

3 Comments

Thank you! This is a very smart way to do it. I will just wait a few days in case someone finds a more efficient algorithm.
Take your time. You could theoretically do this faster using a hash table, since that has O(1) lookup time, but I suspect your data isn't large enough for the savings in complexity to overtake the overhead of Python function calls.
@Roy. Made a couple of typo fixes. Enjoy

If you know the size of A beforehand, as it seems you do, you can simply use np.add.at:

import numpy as np

D = [[0.1, 0.1, 0.2], [0.2, 0.4, 0.1]]
I = [[0, 1, 0], [0, 1, 1]]

arr_D = np.array(D)
arr_I = np.array(I)

A = np.zeros(2)

np.add.at(A, arr_I, arr_D)

print(A)

Output

[0.5 0.6]

If you don't know the size of A, you can use max to compute it:

A = np.zeros(arr_I.max() + 1)
np.add.at(A, arr_I, arr_D)
print(A)

Output

[0.5 0.6]

The time complexity of this algorithm is O(N), and the space complexity is O(N) as well.

The:

arr_I.max() + 1

is what bincount does under the hood. From the documentation:

The result of binning the input array. The length of out is equal to np.amax(x)+1.

That being said, bincount is at least one order of magnitude faster:

I = np.random.choice(1000, size=(1000, 3), replace=True)
D = np.random.random((1000, 3))
%timeit make_A_with_at(I, D, 1000)
213 µs ± 25 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit make_A_with_bincount(I, D)
11 µs ± 15.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
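The two helpers in the timing above are not defined in the answer; presumably they were along these lines (a reconstruction from this answer and the accepted one, so treat the exact bodies as an assumption):

import numpy as np

def make_A_with_at(I, D, M):
    # np.add.at scatter-add, as in this answer
    A = np.zeros(M)
    np.add.at(A, I, D)
    return A

def make_A_with_bincount(I, D):
    # np.bincount with weights, as in the accepted answer
    return np.bincount(np.ravel(I), weights=np.ravel(D))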

1 Comment

Thanks for the detailed explanation and timing.
