
I have a vector a of known size N, such that np.sum(a) is 1 and np.all(a >= 0) is true. I want to determine the minimum number of entries that sum to a threshold t. For instance, I would do something like:

idx = np.argsort(a)
asorted = a[idx][::-1]
sum_ = 0
number = 0
while sum_ < t:
    number += 1
    sum_ = np.sum(asorted[:number])

As soon as sum_ reaches t, the loop stops, and the variable number tells me the minimum number of entries that sum to that threshold.

I am looking for the most efficient way of getting this number, as I have to perform this operation millions of times.

  • Why do you do sum_ = np.sum(a[:number]) instead of sum_ += a[number - 1]? Commented Mar 12, 2020 at 19:59
  • How large is N? Commented Mar 12, 2020 at 20:05
  • Do you have to answer the millions of operations one at a time, or can you do them all together (i.e., you get a list of thresholds and return a list of results)? Commented Mar 12, 2020 at 20:09
  • @HeapOverflow one at a time; N can be of the order of 4^8–4^12. The threshold is actually the same, but I have to first load the vectors a, and I cannot load them all at once because of memory issues. Commented Mar 12, 2020 at 20:32
  • Perhaps sum_ = np.sum(a[:number]) is meant to be sum_ = np.sum(asorted[:number])? Commented Mar 12, 2020 at 21:17

4 Answers


(EDITED)

(EDIT2: added a more specialized JIT-ed version to address issues when using np.sort() with numba.)

(EDIT3: included timing for the recursive approach with median pivoting from @hilberts_drinking_problem's answer)

I am not 100% sure what you are after, because the first two lines of your code seem to do nothing. Following @hilberts_drinking_problem's comment, I have edited my answer on the assumption that you have a typo and:

sum_ = np.sum(arr[:i])

should be:

sum_ = np.sum(asorted[:i])

Then, your solution can be written as a function like:

import numpy as np


def min_sum_threshold_orig(arr, threshold=0.5):
    idx = np.argsort(arr)
    arr_sorted = arr[idx][::-1]
    sum_ = 0
    i = 0
    while sum_ < threshold:
        i += 1
        sum_ = np.sum(arr_sorted[:i])
    return i

However:

  1. Instead of np.argsort() and indexing, you can use np.sort() directly.
  2. There is no need to recompute the entire sum at each iteration; you can reuse the sum from the previous iteration.
  3. The while loop is risky: if threshold is sufficiently high (> 1.0 under your assumptions), it will never terminate.

Addressing these points one gets to:

def min_sum_threshold(arr, threshold=0.5):
    arr = np.sort(arr)[::-1]
    sum_ = 0
    for i in range(arr.size):
        sum_ += arr[i]
        if sum_ >= threshold:
            break
    return i + 1
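As a quick sanity check, here is the function applied to a small probability vector (values made up for illustration):

```python
import numpy as np


def min_sum_threshold(arr, threshold=0.5):
    arr = np.sort(arr)[::-1]
    sum_ = 0
    for i in range(arr.size):
        sum_ += arr[i]
        if sum_ >= threshold:
            break
    return i + 1


a = np.array([0.1, 0.4, 0.2, 0.3])  # sums to 1
# the two largest entries (0.4 + 0.3 = 0.7) already reach 0.5
print(min_sum_threshold(a, 0.5))  # → 2
```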

In the above, the explicit looping becomes the bottleneck. A good way to address it is to use numba:

import numba as nb


min_sum_threshold_nbn = nb.jit(min_sum_threshold)
min_sum_threshold_nbn.__name__ = 'min_sum_threshold_nbn'

But this may not be the most efficient approach as numba is relatively slow when new arrays are being created. A possibly faster approach is to use arr.sort() instead of np.sort() because that is in-place, thus avoiding the creation of a new array:

@nb.jit
def min_sum_thres_nb_inplace(arr, threshold=0.5):
    arr.sort()
    sum_ = 0
    for i in range(arr.size - 1, -1, -1):
        sum_ += arr[i]
        if sum_ >= threshold:
            break
    return arr.size - i
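Note that arr.sort() mutates the caller's array; if the original order matters, pass a copy. Here is the same logic in pure Python (numba omitted so the snippet runs standalone):

```python
import numpy as np


def min_sum_thres_inplace(arr, threshold=0.5):
    # arr.sort() sorts ascending in place, so we walk from the end
    # (largest entries) backwards, accumulating until the threshold
    arr.sort()
    sum_ = 0.0
    for i in range(arr.size - 1, -1, -1):
        sum_ += arr[i]
        if sum_ >= threshold:
            break
    return arr.size - i


a = np.array([0.1, 0.4, 0.2, 0.3])
n = min_sum_thres_inplace(a.copy())  # pass a copy to keep a intact
```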

Alternatively, one could JIT only the portion of code after the sorting:

@nb.jit
def _min_sum_thres_nb(arr, threshold=0.5):
    sum_ = 0.0
    for i in range(arr.size):
        sum_ += arr[i]
        if sum_ >= threshold:
            break
    return i + 1


def min_sum_thres_nb(arr, threshold=0.5):
    return _min_sum_thres_nb(np.sort(arr)[::-1], threshold)

The difference between the two will be minimal for larger inputs. For smaller ones, min_sum_thres_nb() will be dominated by the comparatively slow extra function call. Because of the pitfalls of benchmarking functions that modify their input, min_sum_thres_nb_inplace() is omitted from the benchmarks, with the understanding that for very small inputs it is as fast as min_sum_threshold_nbn(), and for larger ones it has essentially the same performance as min_sum_thres_nb().


Alternatively, one could use a vectorized approach like in @yatu's answer:

def min_sum_threshold_np_sum(arr, threshold=0.5):
    return np.sum(np.cumsum(np.sort(arr)[::-1]) < threshold) + 1

or, better, use np.searchsorted() which avoids creating an unnecessary temporary array with the comparison:

def min_sum_threshold_np_ss(arr, threshold=0.5):
    return np.searchsorted(np.cumsum(np.sort(arr)[::-1]), threshold) + 1
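Both expressions locate the same position in the cumulative sum; a small made-up example showing the equivalence:

```python
import numpy as np

a = np.array([0.05, 0.45, 0.15, 0.35])
c = np.cumsum(np.sort(a)[::-1])  # approx. [0.45, 0.80, 0.95, 1.00]

# counting entries below the threshold vs. binary search: same answer,
# but searchsorted avoids materializing the boolean temporary array
n_sum = np.sum(c < 0.5) + 1
n_ss = np.searchsorted(c, 0.5) + 1
```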

or, assuming that sorting the entire array is unnecessarily expensive:

def min_sum_threshold_np_part(arr, threshold=0.5):
    n = arr.size
    # with np.sum(arr) == 1, at most int(n * threshold) + 1 of the
    # largest entries are needed to reach the threshold
    m = int(n * threshold) + 1
    part_arr = np.partition(arr, n - m)[n - m:]
    return np.searchsorted(np.cumsum(np.sort(part_arr)[::-1]), threshold) + 1
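A quick check that the partition-based variant agrees with the full-sort one on a random input (both redefined here so the snippet runs standalone):

```python
import numpy as np


def min_sum_threshold_np_part(arr, threshold=0.5):
    n = arr.size
    m = int(n * threshold) + 1                   # upper bound on the answer
    part_arr = np.partition(arr, n - m)[n - m:]  # the m largest entries
    return np.searchsorted(np.cumsum(np.sort(part_arr)[::-1]), threshold) + 1


def min_sum_threshold_np_ss(arr, threshold=0.5):
    return np.searchsorted(np.cumsum(np.sort(arr)[::-1]), threshold) + 1


rng = np.random.default_rng(0)
arr = rng.random(1000)
arr /= arr.sum()
same = min_sum_threshold_np_part(arr) == min_sum_threshold_np_ss(arr)
```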

An even more sophisticated approach using recursion and median pivoting is:

def min_sum_thres_rec(arr, threshold=0.5, cutoff=64):
    n = arr.size
    if n <= cutoff:
        return np.searchsorted(np.cumsum(np.sort(arr)[::-1]), threshold) + 1
    else:
        m = n // 2
        partitioned = np.partition(arr, m)
        low = partitioned[:m]
        high = partitioned[m:]
        sum_high = np.sum(high)
        if sum_high >= threshold:
            return min_sum_thres_rec(high, threshold)
        else:
            return min_sum_thres_rec(low, threshold - sum_high) + high.size

(the last three are adapted from @hilberts_drinking_problem's answer)


Benchmarking these with the input generated from this:

def gen_input(n, a=0, b=10000):
    arr = np.random.randint(a, b, n)
    arr = arr / np.sum(arr)
    return arr
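The exact benchmark harness is not shown; a minimal sketch using timeit (timing only the searchsorted-based variant, with the function names as above) could look like:

```python
import timeit

import numpy as np


def gen_input(n, a=0, b=10000):
    arr = np.random.randint(a, b, n)
    return arr / np.sum(arr)


def min_sum_threshold_np_ss(arr, threshold=0.5):
    return np.searchsorted(np.cumsum(np.sort(arr)[::-1]), threshold) + 1


timings = {}
for n in (100, 1_000, 10_000):
    arr = gen_input(n)
    # best-of-3, 100 calls each, to damp scheduling noise
    timings[n] = min(timeit.repeat(
        lambda: min_sum_threshold_np_ss(arr), number=100, repeat=3))
```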

gives the following:

(benchmark plots omitted: timings over the full input-size range, and a zoomed-in view for small inputs)

These indicate that for small enough inputs, the numba approach is the fastest, but as soon as the input exceeds ~600 elements for the naive approach, or ~900 for the optimized one, the NumPy approaches using np.partition(), while less memory efficient, are faster.

Eventually, beyond ~4000 elements, min_sum_thres_rec() becomes faster than all the other proposed methods. It may be possible to write an even faster numba-based implementation of this method.

Note that the optimized numba approach is strictly faster than the naive NumPy approaches that were tested.


5 Comments

I believe the OP wants to find minimum number of elements summing to t. If a is unsorted, your find_cumsum_threshold function is not guaranteed to give correct output. I am guessing that there was a typo in the question, and sum_ = np.sum(a[:number]) is meant to be sum_ = np.sum(asorted[:number]).
I like the use of numba here. I will check if this is faster than the answer by @hilberts_drinking_problem
You never use part_arr. And something curious: In min_sum_threshold, when I return right after the first line (the sorting), the non-Numba version becomes significantly faster than the Numba version, and the Numba version remains around the same time (so apparently the sorting takes almost all the time and the loop takes almost none). If I do the sorting before calling those two functions, the Numba version is about 800 times faster than the non-Numba version.
@norok2 I added a linear time algorithm, if you feel like updating the benchmarks in the future ;)
@HeapOverflow I have included a version which addresses the problem with np.sort(), which is essentially that numba gets slower with allocation of new memory, hence either in-place operations must be used from NumPy OR the memory allocation is done outside numba. @hilbert_drinking_problem cool! I have already updated it.

It just occurred to me that there is a recursive linear time algorithm for this based on median pivoting:

def min_sum_rec(arr, t=0.5):
  n = arr.size
  # default to sorting for small arrays
  if n <= 100:
    return np.searchsorted(np.sort(arr)[::-1].cumsum(), t) + 1
  partitioned = np.partition(arr, n//2)
  low = partitioned[:n//2]
  high = partitioned[n//2:]
  sum_high = high.sum()
  if sum_high >= t:
    return min_sum_rec(high, t)
  else:
    return min_sum_rec(low, t - sum_high) + high.size
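A quick correctness check of the recursion against the plain sort-and-searchsorted baseline, on a made-up random input:

```python
import numpy as np


def min_sum_rec(arr, t=0.5):
    n = arr.size
    if n <= 100:
        return np.searchsorted(np.sort(arr)[::-1].cumsum(), t) + 1
    # partition around the median: high holds the n//2 largest entries
    partitioned = np.partition(arr, n // 2)
    low, high = partitioned[:n // 2], partitioned[n // 2:]
    sum_high = high.sum()
    if sum_high >= t:
        return min_sum_rec(high, t)
    return min_sum_rec(low, t - sum_high) + high.size


rng = np.random.default_rng(0)
a = rng.random(10_000)
a /= a.sum()
baseline = np.searchsorted(np.sort(a)[::-1].cumsum(), 0.5) + 1
```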

Here is comparison to my previous O(n log(n)) solution, in seconds:

N           min_sum_rec  num_to_t
10            0.000041  0.000038
100           0.000025  0.000028
1000          0.000086  0.000042
10000         0.000321  0.000310
100000        0.002547  0.003259
1000000       0.028826  0.039854
10000000      0.247731  0.431744
100000000     2.371766  4.800107

Previous solution, which can be faster for smaller-sized arrays:

In addition to using cumsum, note that the average element of the array has size 1/N. Hence, it takes at most t*N elements to add up to t. For small t, this provides an optimization opportunity: make an O(N) call to np.partition, followed by sorting of only the ~t*N largest elements:

import numpy as np
np.random.seed(0)

a = np.random.rand(10**6)
a /= a.sum()
t = 1e-3


def num_to_t_sort(a, t):
  c = np.sort(a)[::-1].cumsum()
  return np.searchsorted(c, t) + 1


def num_to_t(a, t):
  n = len(a)
  m = int(n * t) + 1
  b = np.partition(a, n-m)[n-m:]
  b[::-1].sort()
  c = b.cumsum()
  return np.searchsorted(c, t) + 1


assert num_to_t(a, t) == num_to_t_sort(a, t)
%timeit num_to_t(a, t)      # 100 loops, best of 3: 11.8 ms per loop
%timeit num_to_t_sort(a, t) # 10 loops, best of 3: 107 ms per loop

A similar optimization applies if t tends to be close to 1. If you are repeating operations for the same array and different t, you are probably better off storing c = np.sort(a)[::-1].cumsum() and calling np.searchsorted for each t.
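For instance, if the same a is queried with many thresholds, the O(N log N) sort-plus-cumsum is paid once and each query becomes a single O(log N) binary search (thresholds made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random(100_000)
a /= a.sum()

c = np.sort(a)[::-1].cumsum()                 # pay the sorting cost once
thresholds = np.array([1e-3, 0.25, 0.5, 0.9])
counts = np.searchsorted(c, thresholds) + 1   # all answers in one call
```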

Also, I am assuming that every element is strictly positive. Otherwise, one needs to consider two separate cases depending on whether t occurs in c.

2 Comments

Note that np.searchsorted(c, t, side='right') might occasionally give a different result compared to OP code. np.searchsorted(c, t) + 1 (so side='left') should be used instead.
Good point, I noticed that. I originally convinced myself that there was an edge case against side='left' as well, but it seems that I was imagining things.

Here's a NumPy based approach:

(np.cumsum(np.sort(a)[::-1]) < t).sum() + 1

For instance:

a = np.array([1,2,8,5,13,9])

(np.cumsum(np.sort(a)[::-1]) < 25).sum() + 1
# 3

Where:

np.sort(a)[::-1]
# array([13,  9,  8,  5,  2,  1])

And:

np.cumsum(np.sort(a)[::-1])
# array([13, 22, 30, 35, 37, 38])


Depending on the length of the arrays, maybe sort, and then you could use a cumulative sum? It would at least be faster than that while loop.

>>> a = np.array(range(10)[::-1])
>>> a
array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])
>>> b = np.cumsum(a)
>>> b
array([ 9, 17, 24, 30, 35, 39, 42, 44, 45, 45])

Then just use argmax, say you wanted the index where it passed 40:

>>> np.argmax(b > 40)
6
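Putting the steps together as a function, with the descending flip (see the comments below) and + 1 to report a count of entries rather than an index:

```python
import numpy as np


def min_entries(a, t):
    # sort descending (the flip matters for unsorted input), accumulate,
    # then argmax finds the first index where the running sum reaches t;
    # + 1 turns that index into a count of entries
    b = np.cumsum(np.sort(a)[::-1])
    return np.argmax(b >= t) + 1


a = np.array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])
print(min_entries(a, 40))  # → 7, since 9+8+7+6+5+4 = 39 but adding 3 gives 42
```

One caveat: np.argmax returns 0 when the condition is never met, so for a threshold above the total sum this silently returns 1; guard for that case if such inputs are possible.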

2 Comments

yes, you need to flip that sorted array before the cumsum!
Oops, right. Thanks! Edited to flip the array. And upped the value to make it actually do something.
