I would recommend a completely different approach if you are dealing with large arrays. Right now, you are searching through the entirety of arr for every element in arr2. This is clearly overkill. Instead, you can operate on a sorted arr and simply sum between the insertion points obtained from np.searchsorted.
Sort arr in place if you can:
arr.sort()
You know the width of your interval, so find the bounding values. I'm making the array shaped (20, 2) so that each row holds the lower and upper bound for one element of arr2:
bounds = arr2.reshape(-1, 1) + [-0.2, 0.2]
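To see what the broadcasting does, here is a tiny sanity check with a couple of made-up centers (the _demo names are mine, not part of the solution):

import numpy as np

arr2_demo = np.array([0.3, 0.5])                      # two hypothetical centers
bounds_demo = arr2_demo.reshape(-1, 1) + [-0.2, 0.2]  # column vector + length-2 list
# bounds_demo has shape (2, 2); row i holds the lower and upper bound for
# arr2_demo[i], i.e. roughly [[0.1, 0.5], [0.3, 0.7]] up to float rounding.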
Now find the insertion indices:
ind = np.searchsorted(arr, bounds)
ind is the same shape as bounds. ind[i, :] are the start (inclusive) and end (exclusive) indices into arr that correspond to the ith element of arr2. In other words, for any given i, arr3[i] in the original question is arr[ind[i, 0]:ind[i, 1]].mean(). You can use this directly for a non-vectorized solution:
result = np.array([arr[slice(*i)].mean() for i in ind])
There are a couple of ways to vectorize the solution. In either case, you'll want the number of elements in each run:
n = np.diff(ind, axis=1).ravel()
A quick-and-dirty solution that is prone to rounding errors uses np.cumsum and fancy indexing with ind:
cumulative = np.r_[0, np.cumsum(arr)]
sums = np.diff(cumulative[ind], axis=1).ravel()
result = sums / n
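To see where the rounding errors come from, consider a contrived example (my own toy numbers) where the running total is much larger than an individual window: subtracting two large, nearly equal cumulative sums cancels most of the significant digits.

import numpy as np

x = np.r_[1e9, np.full(10, 0.1)]             # one huge value followed by small ones
c = np.r_[0, np.cumsum(x)]

windowed_via_cumsum = c[11] - c[1]           # sum of the ten 0.1s via cumulative sums
windowed_direct = x[1:11].sum()              # summing only the window elements

print(windowed_via_cumsum, windowed_direct)  # the first is visibly further from 1.0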
A more robust solution would extract only the sums that you actually need using np.add.reduceat:
arr = np.r_[arr, 0] # compensate for index past the end
sums = np.add.reduceat(arr, ind.ravel())[::2]
result = sums / n
You can compare the results of both approaches to arr3 computed in the question to verify that the second approach is noticeably more accurate, even with your toy example.
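A sketch of such a check, assuming arr (sorted, without the trailing zero yet) and ind from above are in scope; the list-comprehension result stands in for the question's arr3:

reference = np.array([arr[slice(*i)].mean() for i in ind])

n = np.diff(ind, axis=1).ravel()
via_cumsum = np.diff(np.r_[0, np.cumsum(arr)][ind], axis=1).ravel() / n
via_reduceat = np.add.reduceat(np.r_[arr, 0], ind.ravel())[::2] / n

print(np.abs(via_cumsum - reference).max())
print(np.abs(via_reduceat - reference).max())
# per the comparison above, the reduceat-based sums track the reference more closely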
Timing
def original(arr, arr2, d):
    arr3 = np.empty_like(arr2)
    for index, num in enumerate(arr2):
        arr3[index] = np.mean(arr[np.abs(num - arr) < d])
    return arr3

def ananda(arr, arr2, d):
    arr_tile = np.tile(arr, (len(arr2), 1))
    arr_tile[np.abs(arr - arr2[:, None]) >= d] = np.nan
    return np.nanmean(arr_tile, axis=1)

def mad_0(arr, arr2, d):
    arr.sort()
    ind = np.searchsorted(arr, arr2.reshape(-1, 1) + [-d, d])
    return np.array([arr[slice(*i)].mean() for i in ind])

def mad_1(arr, arr2, d):
    arr.sort()
    ind = np.searchsorted(arr, arr2.reshape(-1, 1) + [-d, d])
    n = np.diff(ind, axis=1).ravel()
    sums = np.diff(np.r_[0, np.cumsum(arr)][ind], axis=1).ravel()
    return sums / n

def mad_2(arr, arr2, d):
    arr.sort()
    ind = np.searchsorted(arr, arr2.reshape(-1, 1) + [-d, d])
    n = np.diff(ind, axis=1).ravel()
    arr = np.r_[arr, 0]
    sums = np.add.reduceat(arr, ind.ravel())[::2]
    return sums / n
Inputs (reset for each run):
np.random.seed(42)
arr = np.random.rand(100)
arr2 = np.linspace(0, 1, 1000)
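The numbers below have the format of IPython's %timeit output; outside IPython, a plain timeit harness along these lines would do the same job (a sketch: d = 0.2 and the repeat/number choices are mine):

import timeit
import numpy as np

d = 0.2
for func in (original, ananda, mad_0, mad_1, mad_2):
    np.random.seed(42)                 # reset the inputs for each function,
    arr = np.random.rand(100)          # since the mad_* variants sort arr in place
    arr2 = np.linspace(0, 1, 1000)
    best = min(timeit.repeat(lambda: func(arr, arr2, d), number=100, repeat=7)) / 100
    print(f'{func.__name__}: {best * 1e6:.1f} µs per call (best of 7 runs)')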
Results:
original: 25.5 ms ± 278 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
ananda: 2.66 ms ± 35.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
mad_0: 14.5 ms ± 48.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
mad_1: 211 µs ± 1.41 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
mad_2: 242 µs ± 1.93 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
For 100 elements with 1k bins, the original approach is ~10x slower than using np.tile. The list comprehension (mad_0) is only ~2x faster than the original approach. While the np.cumsum approach appears to be a little faster than np.add.reduceat, it is potentially not as numerically stable.
Another advantage of using the methods I suggest is that you can change arr2 arbitrarily, while arr only needs to be sorted once.
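For example (a sketch of that pattern, with names of my choosing), you can sort once, keep the cumulative sums around, and answer arbitrarily many queries with different centers:

import numpy as np

def make_interval_mean(arr, d):
    """Pre-sort arr once and return a function that answers queries for any arr2."""
    arr = np.sort(arr)                     # one-time O(n log n) cost
    cumulative = np.r_[0, np.cumsum(arr)]  # reused by every query

    def query(arr2):
        ind = np.searchsorted(arr, arr2.reshape(-1, 1) + [-d, d])
        n = np.diff(ind, axis=1).ravel()
        sums = np.diff(cumulative[ind], axis=1).ravel()
        return sums / n                    # empty intervals would give a divide-by-zero warning

    return query

interval_mean = make_interval_mean(np.random.rand(100), 0.2)
result_a = interval_mean(np.linspace(0, 1, 20))    # the original arr2
result_b = interval_mean(np.random.rand(50))       # a different arr2, no re-sort needed

The same closure works with the np.add.reduceat version of the sums if the extra accuracy matters.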