
In my data frame I want to create a column '5D_Peak' as a rolling max of column C, and then another column with a rolling count of historical values that are close to that peak. I wonder if there is an easier way to simplify, or ideally vectorise, the calculation.

This is my code, written in a plain but complicated way:

import numpy as np
import pandas as pd

df = pd.DataFrame([[1,2,4],[4,5,2],[3,5,8],[1,8,6],[5,2,8],[1,4,10],[3,5,9],[1,4,7],[1,4,6]], columns=list('ABC'))

df['5D_Peak'] = df['C'].rolling(window=5, center=False).max()

for i in range(5, len(df)):
    val = 0
    for j in range(i-5, i):
        if df.loc[i, '5D_Peak'] - 2 < df.loc[j, 'C'] < df.loc[i, '5D_Peak'] + 2:
            val += 1
    df.loc[i, '5D_Close_to_Peak_Count'] = val

This is the output I want:

   A  B   C  5D_Peak  5D_Close_to_Peak_Count
0  1  2   4      NaN                     NaN
1  4  5   2      NaN                     NaN
2  3  5   8      NaN                     NaN
3  1  8   6      NaN                     NaN
4  5  2   8      8.0                     NaN
5  1  4  10     10.0                     0.0
6  3  5   9     10.0                     1.0
7  1  4   7     10.0                     2.0
8  1  4   6     10.0                     2.0
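The nested loop above can be fully vectorised with NumPy by materialising all five-row lookback windows at once. A minimal sketch, assuming NumPy ≥ 1.20 for `sliding_window_view`; it reproduces the desired table above, including the strict "within 2 of the peak" bounds and the fact that each row's window covers only the five *preceding* rows:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1,2,4],[4,5,2],[3,5,8],[1,8,6],[5,2,8],
                   [1,4,10],[3,5,9],[1,4,7],[1,4,6]], columns=list('ABC'))
df['5D_Peak'] = df['C'].rolling(window=5, center=False).max()

c = df['C'].to_numpy(dtype=float)
# one row per window of the five values preceding row i (rows i-5 .. i-1)
win = np.lib.stride_tricks.sliding_window_view(c, 5)[:-1]   # shape (len(df)-5, 5)
peak = df['5D_Peak'].to_numpy()[5:]                          # peak at row i

# strict bounds, matching the loop: peak-2 < value < peak+2
counts = ((win > peak[:, None] - 2) & (win < peak[:, None] + 2)).sum(axis=1)
df.loc[5:, '5D_Close_to_Peak_Count'] = counts                # rows 0-4 stay NaN
```

This replaces the O(n·window) Python-level loop with a handful of array operations; `sliding_window_view` creates a strided view, so no copies of the data are made.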

1 Answer

I believe this is what you want. You can set the two values below:

# the window within which to search "close-to-peak" values
lkp_rng = 5

# how close is close?
closeness_measure = 2

# function to count the number of "close-to-peak" values in the window
fc = lambda x: np.count_nonzero(x >= x.max() - closeness_measure)

# apply fc to the column you choose
df['5D_Close_to_Peak_Count'] = df['C'].rolling(window=lkp_rng, center=False).apply(fc)
df.head(10)
   A  B   C  5D_Peak  5D_Close_to_Peak_Count
0  1  2   4      NaN                     NaN
1  4  5   2      NaN                     NaN
2  3  5   8      NaN                     NaN
3  1  8   6      NaN                     NaN
4  5  2   8      8.0                     3.0
5  1  4  10     10.0                     3.0
6  3  5   9     10.0                     4.0
7  1  4   7     10.0                     3.0
8  1  4   6     10.0                     3.0

I am guessing what you mean by "historical data".
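As a side note on the timing discussion below: `rolling(...).apply` accepts `raw=True` (available since pandas 0.23), which hands each window to the function as a plain ndarray instead of a Series, skipping per-window Series construction and typically speeding things up. A sketch with an equivalent counting function:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1,2,4],[4,5,2],[3,5,8],[1,8,6],[5,2,8],
                   [1,4,10],[3,5,9],[1,4,7],[1,4,6]], columns=list('ABC'))

closeness_measure = 2

# count window values within closeness_measure of the window's own max
fc = lambda x: np.count_nonzero(x >= x.max() - closeness_measure)

# raw=True passes each window as an ndarray, not a Series -> usually faster
df['5D_Close_to_Peak_Count'] = df['C'].rolling(window=5).apply(fc, raw=True)
# non-NaN counts: 3, 3, 4, 3, 3
```

Note that, like the answer's code, this window includes the current row, which is one reason its counts differ from the question's expected output.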


4 Comments

Thank you. This solves my problem as well. But I guess the vectorisation method suggested by JohnE is faster?
If you are using ipython notebook, just insert %%prun at the top of the cell in which you are running the code. It will give a long list, with a one sentence summary at the top. I am getting "826 function calls (820 primitive calls) in 0.003 seconds" for mine. If you insert %%timeit instead and run the cell, it gives "1000 loops, best of 3: 745 µs per loop". You can check the other code as well.
@JohnE Sure, I understand. I am not a programmer--just trying to help the OP. But the facts have to be straight if he is going to be helped. When I run your code with timeit, this is what I get: "100 loops, best of 3: 4.57 ms per loop" (I don't know why mine is running with 1000 loops and yours only with 100). And with prun, this is the output: "7310 function calls (7291 primitive calls) in 0.012 seconds". So roughly, it is 4-6 times slower. But I would rather you provided your figures, because I dislike commenting on other people's efforts unless really necessary. Thanks.
@JohnE I am done here. The OP does not have to like what I offered :)
