
In my data frame I want to create a column '5D_Peak' as a rolling max of column C, and then another column with a rolling count of historical values that are close to that peak. I wonder if there is an easier way to simplify, or ideally vectorise, the calculation.

This is my code, written in a plain but complicated way:

import numpy as np
import pandas as pd

df = pd.DataFrame([[1,2,4],[4,5,2],[3,5,8],[1,8,6],[5,2,8],[1,4,10],[3,5,9],[1,4,7],[1,4,6]], columns=list('ABC'))

df['5D_Peak'] = df['C'].rolling(window=5, center=False).max()

for i in range(5, len(df)):
    val = 0
    for j in range(i-5, i):
        if df.loc[i, '5D_Peak'] - 2 < df.loc[j, 'C'] < df.loc[i, '5D_Peak'] + 2:
            val += 1
    df.loc[i, '5D_Close_to_Peak_Count'] = val

This is the output I want:

   A  B   C  5D_Peak  5D_Close_to_Peak_Count
0  1  2   4      NaN                     NaN
1  4  5   2      NaN                     NaN
2  3  5   8      NaN                     NaN
3  1  8   6      NaN                     NaN
4  5  2   8      8.0                     NaN
5  1  4  10     10.0                     0.0
6  3  5   9     10.0                     1.0
7  1  4   7     10.0                     2.0
8  1  4   6     10.0                     2.0
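The nested loop above can be fully vectorised with NumPy by materialising all five-row lookback windows at once. A minimal sketch, assuming NumPy ≥ 1.20 for `sliding_window_view`; it reproduces the desired table above, including the strict "within 2 of the peak" bounds and the fact that each row's window covers only the five *preceding* rows:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1,2,4],[4,5,2],[3,5,8],[1,8,6],[5,2,8],
                   [1,4,10],[3,5,9],[1,4,7],[1,4,6]], columns=list('ABC'))
df['5D_Peak'] = df['C'].rolling(window=5, center=False).max()

c = df['C'].to_numpy(dtype=float)
# one row per window of the five values preceding row i (rows i-5 .. i-1)
win = np.lib.stride_tricks.sliding_window_view(c, 5)[:-1]   # shape (len(df)-5, 5)
peak = df['5D_Peak'].to_numpy()[5:]                          # peak at row i

# strict bounds, matching the loop: peak-2 < value < peak+2
counts = ((win > peak[:, None] - 2) & (win < peak[:, None] + 2)).sum(axis=1)
df.loc[5:, '5D_Close_to_Peak_Count'] = counts                # rows 0-4 stay NaN
```

This replaces the O(n·window) Python-level loop with a handful of array operations; `sliding_window_view` creates a strided view, so no copies of the data are made.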

1 Answer

I believe this is what you want. You can set the two values below:

# the window within which to search "close-to-peak" values
lkp_rng = 5

# how close is close?
closeness_measure = 2

# function to count the number of "close-to-peak" values in the window
fc = lambda x: np.count_nonzero(x >= x.max() - closeness_measure)

# apply fc to the column you choose
df['5D_Close_to_Peak_Count'] = df['C'].rolling(window=lkp_rng, center=False).apply(fc)
df.head(10)
   A  B   C  5D_Peak  5D_Close_to_Peak_Count
0  1  2   4      NaN                     NaN
1  4  5   2      NaN                     NaN
2  3  5   8      NaN                     NaN
3  1  8   6      NaN                     NaN
4  5  2   8      8.0                     3.0
5  1  4  10     10.0                     3.0
6  3  5   9     10.0                     4.0
7  1  4   7     10.0                     3.0
8  1  4   6     10.0                     3.0

I am guessing what you mean by "historical data".
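As a side note on the timing discussion below: `rolling(...).apply` accepts `raw=True` (available since pandas 0.23), which hands each window to the function as a plain ndarray instead of a Series, skipping per-window Series construction and typically speeding things up. A sketch with an equivalent counting function:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1,2,4],[4,5,2],[3,5,8],[1,8,6],[5,2,8],
                   [1,4,10],[3,5,9],[1,4,7],[1,4,6]], columns=list('ABC'))

closeness_measure = 2

# count window values within closeness_measure of the window's own max
fc = lambda x: np.count_nonzero(x >= x.max() - closeness_measure)

# raw=True passes each window as an ndarray, not a Series -> usually faster
df['5D_Close_to_Peak_Count'] = df['C'].rolling(window=5).apply(fc, raw=True)
# non-NaN counts: 3, 3, 4, 3, 3
```

Note that, like the answer's code, this window includes the current row, which is one reason its counts differ from the question's expected output.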


4 Comments

Thank you. This solves my problem as well. But I guess the vectorisation method suggested by JohnE is faster?
If you are using ipython notebook, just insert %%prun at the top of the cell in which you are running the code. It will give a long list, with a one sentence summary at the top. I am getting "826 function calls (820 primitive calls) in 0.003 seconds" for mine. If you insert %%timeit instead and run the cell, it gives "1000 loops, best of 3: 745 µs per loop". You can check the other code as well.
@JohnE Sure, I understand. I am not a programmer--just trying to help the OP. But the facts have to be straight if he is going to be helped. When I run your code with timeit, this is what I get: "100 loops, best of 3: 4.57 ms per loop" (I don't know why mine is running with 1000 loops and yours only with 100). And with prun, this is the output: "7310 function calls (7291 primitive calls) in 0.012 seconds". So roughly, it is 4-6 times slower. But I would rather you provided your figures, because I dislike commenting on other people's efforts unless really necessary. Thanks.
@JohnE I am done here. The OP does not have to like what I offered :)
