Python pandas rolling_apply two column input into function

Question

Following on from the question "Python custom function using rolling_apply for pandas", which was about using rolling_apply: I have made progress with my function, but I am struggling with a function that requires two or more columns as inputs:

Creating the same setup as before

import pandas as pd
import numpy as np

tmp = pd.DataFrame(np.random.randn(2000, 2) / 10000,
                   index=pd.date_range('2001-01-01', periods=2000),
                   columns=['A', 'B'])

Then I changed the function slightly so that it takes two columns:

def gm(df,p):
    df = pd.DataFrame(df)
    v =((((df['A']+df['B'])+1).cumprod())-1)*p
    return v.iloc[-1]

It produces the following error:

pd.rolling_apply(tmp,50,lambda x: gm(x,5))

  KeyError: u'no item named A'

I think this is because the input to the lambda function is an ndarray of length 50 containing only one column, not both columns. Is there a way to get both columns as inputs and use them in a rolling_apply function?
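The diagnosis can be checked directly. The sketch below uses current pandas syntax (`DataFrame.rolling(...).apply`, since `pd.rolling_apply` has been removed from pandas) and a hypothetical recorder function `record_input`, which logs what it receives instead of computing anything. It assumes the default column-by-column behaviour of `apply`: every call gets a 1-D window from a single column.

```python
import numpy as np
import pandas as pd

tmp = pd.DataFrame(np.random.randn(200, 2) / 10000,
                   index=pd.date_range('2001-01-01', periods=200),
                   columns=['A', 'B'])

seen = []  # (type name, shape) of every input the function receives

def record_input(x):
    # hypothetical helper: log what rolling().apply() passes in
    seen.append((type(x).__name__, x.shape))
    return 0.0  # dummy result; only the inputs matter here

tmp.rolling(50).apply(record_input, raw=True)
print(set(seen))  # only 1-D windows of a single column, never both columns
```

Because each call only sees one column, `df['A']` inside `gm` cannot work, whichever column is being processed.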

Again any help would be greatly appreciated...

Possible duplicate of stackoverflow.com/questions/37486502/…. See my answer there. (gosuto, commented Aug 27, 2018 at 5:32)

Possible duplicate of Pandas Dataframe rolling with two columns and two rows (gosuto, commented Jan 30, 2019 at 5:58)

rrcal · Accepted Answer · 2020-12-07 17:34:07Z

15

Not sure if this is still relevant, but with the new rolling classes in pandas, whenever we pass raw=False to apply, we actually pass a Series to the wrapped function. That means we have access to the index of each observation in the window, and can use it to handle multiple columns.

From the docs:

raw : bool, default None

False : passes each row or column as a Series to the function.

True or None : the passed function will receive ndarray objects instead. If you are just applying a NumPy reduction function this will achieve much better performance.
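To see what raw=False buys here, the minimal sketch below records the index labels of each window. The Series `s` and the helper `record_labels` are made up for illustration. With raw=False each window arrives as a Series that keeps the original labels, which is exactly what lets the function look up other columns.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], index=list('abcde'))
windows = []  # index labels seen in each call

def record_labels(w):
    # hypothetical helper: with raw=False, w is a Series with its labels intact
    windows.append(list(w.index))
    return w.sum()

out = s.rolling(3).apply(record_labels, raw=False)
print(windows)  # [['a', 'b', 'c'], ['b', 'c', 'd'], ['c', 'd', 'e']]
```

With raw=True the same function would receive plain ndarrays with no index, so the label lookup used below would not be possible.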

In this scenario, we can do the following:

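import numpy as np
import pandas as pd

# Example frame; a DatetimeIndex is needed for the '10s' time-based window
df = pd.DataFrame(np.random.randn(100, 4),
                  index=pd.date_range('2020-01-01', periods=100, freq='s'),
                  columns=['col1', 'col2', 'col3', 'col4'])

### create a func for multiple columns
def cust_func(s):
    # s is the col1 window as a Series (raw=False), so its index
    # selects the matching rows of the other columns
    val_for_col2 = df.loc[s.index, 'col2']
    val_for_col3 = df.loc[s.index, 'col3']
    val_for_col4 = df.loc[s.index, 'col4']

    ## apply over multiple column values
    return np.max(s) * np.min(val_for_col2) * np.max(val_for_col3) * np.mean(val_for_col4)

### Apply to the dataframe
df.rolling('10s')['col1'].apply(cust_func, raw=False)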

Note that we can still use all the functionality of the pandas rolling class, which is particularly useful when dealing with time-based windows.

The fact that we are passing one column and using the entire dataframe feels like a hack, but it works in practice.
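A variant of the same trick that avoids relying on a global `df` is to hand the frame and the column name to the function through the `kwargs` parameter of `Rolling.apply`. This is a sketch, not part of the answer above; `weighted_sum`, the column names `x` and `w`, and the tiny frame are hypothetical. It also assumes the index is unique, since `df.loc[s.index, ...]` looks rows up by label.

```python
import pandas as pd

def weighted_sum(s, frame, other):
    # s: window of the driving column, as a Series (raw=False)
    # frame.loc[s.index, other]: the same rows taken from another column
    return float((s * frame.loc[s.index, other]).sum())

df = pd.DataFrame({'x': [1.0, 2.0, 3.0, 4.0],
                   'w': [1.0, 0.0, 1.0, 2.0]},
                  index=pd.date_range('2020-01-01', periods=4, freq='s'))

# the frame and the column name travel through kwargs instead of globals
out = df['x'].rolling('2s').apply(weighted_sum, raw=False,
                                  kwargs={'frame': df, 'other': 'w'})
print(out.tolist())  # [1.0, 1.0, 3.0, 11.0]
```

With a '2s' window each result covers the current second and the previous one, so the last value is 3*1 + 4*2 = 11.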

edited Dec 7, 2020 at 17:34

answered Aug 22, 2019 at 3:53

rrcal



5 Comments

GivenX Over a year ago

Works for me, thanks. Just note it's np.max(val_for_col3), not np.max(val_for_cal3).

rrcal Over a year ago

Thanks @GivenX, just fixed it.

ajsp Over a year ago

Saves an awful lot of hassle.

High GPA Over a year ago

Very neat answer, but what does '10s' in the last line of the code mean?

rrcal Over a year ago

@High GPA it means 10 seconds

lowtech · Accepted Answer · 2014-01-10 21:59:28Z

9

It looks like rolling_apply converts the input of the user function into an ndarray (http://pandas.pydata.org/pandas-docs/stable/generated/pandas.stats.moments.rolling_apply.html?highlight=rolling_apply#pandas.stats.moments.rolling_apply).

A workaround is to add an auxiliary column ii holding the row positions, roll over that column, and use the positions inside gm to select the window from the full frame:

import pandas as pd
import numpy as np

tmp = pd.DataFrame(np.random.randn(2000, 2) / 10000, columns=['A', 'B'])
tmp['date'] = pd.date_range('2001-01-01', periods=2000)
tmp['ii'] = range(len(tmp))  # auxiliary column: integer row positions

def gm(ii, df, p):
    # ii arrives as an array of floats, so cast to int before using iloc
    x_df = df.iloc[ii.astype(int)]
    v = ((((x_df['A'] + x_df['B']) + 1).cumprod()) - 1) * p
    return v.iloc[-1]

# Old API: pd.rolling_apply(tmp.ii, 50, lambda x: gm(x, tmp, 5))
res = tmp['ii'].rolling(50).apply(lambda x: gm(x, tmp, 5), raw=True)
print(res)
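Because the rolled values are row positions rather than index labels, the same idea works for any index type (strings, dates, a MultiIndex) without adding a column to the frame. A minimal sketch, with a hypothetical helper `rolling_multi` and a made-up frame indexed by strings:

```python
import numpy as np
import pandas as pd

def rolling_multi(df, window, func):
    # hypothetical helper: roll over row positions, then hand func the
    # full multi-column slice of df for each window
    pos = pd.Series(np.arange(len(df), dtype=float), index=df.index)
    return pos.rolling(window).apply(
        lambda ii: func(df.iloc[ii.astype(int)]), raw=True)

df = pd.DataFrame({'A': [1.0, 2.0, 3.0, 4.0],
                   'B': [10.0, 20.0, 30.0, 40.0]},
                  index=list('wxyz'))  # non-numeric index

res = rolling_multi(df, 2, lambda w: (w['A'] * w['B']).sum())
print(res.tolist())  # [nan, 50.0, 130.0, 250.0]
```

The result keeps the original index, so it can be assigned back to `df` as a new column.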

edited Jan 10, 2014 at 21:59

answered Jan 10, 2014 at 16:15

lowtech


6 Comments

8one6 Over a year ago

This is slick. I like it.

8one6 Over a year ago

In this spirit, how would you pull off a similar hack if the index were a multi-index? Or any non-numerical index, for that matter? Always necessary to first convert the index to floats?

lowtech Over a year ago

I modified my answer so it no longer uses the index. gm still receives an array of floats, so I have to convert them to ints before using them with iloc.

adr Over a year ago

The idea works, but after trying this approach it seems more complicated than it needs to be. I now just use a for loop to roll through the dataframe and can both evaluate and calculate multiple columns.

lowtech Over a year ago

with for loops you may end up with code which is MUCH slower - sometimes it is a big problem.


Community · Accepted Answer · 2017-05-23 12:03:03Z

Here's another version of this question: Using rolling_apply on a DataFrame object. Use this if your function returns a Series.

Since yours returns a scalar, do this.

In [71]: df  = pd.DataFrame(np.random.randn(2000,2)/10000, 
                    index=pd.date_range('2001-01-01',periods=2000),
                    columns=['A','B'])

Redefine your function to return a tuple of the index label you want to use and the computed scalar value. Note that this is slightly different: here we return the first index label of each window, not the last one that rolling functions normally use. You could do either.

In [72]: def gm(df,p):
              v =((((df['A']+df['B'])+1).cumprod())-1)*p
              return (df.index[0],v.iloc[-1])


In [73]: pd.Series(dict([gm(df.iloc[i:i + 50], 5) for i in range(len(df) - 50)]))

Out[73]: 
2001-01-01    0.000218
2001-01-02   -0.001048
2001-01-03   -0.002128
2001-01-04   -0.003590
2001-01-05   -0.004636
2001-01-06   -0.005377
2001-01-07   -0.004151
2001-01-08   -0.005155
2001-01-09   -0.004019
2001-01-10   -0.004912
2001-01-11   -0.005447
2001-01-12   -0.005258
2001-01-13   -0.004437
2001-01-14   -0.004207
2001-01-15   -0.004073
...
2006-04-20   -0.006612
2006-04-21   -0.006299
2006-04-22   -0.006320
2006-04-23   -0.005690
2006-04-24   -0.004316
2006-04-25   -0.003821
2006-04-26   -0.005102
2006-04-27   -0.004760
2006-04-28   -0.003832
2006-04-29   -0.004123
2006-04-30   -0.004241
2006-05-01   -0.004684
2006-05-02   -0.002993
2006-05-03   -0.003938
2006-05-04   -0.003528
Length: 1950
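If NumPy 1.20 or later is available, the Python loop can be replaced by `numpy.lib.stride_tricks.sliding_window_view`, which gives every window with both columns at once as a 3-D view. This is a swapped-in technique, not part of the answer above; the helper `rolling_two_col` is hypothetical, the column names `A` and `B` are hard-coded as in the question, and each result is labelled with the last date of its window (as `rolling` does) rather than the first.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20

def rolling_two_col(df, window, p):
    # view of shape (n_windows, 2, window): each window holds both columns
    w = sliding_window_view(df[['A', 'B']].to_numpy(), window, axis=0)
    a, b = w[:, 0, :], w[:, 1, :]
    v = (np.cumprod(a + b + 1, axis=1) - 1) * p  # same formula as gm
    # label each result with the last index value in its window
    return pd.Series(v[:, -1], index=df.index[window - 1:])

df = pd.DataFrame(np.random.randn(200, 2) / 10000,
                  index=pd.date_range('2001-01-01', periods=200),
                  columns=['A', 'B'])
res = rolling_two_col(df, 50, 5)
```

Every full window is included, so 200 rows with a window of 50 give 151 results.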

alko · Accepted Answer · 2014-01-10 10:42:11Z

0

All rolling_* functions work on 1-D arrays. I'm sure one can invent workarounds for passing 2-D arrays, but in your case you can simply precompute the row-wise values for the rolling evaluation:

>>> def gm(x, p):
...     return ((np.cumprod(x) - 1) * p)[-1]
...
>>> # pd.rolling_apply(s, 50, f) has been removed; the current equivalent:
>>> (tmp['A'] + tmp['B'] + 1).rolling(50).apply(lambda x: gm(x, 5), raw=True)
2001-01-01   NaN
2001-01-02   NaN
2001-01-03   NaN
2001-01-04   NaN
2001-01-05   NaN
2001-01-06   NaN
2001-01-07   NaN
2001-01-08   NaN
2001-01-09   NaN
2001-01-10   NaN
2001-01-11   NaN
2001-01-12   NaN
2001-01-13   NaN
2001-01-14   NaN
2001-01-15   NaN
...
2006-06-09   -0.000062
2006-06-10   -0.000128
2006-06-11    0.000185
2006-06-12   -0.000113
2006-06-13   -0.000962
2006-06-14   -0.001248
2006-06-15   -0.001962
2006-06-16   -0.003820
2006-06-17   -0.003412
2006-06-18   -0.002971
2006-06-19   -0.003882
2006-06-20   -0.003546
2006-06-21   -0.002226
2006-06-22   -0.002058
2006-06-23   -0.000553
Freq: D, Length: 2000
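Taking the precomputation one step further: gm only uses the last value of the cumulative product, which is simply the product over the window. For strictly positive row values (an assumption that holds for this data, where A + B + 1 stays close to 1) that product equals exp of the rolling sum of logs, so no Python-level function is needed at all. A sketch comparing it with the apply-based version:

```python
import numpy as np
import pandas as pd

tmp = pd.DataFrame(np.random.randn(2000, 2) / 10000,
                   index=pd.date_range('2001-01-01', periods=2000),
                   columns=['A', 'B'])
p = 5
row = tmp['A'] + tmp['B'] + 1  # row-wise values, computed once

# product over each window as exp(sum(log(row))); assumes row > 0
fast = (np.exp(np.log(row).rolling(50).sum()) - 1) * p

# reference: the apply-based version from the answer
slow = row.rolling(50).apply(lambda x: (np.cumprod(x)[-1] - 1) * p, raw=True)
print(np.allclose(fast, slow, equal_nan=True))
```

This only works because the example gm reduces to a product; an arbitrary two-column function still needs one of the approaches in the other answers.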

answered Jan 10, 2014 at 10:42

alko


1 Comment

h.l.m Over a year ago

Thanks for that, but gm was merely a mock example, so I am still keen to find a workaround that gets two or more columns into the function.
