
I would like to use the pandas.rolling_apply function to apply my own custom function on a rolling window basis.

But my function requires two arguments and also has two outputs. Is this possible?

Below is a minimal reproducible example:

import pandas as pd
import numpy as np

tmp = pd.DataFrame(np.random.randn(2000, 2) / 10000,
                   index=pd.date_range('2001-01-01', periods=2000),
                   columns=['A', 'B'])

def gm(df, p):
    v = (((df + 1).cumprod()) - 1) * p
    return v.iloc[-1]

# an example output when subsetting for just 2001
gm(tmp.loc['2001'], 5)


# the aim is to do this on a rolling basis over a 50-day window,
# getting both outputs and passing in the parameter p=5
# (or any other value I want p to be)
pd.rolling_apply(tmp, 50, gm)  # fails: gm needs a second argument

This leads to an error, since gm takes two arguments.

Any help would be greatly appreciated.

EDIT

Following Jeff's comment I have made progress, but I am still struggling with functions that return two or more values. If I write a new function (below) that returns two random numbers (unconnected to the previous calculation) rather than the last row of v, I get TypeError: only length-1 arrays can be converted to Python scalars.

def gm2(df,p):
    df = pd.DataFrame(df)
    v =(((df+1).cumprod())-1)*p
    return np.random.rand(2)

pd.rolling_apply(tmp,50,lambda x: gm2(x,5)).tail(20)

It works if np.random.rand(2) is changed to np.random.rand(1), i.e. when it returns a single value.
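
rolling_apply (like its modern replacement, DataFrame.rolling(...).apply) expects one scalar back per window and column, which fits the "only length-1 arrays" error above. One workaround, not taken from this thread, is to walk the windows yourself and collect whatever tuple the function returns. A minimal sketch, assuming a recent pandas; gm_two and rolling_multi are hypothetical names made up for illustration:

```python
import numpy as np
import pandas as pd

def gm_two(window, p):
    """Hypothetical two-output function: final and lowest compounded value."""
    v = ((window + 1).cumprod() - 1) * p
    return v.iloc[-1], v.min()

def rolling_multi(series, window, func, names, **kwargs):
    """Hypothetical helper: apply func to every full window of series.

    func may return a tuple; each element becomes one output column.
    """
    index, rows = [], []
    for end in range(window, len(series) + 1):
        chunk = series.iloc[end - window:end]  # one full window, dates kept
        index.append(chunk.index[-1])          # label by the window's last date
        rows.append(func(chunk, **kwargs))     # e.g. (last, min)
    return pd.DataFrame(rows, index=index, columns=names)

rng = np.random.default_rng(0)
tmp = pd.DataFrame(rng.standard_normal((2000, 2)) / 10000,
                   index=pd.date_range("2001-01-01", periods=2000),
                   columns=["A", "B"])

out_a = rolling_multi(tmp["A"], 50, gm_two, ["last", "min"], p=5)
print(out_a.tail())
```

Run it once per column (for example tmp["B"]) to cover both columns. A Python loop is slower than the built-in rolling machinery, but it places no restriction on what the function returns.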

Answer

rolling_apply currently passes numpy arrays to the applied function; by 0.14 it should pass a frame. The issue is here

So redefine your function to work on a numpy array. (You can of course construct a DataFrame inside the function, but its index and column names won't match the original frame.)
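
For readers on a current pandas: pd.rolling_apply has since been replaced by DataFrame.rolling(window).apply(func, raw=...). As far as I know, raw=True still hands the function one plain numpy array per column window, as described above, while raw=False (the default in recent versions) hands it a Series instead. A small sketch that records what actually arrives, assuming a recent pandas; record_type and seen are hypothetical names:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": np.arange(6.0), "B": np.arange(6.0) * 10},
                  index=pd.date_range("2001-01-01", periods=6))

seen = []

def record_type(window):
    seen.append(type(window))  # remember what rolling passed in
    return 0.0                 # apply needs one number back per window

df.rolling(3).apply(record_type, raw=True)
raw_types = set(seen)

seen.clear()
df.rolling(3).apply(record_type, raw=False)
series_types = set(seen)

print(raw_types)     # numpy arrays
print(series_types)  # pandas Series
```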

def gm(arr, p):
    # arr is a 1-D numpy array holding one column's window
    v = (np.cumprod(arr + 1) - 1) * p
    return v[-1]
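
Since only the last element of v is returned, the cumulative product does more work than needed: the last entry of cumprod(1 + arr) is simply the product of all entries, so gm reduces to (prod(1 + arr) - 1) * p. A quick check of that identity (gm_fast is a hypothetical name), including a two-element window small enough to verify by hand: (1.1 * 0.5 - 1) * 2 = -0.9.

```python
import numpy as np

def gm(arr, p):
    v = (np.cumprod(arr + 1) - 1) * p  # compounded path, scaled by p
    return v[-1]

def gm_fast(arr, p):
    # The last entry of a cumulative product is the product of everything,
    # so the intermediate path never needs to be built.
    return (np.prod(arr + 1) - 1) * p

small = np.array([0.1, -0.5])
print(gm(small, 2), gm_fast(small, 2))  # both about -0.9

rng = np.random.default_rng(0)
window = rng.standard_normal(50) / 10000
print(np.isclose(gm(window, 5), gm_fast(window, 5)))  # True
```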

If you want to use more pandas functions in your custom function, wrap the array in a DataFrame (note that the index of the calling frame is not passed at the moment):

import pandas as pd

def gm(arr, p):
    df = pd.DataFrame(arr)  # index is 0..n-1 here, not the original dates
    v = (((df + 1).cumprod()) - 1) * p
    return v.iloc[-1]
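
On a current pandas, wrapping the array is usually unnecessary: rolling(window).apply(func, raw=False) passes each window as a Series that keeps the original dates, so the pandas-based gm from the question can be used almost unchanged, and p can go through the args parameter. A minimal sketch, assuming a recent pandas; gm_logged and window_ends are hypothetical names used only to show which dates arrive:

```python
import numpy as np
import pandas as pd

def gm(s, p):
    # s is a Series holding one column's window, indexed by its dates
    v = ((s + 1).cumprod() - 1) * p
    return v.iloc[-1]

window_ends = []

def gm_logged(s, p):
    window_ends.append(s.index[-1])  # the DatetimeIndex is passed through
    return gm(s, p)

rng = np.random.default_rng(0)
tmp = pd.DataFrame(rng.standard_normal((2000, 2)) / 10000,
                   index=pd.date_range("2001-01-01", periods=2000),
                   columns=["A", "B"])

out = tmp.rolling(50).apply(gm_logged, raw=False, args=(5,))
print(out.tail())
print(min(window_ends))  # the 50th date (2001-02-19): end of the first full window
```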

Then pass it through a lambda to supply p:

In [11]: pd.rolling_apply(tmp,50,lambda x: gm(x,5)).tail(20)
Out[11]: 
                   A         B
2006-06-04  0.004207 -0.002112
2006-06-05  0.003880 -0.001598
2006-06-06  0.003809 -0.002228
2006-06-07  0.002840 -0.003938
2006-06-08  0.002855 -0.004921
2006-06-09  0.002450 -0.004614
2006-06-10  0.001809 -0.004409
2006-06-11  0.001445 -0.005959
2006-06-12  0.001297 -0.006831
2006-06-13  0.000869 -0.007878
2006-06-14  0.000359 -0.008102
2006-06-15 -0.000885 -0.007996
2006-06-16 -0.001838 -0.008230
2006-06-17 -0.003036 -0.008658
2006-06-18 -0.002280 -0.008552
2006-06-19 -0.001398 -0.007831
2006-06-20 -0.000648 -0.007828
2006-06-21 -0.000799 -0.007616
2006-06-22 -0.001096 -0.006740
2006-06-23 -0.001160 -0.006004

[20 rows x 2 columns]
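
The lambda is only there to fix p. On a current pandas the same binding can be done with functools.partial or with the args/kwargs parameters of rolling(...).apply. The sketch below, assuming a recent pandas and the numpy version of gm from this answer, checks that the four spellings agree:

```python
from functools import partial

import numpy as np
import pandas as pd

def gm(arr, p):
    v = (np.cumprod(arr + 1) - 1) * p  # arr: one column's window as ndarray
    return v[-1]

rng = np.random.default_rng(0)
tmp = pd.DataFrame(rng.standard_normal((2000, 2)) / 10000,
                   index=pd.date_range("2001-01-01", periods=2000),
                   columns=["A", "B"])

roll = tmp.rolling(50)
via_lambda = roll.apply(lambda x: gm(x, 5), raw=True)
via_args = roll.apply(gm, raw=True, args=(5,))          # positional extras
via_kwargs = roll.apply(gm, raw=True, kwargs={"p": 5})  # keyword extras
via_partial = roll.apply(partial(gm, p=5), raw=True)    # p bound up front
print(via_args.tail())
```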

Comments

How do you "redefine your function to work on a numpy array"?
You can only use numpy functions (not pandas functions), or you can do DataFrame(df) to make it a frame.
Does this mean that in the custom function I can only run numpy functions and no pandas functions?
As I said, you could wrap the passed numpy array in a DataFrame if you want and then use pandas functions, BUT it will only have basic indexes; I'll change the answer to illustrate.
That's still an open issue; pull requests to fix it are welcome.