Perhaps this quick example will help show what I am trying to do:

import pandas as pd
df = pd.DataFrame({"A": [10,20,30,50,70,40], "B": [20,30,10,15,20,30]})


def _custom_function(X):    
    # whatever... just for the purpose of the example
    # but I need X to be the actual df and not a series

    Y = sum((X['A'] / X['B']) + (0.2 * X['B']))   
    return Y


df['C'] = df.rolling(2).apply(_custom_function, axis=0)

When the custom function is called, X is a Series, and it holds only the first column of the df. Is it possible to pass a DataFrame through the apply function?

Edit: it is possible to use rolling().apply() on a single column:

import pandas as pd
df = pd.DataFrame({"A": [10,20,30,50,70,40], "B": [20,30,10,15,20,30]})


def _custom_function(X):    
    # whatever... just for the purpose of the example
    Y = sum(0.2 * X)    
    return Y


df['C'] = df['A'].rolling(2).apply(_custom_function)

Second edit: iterating over rolling (for example in a list comprehension) does not behave as expected:

for x in df.rolling(3):
    print(x)

As you can see in the example below, the two approaches don't give the same output:

import pandas as pd
df = pd.DataFrame({"A": [10,20,30,50,70,40], "B": [20,30,10,15,20,30]})
df['C'] = 0.2


def _custom_function_df(X):    
    # whatever... just for the purpose of the example
    # but I need X to be the actual df and not a series
    Y = sum(X['C'] * X['B'])
    return Y

def _custom_function_series(X):    
    # whatever... just for the purpose of the example
    # here X is a single column (a Series)
    Y = sum(0.2 * X)
    return Y


df['result'] = df['B'].rolling(3).apply(_custom_function_series)

df['result2'] = [x.pipe(_custom_function_df) for x in df.rolling(3, min_periods=3)]

The list comprehension over rolling also produces values for the first rows (instead of the expected NaN); the windows only reach the correct size once len(x) = 3, the rolling window size.

Thanks in advance !

1 Answer

Pass the DataFrame to the function (the output below is for an element-wise version of the function, without sum):

df['C'] = _custom_function(df)

Or use DataFrame.pipe:

df['C'] = df.pipe(_custom_function)

print (df)
    A   B         C
0  10  20  4.500000
1  20  30  6.666667
2  30  10  5.000000
3  50  15  6.333333
4  70  20  7.500000
5  40  30  7.333333

EDIT: Rolling.apply processes each column separately, so it cannot be used here.
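
A quick way to check this is to record what the function actually receives. This is only a minimal sketch; the `_probe` helper and the `seen_types` list are hypothetical names introduced for illustration, and the default `raw=False` is assumed:

```python
import pandas as pd

df = pd.DataFrame({"A": [10, 20, 30, 50, 70, 40], "B": [20, 30, 10, 15, 20, 30]})

seen_types = []  # type names of the objects handed to the function


def _probe(X):
    # hypothetical helper, used only to inspect the argument
    seen_types.append(type(X).__name__)
    return X.sum()


# apply is run column by column, so the result has one column per input column
out = df.rolling(2).apply(_probe)

print(set(seen_types))
print(out)
```

The function never sees columns A and B together, which is why a function that needs both columns cannot be written this way.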

Possible solution:

df['C'] = [x.pipe(_custom_function) for x in df.rolling(2)]
print (df)
    A   B          C
0  10  20   4.500000
1  20  30  11.166667
2  30  10  11.666667
3  50  15  11.333333
4  70  20  13.833333
5  40  30  14.833333

EDIT: It seems to be a bug: when iterating, rolling by default works like min_periods=1.
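
The effect can be seen by looking at the size of each window produced by the iteration. A minimal sketch, assuming a pandas version where Rolling objects can be iterated (as in the list comprehension above):

```python
import pandas as pd

df = pd.DataFrame({"A": [10, 20, 30, 50, 70, 40], "B": [20, 30, 10, 15, 20, 30]})

# number of rows in each window yielded while iterating
lengths = [len(x) for x in df.rolling(3)]
print(lengths)  # [1, 2, 3, 3, 3, 3]
```

The first two windows are shorter than 3 rows, so the function is still evaluated on them instead of producing NaN.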

Here is a workaround (hack):

import numpy as np

df['result'] = df['B'].rolling(3).apply(_custom_function_series)

df['result2'] = [x.pipe(_custom_function_df) if len(x) == 3 else np.nan for x in df.rolling(3)]

print (df)
    A   B    C  result  result2
0  10  20  0.2     NaN      NaN
1  20  30  0.2     NaN      NaN
2  30  10  0.2    12.0     12.0
3  50  15  0.2    11.0     11.0
4  70  20  0.2     9.0      9.0
5  40  30  0.2    13.0     13.0
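
If you prefer not to rely on iterating over the Rolling object, the same idea can be sketched with plain iloc slicing. `rolling_apply_df` is a hypothetical helper written for this example, not a pandas function, and it assumes a fixed integer window:

```python
import numpy as np
import pandas as pd


def rolling_apply_df(df, window, func):
    """Hypothetical helper: call func on each full window of rows (a DataFrame).

    Rows that do not yet have `window` rows available get NaN.
    """
    values = []
    for end in range(1, len(df) + 1):
        if end < window:
            values.append(np.nan)  # incomplete window
        else:
            values.append(func(df.iloc[end - window:end]))
    return pd.Series(values, index=df.index)


def _custom_function_df(X):
    # X is a DataFrame, so several columns can be combined
    return (X["C"] * X["B"]).sum()


df = pd.DataFrame({"A": [10, 20, 30, 50, 70, 40], "B": [20, 30, 10, 15, 20, 30]})
df["C"] = 0.2

df["result2"] = rolling_apply_df(df, 3, _custom_function_df)
print(df)
```

This calls a Python function once per window, so it will likely be slow on large frames; it is meant as a readable sketch rather than a fast implementation.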

14 Comments

Thank you very much! I think pipe() is what I was looking for, because my end goal is to do something like: df.rolling(n).pipe(_custom_function)
@plonfat - is there a problem with the solution?
Hi @jezrael, yes, unfortunately. 'Rolling' object has no attribute 'pipe'. However, Rolling does have the attribute 'apply', but this brings me back to my initial problem.
@plonfat - with rolling apply it is impossible.
I just edited the question to show my problem.