Python pandas: apply a function to dataframe.rolling()

Question

I have this dataframe:

In[1]: df = pd.DataFrame([[1,2,3,4,5],[6,7,8,9,10],[11,12,13,14,15],[16,17,18,19,20],[21,22,23,24,25]])
In[2]: df
Out[2]: 
    0   1   2   3   4
0   1   2   3   4   5
1   6   7   8   9  10
2  11  12  13  14  15
3  16  17  18  19  20
4  21  22  23  24  25

I need to achieve this:

For every row in my dataframe,
if 2 or more values within any 3 consecutive cells are greater than 10,
then the last of those 3 cells should be marked as True.

The resulting dataframe df1 should be the same size, with True or False in it based on the criterion above:

In[3]: df1
Out[3]: 
    0   1      2      3      4
0 NaN NaN  False  False  False
1 NaN NaN  False  False  False
2 NaN NaN   True   True   True
3 NaN NaN   True   True   True
4 NaN NaN   True   True   True

df1.iloc[0,1] is NaN because only two numbers are available at that cell, but at least 3 are needed to do the test.
df1.iloc[1,3] is False since none of [7,8,9] is greater than 10.
df1.iloc[3,4] is True since 2 or more of [18,19,20] are greater than 10.

I figured dataframe.rolling.apply() with a function might be the solution, but how exactly?

@penguin2048 I edited the post; my question is how to achieve 1 2 3 4 in the post.
– Yi Fang, Commented Apr 15, 2018 at 5:15
Welcome to StackOverflow. Please take the time to read this post on how to provide a great pandas example as well as how to provide a minimal, complete, and verifiable example and revise your question accordingly. These tips on how to ask a good question may also be useful.
– jezrael, Commented Apr 15, 2018 at 5:43

penguin2048 · Accepted Answer · 2018-04-15 12:05:55Z

6

You are right that rolling() is the way to go. Keep in mind, however, that rolling() replaces the value at the end of each window with the computed result, so you cannot mark only the windows that satisfy the condition: every complete window gets a value, and you will get False wherever the condition does not hold.

Here is the code that uses your sample dataframe and performs the desired transformation:

df = pd.DataFrame([[1,2,3,4,5],[6,7,8,9,10],[11,12,13,14,15],[16,17,18,19,20],[21,22,23,24,25]])

Now define a function that takes a window as its argument and returns whether the condition is satisfied:

def fun(x):
    num = 0
    for i in x:
        num += 1 if i > 10 else 0
    return 1 if num >= 2 else -1

I have hardcoded the threshold as 10. If at least 2 values in a window are greater than 10, the last value of that window is replaced by 1 (denoting True); otherwise it is replaced by -1 (denoting False).

If you want to keep threshold parameters as variables, then have a look at this answer to pass them as arguments.
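As a rough sketch of that idea: Rolling.apply accepts args and kwargs, which are forwarded to the function after the window itself. The name count_flag and its parameters threshold and min_count are made up for this illustration. The sketch also assumes that transposing and rolling over rows is an acceptable stand-in for axis=1, so that it does not depend on the axis argument.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13, 14, 15],
                   [16, 17, 18, 19, 20], [21, 22, 23, 24, 25]])

def count_flag(window, threshold, min_count):
    # Hypothetical helper: `window` is a NumPy array because raw=True is used below.
    hits = (window > threshold).sum()
    return 1.0 if hits >= min_count else -1.0

# Transpose so that the default row-wise rolling runs across the original columns,
# then transpose back. args=(10, 2) supplies threshold and min_count.
flags = df.T.rolling(3).apply(count_flag, raw=True, args=(10, 2)).T
print(flags)
```

The same values could be passed by name with kwargs={"threshold": 10, "min_count": 2} instead of args.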

Now apply the function on a rolling window with window size 3 along axis 1. If you don't want NaN, you can additionally set min_periods to 1 in the arguments.

df.rolling(3, axis=1).apply(fun)

produces the output as

    0   1    2    3    4
0 NaN NaN -1.0 -1.0 -1.0
1 NaN NaN -1.0 -1.0 -1.0
2 NaN NaN  1.0  1.0  1.0
3 NaN NaN  1.0  1.0  1.0
4 NaN NaN  1.0  1.0  1.0
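The question asked for True/False with NaN kept in the incomplete windows, whereas the output above uses 1.0 and -1.0. One possible way to map it back is sketched here. It is only an illustration: fun is a compact vectorized restatement of the function above, and the transpose is assumed as a substitute for axis=1.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13, 14, 15],
                   [16, 17, 18, 19, 20], [21, 22, 23, 24, 25]])

def fun(x):
    # Same rule as above: 1 if at least two values in the window exceed 10, else -1.
    return 1.0 if (x > 10).sum() >= 2 else -1.0

# 1.0 / -1.0 for complete windows, NaN where fewer than 3 values are available.
res = df.T.rolling(3).apply(fun, raw=True).T

# Compare against 1 to get booleans, then blank out the positions that were NaN.
df1 = res.eq(1).astype(object).where(res.notna())
print(df1)
```

The exact dtype of each resulting column may differ between pandas versions; the values are True, False, or NaN either way.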

edited Apr 15, 2018 at 12:05

answered Apr 15, 2018 at 6:13

penguin2048


2 Comments

Yi Fang Over a year ago

Thank you for your explanation. What if I need both the threshold (0 in your if i > 0) and the minimum count (1 in your if num > 1) as arguments of the function? How do I rewrite 'df.rolling(3, axis=1, min_periods=1).apply(fun)'? Can the function in .apply take more arguments than the window itself?

penguin2048 Over a year ago

stackoverflow.com/questions/12182744/… check this

piRSquared · Accepted Answer · 2018-04-15 08:19:07Z

3

Use sum on a boolean dataframe.

df.gt(10).rolling(3, axis=1).sum().ge(2)

       0      1      2      3      4
0  False  False  False  False  False
1  False  False  False  False  False
2  False  False   True   True   True
3  False  False   True   True   True
4  False  False   True   True   True

You can nail down the exact requested output by masking the positions where the rolling sum is NaN.

df.gt(10).rolling(3, axis=1).sum().pipe(lambda d: d.ge(2).mask(d.isna()))

    0   1      2      3      4
0 NaN NaN  False  False  False
1 NaN NaN  False  False  False
2 NaN NaN   True   True   True
3 NaN NaN   True   True   True
4 NaN NaN   True   True   True
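The one-liners above rely on rolling(..., axis=1). As far as I am aware, newer pandas releases deprecate that argument, so here is a sketch of the same boolean-sum idea written with a transpose instead. The helper name rolling_count_flag and its parameters are hypothetical, chosen only for this example.

```python
import pandas as pd

def rolling_count_flag(df, window=3, threshold=10, min_count=2):
    # Hypothetical helper, not part of pandas.
    # Count values above `threshold` in each horizontal window of width `window`.
    counts = df.gt(threshold).astype(int).T.rolling(window).sum().T
    # True/False where a full window exists, NaN elsewhere.
    return counts.ge(min_count).astype(object).where(counts.notna())

df = pd.DataFrame([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13, 14, 15],
                   [16, 17, 18, 19, 20], [21, 22, 23, 24, 25]])
df1 = rolling_count_flag(df)
print(df1)
```

Casting the boolean frame to int before rolling is a defensive choice so the sum does not depend on how a given pandas version treats boolean input.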

answered Apr 15, 2018 at 8:19

piRSquared


Vivek Kalyanarangan · Accepted Answer · 2018-04-15 07:09:55Z

You need -

import pandas as pd
import numpy as np
df = pd.DataFrame([[1,2,3,4,5],[6,7,8,9,10],[11,12,13,14,15],[16,17,18,19,20],[21,22,23,24,25]])
df1 = df.apply(lambda x: pd.Series([np.nan, np.nan]+[all(j>10 for j in i) for i in zip(x[0::1], x[1::1], x[2::1])]), axis=1)

print(df1)

Output

    0   1      2      3      4
0 NaN NaN  False  False  False
1 NaN NaN  False  False  False
2 NaN NaN   True   True   True
3 NaN NaN   True   True   True
4 NaN NaN   True   True   True

Explanation

list(zip(x[0::1], x[1::1], x[2::1]))

breaks it down to taking 3 columns at a time for every row -

0             [(1, 2, 3), (2, 3, 4), (3, 4, 5)]
1            [(6, 7, 8), (7, 8, 9), (8, 9, 10)]
2    [(11, 12, 13), (12, 13, 14), (13, 14, 15)]
3    [(16, 17, 18), (17, 18, 19), (18, 19, 20)]
4    [(21, 22, 23), (22, 23, 24), (23, 24, 25)]

all(j>10 for j in i)

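Checks each tuple in the list and outputs True only if all the elements in the tuple are greater than 10. This is stricter than the "2 or more" rule in the question; sum(j>10 for j in i) >= 2 would match that rule exactly, although both give the same result on this sample data.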

[np.nan, np.nan] is concatenated in front to match your output. Hope that helps.
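To make the windowing step easier to follow in isolation, here is a small pure-Python sketch of the same zip idea, counting values instead of using all() so that it follows the question's "2 or more" rule. The function flag_windows, its parameters, and the sample row are made up for illustration.

```python
def flag_windows(values, size=3, threshold=10, min_count=2):
    # Hypothetical helper: slide a window of `size` over a plain sequence.
    # shifted holds values, values[1:], values[2:], ... so zip yields the windows.
    shifted = [values[k:] for k in range(size)]
    flags = [sum(v > threshold for v in win) >= min_count
             for win in zip(*shifted)]
    # Pad the front, where no complete window exists yet.
    return [float("nan")] * (size - 1) + flags

row = [5, 11, 12, 3, 4]  # made-up row with mixed values
# Windows are (5, 11, 12), (11, 12, 3), (12, 3, 4)
print(flag_windows(row))  # [nan, nan, True, True, False]
```

With all() in place of the count, every window of this row would be False, which is where the two rules differ.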
