Lambda function to use in dataframe

Question

I have the following vector

I would like to implement a lambda function that, given a vector element i, computes the mean of the (i-3)th, (i-2)th, (i-1)th and ith elements. However, I do not know how to access the i-3, i-2 and i-1 elements inside the lambda function.

If all the function has access to is the element, you cannot... but you should give more detail. Like, why does it have to be a lambda function?
– juanpa.arrivillaga, Commented Mar 6, 2017 at 8:48
Because I would like to explore more of the possibilities of this feature. Do you have any alternative that does not use a for loop?
– JPV, Commented Mar 6, 2017 at 8:59

Schmuddi · Accepted Answer · 2017-03-06 09:55:08Z

You can use the rolling() method to access the elements of a Pandas series within a specified window. Then, you can use a lambda function to calculate the mean for the elements in that window. In order to include the three elements to the left of the current element, you use a window size of 4:

In [39]: import pandas as pd

In [40]: S = pd.Series([3, 5, 6, 7, 4, 6, 7, 8])

In [41]: S.rolling(4).apply(lambda x: x.mean())
Out[41]: 
0     NaN
1     NaN
2     NaN
3    5.25
4    5.50
5    5.75
6    6.00
7    6.25
dtype: float64

You'll note that there are missing values for the first three elements. This is because a full window of size 4 can only be formed from the fourth element onwards. However, if you want to calculate with smaller windows for the first elements, you can use the argument min_periods to specify the smallest valid window size:

In [42]: S.rolling(4, min_periods=1).apply(lambda x: x.mean())
Out[42]: 
0    3.000000
1    4.000000
2    4.666667
3    5.250000
4    5.500000
5    5.750000
6    6.000000
7    6.250000
dtype: float64
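
To make the effect of min_periods concrete, here is a minimal sketch that rebuilds the same result from explicit slices over elements i-3 to i. The name `manual` is only illustrative, and the list comprehension is used purely to spell out the windows, not as a recommended replacement for rolling():

```python
import pandas as pd

S = pd.Series([3, 5, 6, 7, 4, 6, 7, 8])

# Explicit version of rolling(4, min_periods=1): each slice starts at i-3,
# clipped to 0 so that the first three windows are simply shorter.
manual = pd.Series(
    [S.iloc[max(0, i - 3): i + 1].mean() for i in range(len(S))]
)
rolled = S.rolling(4, min_periods=1).apply(lambda x: x.mean())

# Show both columns side by side for comparison.
print(pd.DataFrame({"manual": manual, "rolling": rolled}))
```

Both columns should agree, which shows that the first three values are means over one, two and three elements respectively.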

Having said that, you don't need the lambda in the first place – I included it only because you explicitly asked for lambdas. The method rolling() creates a Rolling object that has a built-in mean function that you can use, like so:

In [43]: S.rolling(4).mean()
Out[43]: 
0     NaN
1     NaN
2     NaN
3    5.25
4    5.50
5    5.75
6    6.00
7    6.25
dtype: float64
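
As a quick check that the lambda version and the built-in agree, the sketch below compares the two. The variable names are illustrative. Passing raw=True to apply() hands each window to the function as a NumPy array rather than a Series, which is optional here but usually cheaper:

```python
import numpy as np
import pandas as pd

S = pd.Series([3, 5, 6, 7, 4, 6, 7, 8])

# raw=True: the lambda receives a NumPy array for each window.
via_lambda = S.rolling(4).apply(lambda x: x.mean(), raw=True)
via_builtin = S.rolling(4).mean()

# equal_nan=True treats the three leading NaN values as matching.
same = np.allclose(via_lambda, via_builtin, equal_nan=True)
print(same)
```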

Dan Temkin · Answer · 2017-03-06 09:12Z (edited 2017-05-23 12:17:14Z)

If you want to do it on a pandas DataFrame, the easiest way is to use .loc, assuming you know the index label of element i (with the default index this is the same as its position).

 import pandas as pd

 df = pd.DataFrame([3, 5, 6, 7, 4, 6, 7, 8])
 # x is the index label of your target value. .loc slices by label and
 # includes both endpoints; column 0 is selected so mean() returns a scalar.
 setx = lambda x: df.loc[x:x-3:-1, 0].mean()
 > setx(4) # Without mean() gives values [4, 7, 6, 5]
 >> 5.5

That said, if you want to follow PEP 8, it is better to define the function with def: assigning a lambda expression directly to an identifier is what PEP 8 advises against (see python.org/dev/peps/pep-0008/#id50). Thank you @Schmuddi for the clarification.
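
For illustration, a def-based equivalent might look like the sketch below. The name `window_mean` and its `width` parameter are hypothetical, and the code assumes the default integer index (0, 1, 2, ...) and a single column labelled 0:

```python
import pandas as pd

df = pd.DataFrame([3, 5, 6, 7, 4, 6, 7, 8])


def window_mean(frame, i, width=4):
    """Mean of the `width` rows ending at index label i (hypothetical helper)."""
    # Clip the start label at 0 so early rows use a shorter window.
    start = max(i - width + 1, 0)
    # .loc slices by label and includes both endpoints.
    return frame.loc[start:i, 0].mean()


print(window_mean(df, 4))  # rows 1..4 hold 5, 6, 7, 4, so this prints 5.5
```

Unlike the lambda, the named function carries a docstring and shows up under its own name in tracebacks, which is the motivation PEP 8 gives for preferring def.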

answered Mar 6, 2017 at 9:12 by Dan Temkin; edited May 23, 2017 at 12:17 by Community Bot

2 Comments

Schmuddi Over a year ago

To clarify your PEP8 comment: While it's true that your example isn't recommended by PEP8 (see python.org/dev/peps/pep-0008/#id50), it's not true that it's best practice to only use lambda with "map/filter/reduce" – it's just assigning functions to an identifier by means of a lambda expression that is advised against in PEP8.

Dan Temkin Over a year ago

Thanks for elaborating. Honestly, I don't think I knew why it was just the simplified version I used but it is good to have a more explicit method that I can use in my madness lol. :)
