Using multiple lambda functions with a pandas dataframe

Question

I have a pandas DataFrame in which each "process_id" has different parameters recorded over multiple time steps. I want to extract several pieces of information from these and put them into a new DataFrame (so I don't have to use all the details of the data). Below is an example of what I mean: for each "process_id" I keep the min, max, mean and std of each parameter, and I also define a lambda function to save the mean of each parameter over the last 5 time steps:

features = df.groupby('process_id').agg(['min', 'max', 'mean', 'std', lambda x: x.tail(5).mean()])

This works fine, and the lambda function changes the name of the parameter in the table to something like "parameter_lambda" (not sure how, but it works). The problem is that if I add another lambda function, something like this (or any other lambda definition):

features = df.groupby('process_id').agg(['min', 'max', 'mean', 'std', lambda x: x.tail(5).mean(),lambda x: x.iloc[0:int(len(df)/5)].mean()])

I get this error:

Function names must be unique, found multiple named <lambda>

Which makes sense, as both lambda functions will have the same name in the data frame. But I don't know how to get around this.

I tried something like this:

df.groupby('dummy').agg({'returns':{'Mean': np.mean, 'Sum': np.sum}})

as described here, but I am getting this error:

SpecificationError: cannot perform renaming for returns with a nested dictionary

Can someone help me? Thank you!

This doesn't make sense. Please include a minimal reproducible example. – roganjosh, Commented Feb 10, 2019 at 19:16
I'm inclined to think of this as a bug in pandas, if it accepts functions but relies on the __name__ attribute to distinguish them. – chepner, Commented Feb 10, 2019 at 19:31

BENY · Accepted Answer · 2019-02-10 19:21:08Z

6

Every lambda function has the same name, <lambda>, so you get a duplicate-name error when more than one column is created by a lambda. Give each lambda a unique __name__:

# Each lambda gets a unique __name__, which pandas uses as the column label.
fuc1 = lambda x: x.tail(5).mean()
fuc1.__name__ = 'tail_mean'

fuc2 = lambda x: x.iloc[0:int(len(df)/5)].mean()
fuc2.__name__ = 'len_mean'

features = df.groupby('process_id').agg(['min', 'max', 'mean', 'std', fuc1, fuc2])
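A self-contained sketch of the renaming approach above, run on a hypothetical toy frame. The column name pressure and all values are made up for illustration; only process_id and the two lambdas come from the question.

```python
import pandas as pd

# Hypothetical toy data: two processes, six time steps each, one parameter.
df = pd.DataFrame({
    "process_id": [1] * 6 + [2] * 6,
    "pressure": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0,
                 10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
})

# Mean of the last 5 time steps of each group.
fuc1 = lambda x: x.tail(5).mean()
fuc1.__name__ = "tail_mean"

# Mean of the first int(len(df)/5) rows of each group. As in the question,
# the slice length is based on the whole frame, not on the group.
fuc2 = lambda x: x.iloc[0:int(len(df) / 5)].mean()
fuc2.__name__ = "len_mean"

features = df.groupby("process_id").agg(
    ["min", "max", "mean", "std", fuc1, fuc2]
)

# Columns are a MultiIndex of (parameter, function name) pairs.
print(features.columns.tolist())
print(features)
```

As far as I know, pandas 0.25 and later accept several lambdas in one list and label them <lambda_0>, <lambda_1> and so on, so the error may not appear on newer versions. Explicit names still give more readable columns.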


2 Comments

JohnDoe122 Over a year ago

That's great! Thanks a lot!

BENY Over a year ago

@BillKet You're welcome :-) By the way, if this is what you need, would you like to accept it?

ycx · Answer · 2019-02-10 19:16:59Z

0

Try with x and y instead of x and x:

features = df.groupby('process_id').agg(['min', 'max', 'mean', 'std', lambda x: x.tail(5).mean(), lambda y: y.iloc[0:int(len(df)/5)].mean()])

Also, try this:

df.groupby('dummy').agg({'returns': [np.mean, np.sum]})
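If the aim of the nested dictionary in the question was to choose the output column names, one option on pandas 0.25 and later is named aggregation, where each keyword argument is an output name mapped to a (column, function) pair. This is a minimal sketch with made-up data, not something proposed in the thread. The dummy and returns columns mirror the question's snippet, and the values are hypothetical.

```python
import pandas as pd

# Hypothetical data matching the column names used in the question's snippet.
df = pd.DataFrame({
    "dummy": ["a", "a", "b"],
    "returns": [1.0, 3.0, 5.0],
})

# Named aggregation: keyword = output column name,
# value = (input column, aggregation function).
summary = df.groupby("dummy").agg(
    Mean=("returns", "mean"),
    Sum=("returns", "sum"),
)

print(summary)
```

The result has flat columns named Mean and Sum, which avoids both the nested-dictionary error and any reliance on function names.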


2 Comments

JohnDoe122 Over a year ago

Thank you, but I am still getting the same error. The name comes purely from "lambda". It doesn't contain the variable in it.

ycx Over a year ago

I think @Wen-Ben's answer might be it. I learnt something new too.
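JohnDoe122's point that the name comes purely from "lambda" can be checked in plain Python, without pandas. The variables f, g and h below are illustrative only.

```python
# Two lambdas that differ only in their argument name.
f = lambda x: x + 1
g = lambda y: y + 1

# The argument name is not part of the function name: both are '<lambda>'.
print(f.__name__, g.__name__)

# Assigning __name__ is what actually makes the names distinct.
h = lambda y: y + 1
h.__name__ = "plus_one"
print(h.__name__)
```

This is why renaming the argument from x to y leaves the error unchanged, while setting __name__ resolves it.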
