
I have a pandas DataFrame in which the column "process_id" identifies processes, each with several parameters recorded over multiple time steps. I want to extract some summary information from these and put it into a new DataFrame (so I don't have to work with all the details of the data). Below is an example of what I mean: for each "process_id" I keep the min, max, mean and std of each parameter, and I also define a lambda function that saves the mean of each parameter over the last 5 time steps:

features = df.groupby('process_id').agg(['min', 'max', 'mean', 'std', lambda x: x.tail(5).mean()])
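A minimal reproducible example of the call above, as a sketch: the parameter columns "temp" and "pressure" and all values are made up, not the asker's data. The result has one row per "process_id" and MultiIndex columns of the form (parameter, statistic):

```python
import pandas as pd

# Hypothetical data: two processes, six time steps each, two parameters
df = pd.DataFrame({
    "process_id": [1] * 6 + [2] * 6,
    "temp": [10, 11, 12, 13, 14, 15, 20, 21, 22, 23, 24, 25],
    "pressure": [1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 2.0, 2.1, 2.2, 2.3, 2.4, 2.5],
})

# One row per process_id; columns are (parameter, statistic) pairs
features = df.groupby("process_id").agg(
    ["min", "max", "mean", "std", lambda x: x.tail(5).mean()]
)
print(features)
```

The exact label of the lambda column depends on the pandas version, so it is safest to select it by position (for example features["temp"].iloc[:, 4]).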

This works fine, and the lambda function shows up in the table under a name like "parameter_lambda" (not sure how, but it works). The problem is that if I want to add another lambda function, like this one (or any other lambda definition):

features = df.groupby('process_id').agg(['min', 'max', 'mean', 'std', lambda x: x.tail(5).mean(),lambda x: x.iloc[0:int(len(df)/5)].mean()])

I get this error:

Function names must be unique, found multiple named <lambda>

That makes sense, since both lambda functions end up with the same name in the DataFrame, but I don't know how to get around it.
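The clash comes from Python itself: pandas labels each aggregation column with the function's __name__, and every lambda has the same __name__, regardless of its body, its argument name, or the variable it is assigned to. A quick check (the function names here are hypothetical, chosen for illustration):

```python
# Every lambda gets the same __name__, whatever its body or argument name
tail_mean = lambda x: x.tail(5).mean()
head_mean = lambda y: y.head(5).mean()

print(tail_mean.__name__)  # <lambda>
print(head_mean.__name__)  # <lambda>

# A def statement, by contrast, takes its name from the definition
def first_fifth_mean(x):
    return x.iloc[: len(x) // 5].mean()

print(first_fifth_mean.__name__)  # first_fifth_mean
```

Newer pandas releases (0.25 and later) relabel several lambdas in the same list as <lambda_0>, <lambda_1>, ..., so this error mainly shows up on older versions.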

I tried something like this:

df.groupby('dummy').agg({'returns':{'Mean': np.mean, 'Sum': np.sum}})

as described here, but I am getting this error:

SpecificationError: cannot perform renaming for returns with a nested dictionary

Can someone help me? Thank you!
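Renaming through a nested dict was deprecated and later removed from pandas; from pandas 0.25 the supported replacement is "named aggregation", where each keyword argument gives the output column name and a (column, function) pair. A sketch with made-up "dummy"/"returns" data (assumed, since the linked example is not reproduced here); the "Last2" column is a hypothetical extra showing that a lambda works there too:

```python
import pandas as pd

# Hypothetical data: a grouping column and a numeric column
df = pd.DataFrame({
    "dummy": ["a", "a", "b", "b", "b"],
    "returns": [1.0, 3.0, 2.0, 4.0, 6.0],
})

# Named aggregation: output_name=(input_column, function)
out = df.groupby("dummy").agg(
    Mean=("returns", "mean"),
    Sum=("returns", "sum"),
    Last2=("returns", lambda x: x.tail(2).mean()),  # mean of the last 2 rows
)
print(out)
```

The output columns come out flat and in the order of the keywords, so no renaming step is needed afterwards.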

  • This doesn't make sense. Please include a minimal reproducible example. Commented Feb 10, 2019 at 19:16
  • I'm inclined to think of this as a bug in pandas, if it accepts functions but relies on the __name__ attribute to distinguish them. Commented Feb 10, 2019 at 19:31

2 Answers


Lambda functions all share the same name, <lambda>, so pandas raises the duplicate-name error as soon as more than one of the aggregations is a lambda. Give each function its own __name__:

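# Every lambda has __name__ '<lambda>', so give each one a distinct name
func1 = lambda x: x.tail(5).mean()
func1.__name__ = 'tail_mean'

func2 = lambda x: x.iloc[0:int(len(df)/5)].mean()
func2.__name__ = 'len_mean'

# Columns are now labelled (parameter, 'tail_mean') and (parameter, 'len_mean')
features = df.groupby('process_id').agg(['min', 'max', 'mean', 'std', func1, func2])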
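The same idea can also be written with def, which sets each function's __name__ automatically. This is a self-contained sketch with made-up data (the "temp" column and its values are assumptions). It also uses len(x), the length of each group, instead of len(df), on the assumption that the first fifth of each process was intended:

```python
import pandas as pd

# Hypothetical data: two processes, ten time steps each
df = pd.DataFrame({
    "process_id": [1] * 10 + [2] * 10,
    "temp": list(range(10)) + list(range(100, 110)),
})

def tail_mean(x):
    """Mean of the last 5 time steps of one group."""
    return x.tail(5).mean()

def first_fifth_mean(x):
    """Mean of the first fifth of one group (uses len(x), not len(df))."""
    return x.iloc[: len(x) // 5].mean()

# Column labels come from __name__: ('temp', 'tail_mean'), ('temp', 'first_fifth_mean')
features = df.groupby("process_id").agg(
    ["min", "max", "mean", "std", tail_mean, first_fifth_mean]
)
print(features)
```

Named functions avoid mutating __name__ after the fact and make the resulting column labels explicit in the source.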

  • That's great! Thanks a lot!
  • @BillKet You're welcome :-) By the way, if this is what you need, would you accept the answer?
features = df.groupby('process_id').agg(['min', 'max', 'mean', 'std', lambda x: x.tail(5).mean(), lambda y: y.iloc[0:int(len(df)/5)].mean()])

Try naming the argument of the second lambda y instead of using x for both.

Also, try passing a list of functions instead of a nested dict:

df.groupby('dummy').agg({'returns': [np.mean, np.sum]})
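A dict of lists avoids the nested-dict error but does not rename anything; the labels can be changed afterwards. A sketch with made-up data; it passes the strings "mean" and "sum" rather than np.mean and np.sum, because recent pandas versions emit a FutureWarning when NumPy functions are passed to agg this way:

```python
import pandas as pd

# Hypothetical data: a grouping column and a numeric column
df = pd.DataFrame({
    "dummy": ["a", "a", "b", "b", "b"],
    "returns": [1.0, 3.0, 2.0, 4.0, 6.0],
})

# Dict of lists: MultiIndex columns ('returns', 'mean'), ('returns', 'sum')
out = df.groupby("dummy").agg({"returns": ["mean", "sum"]})

# Drop the outer 'returns' level, then rename the statistics
out.columns = out.columns.droplevel(0)
out = out.rename(columns={"mean": "Mean", "sum": "Sum"})
print(out)
```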

  • Thank you, but I am still getting the same error. The name comes purely from the "lambda" keyword; it doesn't include the argument name.
  • I think @Wen-Ben's answer might be it. I learnt something new too.
