
I have a dataframe and need to apply the same calculations to many columns. Currently I'm doing it manually. Is there a good, elegant way to do this?

import pandas as pd

tt = pd.DataFrame(data={'Status': ['green', 'green', 'red', 'blue', 'red', 'yellow', 'black'],
                        'Group': ['A', 'A', 'B', 'C', 'A', 'B', 'C'],
                        'City': ['Toronto', 'Montreal', 'Vancouver', 'Toronto', 'Edmonton', 'Winnipeg', 'Windsor'],
                        'Sales': [13, 6, 16, 8, 4, 3, 1],
                        'Counts': [100, 200, 50, 30, 20, 10, 300]})

# One dictionary entry per column: this is the part I write by hand
ss = tt.groupby('Group').agg({'Sales': ['count', 'mean', 'median'],
                              'Counts': ['count', 'mean', 'median']})
# Flatten the (column, function) MultiIndex into names like 'Sales_mean'
ss.columns = ['_'.join(col).strip() for col in ss.columns.values]

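So the result is a table with one row per Group and the columns Sales_count, Sales_mean, Sales_median, Counts_count, Counts_mean and Counts_median.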

How can I do this for many columns, applying the same calculations (count, mean, median) to each one, when I have a very large dataframe?

Answer


In pandas, agg accepts either a single function or a list of functions, applies them to the relevant columns and returns a summary of the outputs. Here I pass a list of function names, so every column gets the same aggregations. You were passing a dictionary, which maps each column to its own list of functions, so you had to spell out every column by hand. Happy to explain further if this isn't clear.
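To see why a list can replace the dictionary: a dictionary that maps every column to the same function list can be generated with a comprehension, and it should give the same table as passing the list to the selected columns. A minimal sketch using the numeric columns from the question's frame (value_cols and funcs are illustrative names, not part of the original code):

```python
import pandas as pd

tt = pd.DataFrame({
    'Group': ['A', 'A', 'B', 'C', 'A', 'B', 'C'],
    'Sales': [13, 6, 16, 8, 4, 3, 1],
    'Counts': [100, 200, 50, 30, 20, 10, 300],
})

funcs = ['count', 'mean', 'median']
value_cols = ['Sales', 'Counts']  # any list of numeric column names

# Dictionary form, built programmatically instead of typed out per column
by_dict = tt.groupby('Group').agg({col: funcs for col in value_cols})

# List form: the same functions applied to every selected column
by_list = tt.groupby('Group')[value_cols].agg(funcs)

print(by_dict.equals(by_list))
```

Both produce the same (column, function) MultiIndex columns; the list form is simply shorter once the columns are selected.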

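# Status and City are strings, so mean/median would fail on them: keep numeric columns only
num_cols = tt.select_dtypes('number').columns.tolist()
ss = tt.groupby('Group')[num_cols].agg(['count', 'mean', 'median'])
ss.columns = ['_'.join(col).strip() for col in ss.columns.values]
print(ss)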
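For a very large frame with many columns, the same idea can be wrapped in a small function so nothing has to be listed by hand. This is a sketch under assumptions, not a pandas API: the helper name summarize and its defaults are hypothetical. It picks up every numeric column automatically, skips the grouping key(s), and flattens the names with columns.map('_'.join), which does the same job as the list comprehension.

```python
import pandas as pd


def summarize(df, by, funcs=('count', 'mean', 'median')):
    """Apply the same aggregations to every numeric column of df, grouped by `by`.

    Hypothetical helper: `by` is a column name or a list of column names.
    """
    keys = [by] if isinstance(by, str) else list(by)
    # Numeric columns only, excluding the grouping key(s) in case they are numeric
    value_cols = [c for c in df.select_dtypes('number').columns if c not in keys]
    out = df.groupby(by)[value_cols].agg(list(funcs))
    # Flatten the (column, function) MultiIndex into names like 'Sales_mean'
    out.columns = out.columns.map('_'.join)
    return out


tt = pd.DataFrame({
    'Status': ['green', 'green', 'red', 'blue', 'red', 'yellow', 'black'],
    'Group': ['A', 'A', 'B', 'C', 'A', 'B', 'C'],
    'City': ['Toronto', 'Montreal', 'Vancouver', 'Toronto', 'Edmonton', 'Winnipeg', 'Windsor'],
    'Sales': [13, 6, 16, 8, 4, 3, 1],
    'Counts': [100, 200, 50, 30, 20, 10, 300],
})

ss = summarize(tt, 'Group')
print(ss)
```

Other aggregations go through funcs, for example summarize(tt, 'Group', funcs=('sum', 'max')).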