Pandas: how to apply multiple functions to a DataFrame

Question

Is there a way to apply a list of functions to each column in a DataFrame like the DataFrameGroupBy.agg function does? I found an ugly way to do it like this:

import numpy as np
import pandas as pd
df = pd.DataFrame(dict(one=np.random.uniform(0, 10, 100), two=np.random.uniform(0, 10, 100)))
df.groupby(np.ones(len(df))).agg(['mean', 'std'])

        one                 two
       mean       std      mean       std
1  4.802849  2.729528  5.487576  2.890371

unutbu · Accepted Answer · 2018-02-27 12:44:53Z

33

For Pandas 0.20.0 or newer, use df.agg (thanks to ayhan for pointing this out):

In [11]: df.agg(['mean', 'std'])
Out[11]: 
           one       two
mean  5.147471  4.964100
std   2.971106  2.753578
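
As a hedged illustration, df.agg also accepts callables mixed in with string names. The sketch below assumes Pandas 0.20.0 or newer; the helper iqr and the fixed seed are made up for this example and are not part of the original answer.

```python
import numpy as np
import pandas as pd

# Seeded sample data so the run is repeatable (the seed is an arbitrary choice).
rng = np.random.default_rng(0)
df = pd.DataFrame({"one": rng.uniform(0, 10, 100), "two": rng.uniform(0, 10, 100)})


def iqr(col):
    """Hypothetical helper: interquartile range of one column."""
    return col.quantile(0.75) - col.quantile(0.25)


# Strings name built-in reductions; a callable is labelled by its __name__.
summary = df.agg(["mean", "std", iqr])
print(summary)
```

Each entry in the list becomes one row of the result, so adding another statistic only means appending to the list.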

For older versions, you could use

In [61]: df.groupby(lambda idx: 0).agg(['mean','std'])
Out[61]: 
        one               two          
       mean       std    mean       std
0  5.147471  2.971106  4.9641  2.753578

Another way would be:

In [68]: pd.DataFrame({col: [getattr(df[col], func)() for func in ('mean', 'std')] for col in df}, index=('mean', 'std'))
Out[68]: 
           one       two
mean  5.147471  4.964100
std   2.971106  2.753578

edited Feb 27, 2018 at 12:44

answered Mar 2, 2014 at 13:40

unutbu


3 Comments

user2285236 Over a year ago

agg is now available as a DataFrame method so this works without the trick too: df.agg(['mean', 'std']).

saias Over a year ago

I have noticed that using agg is a lot slower than just calling the functions on the df directly, i.e. df.sum() and df.mean() instead of df.agg(['sum', 'mean']). Is there a reason for that, or am I doing something wrong?

unutbu Over a year ago

@saias: It might be worth asking this as a new question. My guess is that df.agg(['sum','mean']) ultimately calls pandas.core.base.SelectionMixin._aggregate which handles many different cases for input and output. All that extra case handling slows down the performance of df.agg. In this case, you can bypass a lot of that code by building the desired DataFrame yourself with something like pd.DataFrame({'sum':df.sum(), 'mean':df.mean()}).T.
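
A minimal sketch of the workaround from this comment, checking that the hand-built table matches what df.agg returns. The sample data and seed are assumptions for illustration; no timings are claimed here.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({"one": rng.uniform(0, 10, 100), "two": rng.uniform(0, 10, 100)})

# Call each reduction directly, then transpose so each statistic is a row.
manual = pd.DataFrame({"sum": df.sum(), "mean": df.mean()}).T

# The same table via the general-purpose agg path.
via_agg = df.agg(["sum", "mean"])

print(manual)
print(np.allclose(manual.values, via_agg.values))
```

To compare speed on your own data, wrapping each of the two expressions in timeit is one option.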

Doctor J · Accepted Answer · 2016-03-30 22:51:51Z

17

In the general case where you have arbitrary functions and column names, you could do this:

df.apply(lambda r: pd.Series({'mean': r.mean(), 'std': r.std()})).transpose()

         mean       std
one  5.366303  2.612738
two  4.858691  2.986567
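
The same idea can be wrapped in a small helper that takes a dict of names to functions. This is only a sketch: summarize, the "span" statistic, and the sample values are hypothetical and not part of the answer above.

```python
import pandas as pd


def summarize(frame, funcs):
    """Apply each named function in `funcs` to every column of `frame`."""
    per_column = frame.apply(
        lambda col: pd.Series({name: func(col) for name, func in funcs.items()})
    )
    # Transpose so that rows are the original columns, as in the answer above.
    return per_column.transpose()


df = pd.DataFrame({"one": [1.0, 2.0, 3.0, 4.0], "two": [10.0, 20.0, 30.0, 40.0]})
stats = summarize(
    df,
    {"mean": lambda c: c.mean(), "span": lambda c: c.max() - c.min()},
)
print(stats)
```

Because the functions are plain callables, anything that reduces a Series to a scalar can be dropped into the dict.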

answered Mar 30, 2016 at 22:51

Doctor J


Souvik Daw · Accepted Answer · 2020-02-16 06:37:54Z

2

I chained three functions on a single column, and it works:

import re

# replace newline characters with spaces, then trim
rem_newline = lambda x : re.sub('\n',' ',x).strip()

# lowercase and trim surrounding whitespace
lower_strip = lambda x : x.lower().strip()

# note: this rebinds df to the two-column result of the split
df = df['users_name'].apply(lower_strip).apply(rem_newline).str.split('(',n=1,expand=True)
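
For comparison, here is a sketch of the same chain written with the vectorized .str accessor instead of apply. The sample rows are made up; only the column name users_name comes from the answer.

```python
import pandas as pd

# Made-up sample data; only the column name comes from the answer above.
df = pd.DataFrame({"users_name": ["  Alice\nSmith (admin) ", "BOB (user)"]})

parts = (
    df["users_name"]
    .str.lower()                           # lowercase
    .str.strip()                           # trim surrounding whitespace
    .str.replace("\n", " ", regex=False)   # newline -> space (literal match)
    .str.strip()
    .str.split("(", n=1, expand=True)      # split once on the first "("
)
print(parts)
```

The steps run in the same order as the apply chain, so the result should match for string inputs.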

edited Feb 16, 2020 at 6:37

answered Feb 16, 2020 at 6:08

Souvik Daw


Sergio Lucero · Accepted Answer · 2018-04-30 19:22:43Z

I am using pandas to analyze Chilean legislation drafts. In my dataframe, the list of authors is stored as a string. The answer above did not work for me (using pandas 0.20.3), so I used my own logic and came up with this:

df.authors.apply(eval).apply(len).sum()

Concatenated applies! A pipeline!! The first apply transforms

"['Barros Montero: Ramón', 'Bellolio Avaria: Jaime', 'Gahona Salazar: Sergio']"

into the obvious list; the second apply counts the number of lawmakers involved in each project. Summing those counts gives the total number of (lawmaker, project number) pairs, which I need so I can presize an array where I will study which parties work on what.

Interestingly, this works! Even more interestingly, that last call fails if one gets too ambitious and does this instead:

df.authors.apply(eval).apply(len).apply(sum)

with an error:

TypeError: 'int' object is not iterable

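coming from deep within /site-packages/pandas/core/series.py in apply. The reason is that Series.apply calls the function on each element, so sum receives a single int rather than something iterable, whereas .sum() reduces the whole Series at once.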
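
A small sketch of this pipeline on toy data. It swaps eval for ast.literal_eval, which only parses Python literals; that substitution is a suggestion, not what the answer used. The first row is the string quoted above and the second row is made up.

```python
import ast

import pandas as pd

# First row is the string quoted above; the second row is hypothetical.
df = pd.DataFrame({
    "authors": [
        "['Barros Montero: Ramón', 'Bellolio Avaria: Jaime', 'Gahona Salazar: Sergio']",
        "['Hypothetical Author: Name']",
    ]
})

# literal_eval parses Python literals only, so it cannot run arbitrary code.
counts = df["authors"].apply(ast.literal_eval).apply(len)
total = counts.sum()  # reduces the whole Series at once

try:
    counts.apply(sum)  # calls sum() on each int, one element at a time
    failed = False
except TypeError:
    failed = True

print(total, failed)
```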
