16

When using df.mean() I get a result where the mean for each column is given. Now let's say I want the mean of the first column, and the sum of the second. Is there a way to do this? I don't want to have to disassemble and reassemble the DataFrame.

My initial idea was to do something along the lines of pandas.groupby.agg() like so:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.random((10, 2)), columns=['A', 'B'])
df.apply({'A':np.mean, 'B':np.sum}, axis=0)

Traceback (most recent call last):

  File "<ipython-input-81-265d3e797682>", line 1, in <module>
    df.apply({'A':np.mean, 'B':np.sum}, axis=0)

  File "C:\Users\Patrick\Anaconda\lib\site-packages\pandas\core\frame.py", line 3471, in apply
    return self._apply_standard(f, axis, reduce=reduce)

  File "C:\Users\Patrick\Anaconda\lib\site-packages\pandas\core\frame.py", line 3560, in _apply_standard
    results[i] = func(v)

TypeError: ("'dict' object is not callable", u'occurred at index A')

But clearly this doesn't work. It seems like passing a dict would be an intuitive way of doing this, but is there another way (again without disassembling and reassembling the DataFrame)?

3 Answers

17

You can try a closure:

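import numpy as np
import pandas as pd

def multi_func(functions):
    # Return a function that looks up the callable registered for each column name.
    def f(col):
        return functions[col.name](col)
    return f

df = pd.DataFrame(np.random.random((10, 2)), columns=['A', 'B'])
# Column A gets the mean, column B gets the sum.
result = df.apply(multi_func({'A': np.mean, 'B': np.sum}))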

5 Comments

This is pretty nice actually. My workaround was inserting a column of ones into the dataframe, doing groupby on that column then passing a dict to the aggregate method.
Thank you! I notice that this fails if there are more columns in the DataFrame than keys in the function dict. @bill-letson have you seen that too?
A full implementation should include a try/except KeyError clause that falls back to an identity function: lambda x: x
@phil_20686 You can do that by replacing functions[col.name](col) with functions.get(col.name, lambda x: x)(col)
If you want to do this in a one-liner, the following worked for me: df.apply(lambda col: functions.get(col.name, lambda c: c)(col))
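One way to handle the case raised in the comments, where the DataFrame has more columns than the dict has keys, might be to restrict the frame to the keyed columns before calling apply. The following is a minimal sketch under that assumption; the three-column frame and the variable name `functions` are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

def multi_func(functions):
    # Return a function that applies the callable registered for each column name.
    def f(col):
        return functions[col.name](col)
    return f

# Illustrative data: column C has no entry in the dict below.
df = pd.DataFrame({'A': [0, 1, 2], 'B': [3, 4, 5], 'C': [6, 7, 8]})
functions = {'A': np.mean, 'B': np.sum}

# Select only the columns that have a function, so no KeyError is raised for C.
result = df[list(functions)].apply(multi_func(functions))
print(result)
```

Selecting the columns first avoids mixing reduced values with untouched columns in one result, which an identity fallback would do when the other functions are reductions.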
12

I think you can use the agg method with a dictionary as the argument. For example:

df = pd.DataFrame({'A': [0, 1, 2], 'B': [3, 4, 5]})

df =
   A  B
0  0  3
1  1  4
2  2  5

df.agg({'A': 'mean', 'B': 'sum'})

A     1.0
B    12.0
dtype: float64
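As a small illustration of the dict form, the sketch below adds a third column C (hypothetical, not in the answer above) that has no entry in the dict. With string function names, agg should return only the columns named in the dict.

```python
import pandas as pd

# Illustrative data: C is an extra column with no aggregation assigned.
df = pd.DataFrame({'A': [0, 1, 2], 'B': [3, 4, 5], 'C': [6, 7, 8]})

# Mean of A and sum of B; C is left out of the result.
summary = df.agg({'A': 'mean', 'B': 'sum'})
print(summary)
```

Using the string names rather than function objects lets pandas dispatch to its own Series.mean and Series.sum implementations.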

To add, it seems the example provided in the question also works now (as of version 1.5.3).

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.random((10,2)), columns=['A','B'])
df.apply({'A':np.mean, 'B':np.sum}, axis=0)

A    0.495771
B    5.939556
dtype: float64


2

Just faced this situation myself and came up with the following:

In [1]: import pandas as pd

In [2]: df = pd.DataFrame([['one', 'two'], ['three', 'four'], ['five', 'six']], 
   ...:                   columns=['A', 'B'])

In [3]: df
Out[3]: 
       A     B
0    one   two
1  three  four
2   five   six

In [4]: converters = {'A': lambda x: x[:1], 'B': lambda x: x.replace('o', '')}

In [5]: new = pd.DataFrame.from_dict({col: series.apply(converters[col]) 
   ...:                               if col in converters else series
   ...:                               for col, series in df.items()})

In [6]: new
Out[6]: 
   A    B
0  o   tw
1  t  fur
2  f  six
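The same element-wise conversion can also be written as a plain loop over the dict, which leaves columns without a converter untouched. This is a sketch of an alternative, not the approach used in the session above; it assumes Series.map for the per-element call and works on a copy so the original frame is not modified.

```python
import pandas as pd

df = pd.DataFrame([['one', 'two'], ['three', 'four'], ['five', 'six']],
                  columns=['A', 'B'])

converters = {'A': lambda x: x[:1], 'B': lambda x: x.replace('o', '')}

# Work on a copy and overwrite only the columns that have a converter.
new = df.copy()
for col, func in converters.items():
    new[col] = new[col].map(func)

print(new)
```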
