How to group and apply multiple functions?

Question

This is my df:

import pandas as pd
df = pd.DataFrame({'id': [1,1,1,1,2,2,3,3,3], 
                   'col1': [7,6,12,1,3,6,10,11,12],
                   'col2': [1.2,0.8,0.9,1.1,2.0,1.8,0.7,0.9,1.2]})

I want to apply two functions to each group, each of which returns exactly one value.

def myfunc1(g):
    var1 = g['col1'].iloc[0]
    var2 = g.loc[g['col2'] > 1, 'col1'].iloc[0]

    return var1 / var2

def myfunc2(g):
    var1 = g['col1'].iloc[0]
    var2 = g.loc[g['col2'] < 1, 'col1'].iloc[0]

    return var2 - var1

If I run them this way, the code fails:

df[['new_col1','new_col2']] = df.groupby("id").apply(myfunc1,myfunc2)

However, if I run them separately (see below), everything works fine:

df['new_col1'] = df.groupby("id").apply(myfunc1)
df['new_col2'] = df.groupby("id").apply(myfunc2)

The expected output should have the following columns:

id
new_col1
new_col2

myfunc2 throws an error because one of the groups has no row to take iloc[0] from (id 2 has no col2 value below 1).
– Laurens Koppenol, Commented Jul 18, 2019 at 11:34
You could use a single function that returns multiple columns; see this post: stackoverflow.com/questions/46696807/…
– Clem G., Commented Jul 18, 2019 at 11:36
I do not think this accomplishes what you want. Because you group by "id", the index of the returned Series is also "id", with values [1, 2, 3]. When you assign it as a column the way you do now, pandas aligns it on the index of the original DataFrame, which is 0-8, so only the rows labeled 1, 2 and 3 receive a value in the new columns. Please post the result you expect, and have a look at @ClemG.'s link!
– Laurens Koppenol, Commented Jul 18, 2019 at 11:38
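
The index-alignment point in the last comment can be checked with a small sketch. It uses a simpler per-group function (`first`) than the question does, and the column name `first_col1` is made up for illustration; `Series.map` is one possible way to broadcast per-group values back to the rows, not necessarily what the asker needs.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 1, 2, 2, 3, 3, 3],
                   'col1': [7, 6, 12, 1, 3, 6, 10, 11, 12]})

# One value per group; the result is indexed by id (1, 2, 3)
per_group = df.groupby('id')['col1'].first()

# Direct assignment aligns on df's row index (0..8),
# so only the rows labeled 1, 2 and 3 find a match
misaligned = df.copy()
misaligned['first_col1'] = per_group

# Mapping through the id column looks up each row's id in per_group instead
aligned = df.copy()
aligned['first_col1'] = aligned['id'].map(per_group)

print(misaligned)
print(aligned)
```

In `misaligned`, six of the nine rows end up as NaN, while `aligned` carries the group value on every row.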

jezrael · Accepted Answer · 2019-07-18 11:43:42Z

apply accepts only one function (any extra positional arguments are passed on to that function), so one possible solution is to create a wrapper function that calls both:

import numpy as np

def myfunc1(g):
    var1 = g['col1'].iloc[0]
    # first col1 value where col2 > 1, or NaN if no row matches
    var2 = next(iter(g.loc[g['col2'] > 1, 'col1']), np.nan)

    return var1 / var2

def myfunc2(g):
    var1 = g['col1'].iloc[0]
    # first col1 value where col2 < 1, or NaN if no row matches
    var2 = next(iter(g.loc[g['col2'] < 1, 'col1']), np.nan)

    return var2 - var1

def f(x):
    return pd.Series([myfunc1(x), myfunc2(x)], index=['new_col1','new_col2'])

df1 = df.groupby("id").apply(f)
print(df1)
    new_col1  new_col2
id                    
1   1.000000      -1.0
2   1.000000       NaN
3   0.833333       0.0
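
The NaN for id 2 comes from the `next(iter(...), np.nan)` idiom, which falls back to a default when the filtered Series is empty. A minimal sketch of that behaviour on its own, with made-up Series standing in for the filtered groups:

```python
import numpy as np
import pandas as pd

matches = pd.Series([6, 12])                 # the filter matched some rows
no_matches = pd.Series([], dtype="int64")    # the filter matched nothing

# next() returns the first item of the iterator, or the default if it is empty
first_match = next(iter(matches), np.nan)
first_missing = next(iter(no_matches), np.nan)

# .iloc[0] has no fallback: on an empty Series it raises IndexError
try:
    no_matches.iloc[0]
    raised = False
except IndexError:
    raised = True

print(first_match, first_missing, raised)
```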

Or combine both into a single new function:

def myfunc3(g):
    var1 = g['col1'].iloc[0]
    var2 = next(iter(g.loc[g['col2'] > 1, 'col1']), np.nan)
    var3 = next(iter(g.loc[g['col2'] < 1, 'col1']), np.nan)

    return pd.Series([var1 / var2, var3 - var1], index=['new_col1','new_col2'])


df1 = df.groupby("id").apply(myfunc3)
print(df1)
    new_col1  new_col2
id                    
1   1.000000      -1.0
2   1.000000       NaN
3   0.833333       0.0
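
If the goal is to have the new columns on every row of the original DataFrame (an assumption, since the question only lists the expected column names), one possible sketch is to join the per-group result back on `id`. Selecting `[['col1', 'col2']]` before `apply` is my own addition; it keeps the grouping column out of what the function receives, which I believe avoids a deprecation warning in newer pandas versions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 1, 2, 2, 3, 3, 3],
                   'col1': [7, 6, 12, 1, 3, 6, 10, 11, 12],
                   'col2': [1.2, 0.8, 0.9, 1.1, 2.0, 1.8, 0.7, 0.9, 1.2]})

def myfunc3(g):
    var1 = g['col1'].iloc[0]
    # first matching col1 value, or NaN when the filter matches no row
    var2 = next(iter(g.loc[g['col2'] > 1, 'col1']), np.nan)
    var3 = next(iter(g.loc[g['col2'] < 1, 'col1']), np.nan)
    return pd.Series([var1 / var2, var3 - var1],
                     index=['new_col1', 'new_col2'])

# One row per id, with columns new_col1 and new_col2
per_group = df.groupby('id')[['col1', 'col2']].apply(myfunc3)

# Match per_group's index (id) against df's id column, row by row
result = df.join(per_group, on='id')
print(result)
```

Every row of a group then carries that group's `new_col1` and `new_col2`, including NaN in `new_col2` for id 2.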


Fluxy Over a year ago

Cool! Thanks a lot:)
