This is my df:

import pandas as pd
df = pd.DataFrame({'id': [1,1,1,1,2,2,3,3,3], 
                   'col1': [7,6,12,1,3,6,10,11,12],
                   'col2': [1.2,0.8,0.9,1.1,2.0,1.8,0.7,0.9,1.2]})

I want to apply two functions to each id group, each of which returns exactly one value.

def myfunc1(g):
    var1 = g['col1'].iloc[0]
    var2 = g.loc[g['col2'] > 1, 'col1'].iloc[0]

    return var1 / var2

def myfunc2(g):
    var1 = g['col1'].iloc[0]
    var2 = g.loc[g['col2'] < 1, 'col1'].iloc[0]

    return var2 - var1

If I run them this way, the code fails:

df[['new_col1','new_col2']] = df.groupby("id").apply(myfunc1,myfunc2)
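For reference, a likely reason for the failure: GroupBy.apply(func, *args) passes any extra positional arguments on to func. The call above therefore becomes myfunc1(group, myfunc2) for every group, and that raises a TypeError because myfunc1 takes only one parameter. The sketch below reproduces this with the question's data. The exact error message may differ between pandas versions.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 1, 2, 2, 3, 3, 3],
                   'col1': [7, 6, 12, 1, 3, 6, 10, 11, 12],
                   'col2': [1.2, 0.8, 0.9, 1.1, 2.0, 1.8, 0.7, 0.9, 1.2]})

def myfunc1(g):
    var1 = g['col1'].iloc[0]
    var2 = g.loc[g['col2'] > 1, 'col1'].iloc[0]
    return var1 / var2

def myfunc2(g):
    var1 = g['col1'].iloc[0]
    var2 = g.loc[g['col2'] < 1, 'col1'].iloc[0]
    return var2 - var1

# apply(func, *args) calls func(group, *args) for each group,
# so myfunc2 is handed to myfunc1 as an unexpected second argument
try:
    df.groupby("id").apply(myfunc1, myfunc2)
    error = None
except TypeError as exc:
    error = exc

print(type(error).__name__)  # TypeError
```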

However, if I run them separately (see below), everything works fine:

df['new_col1'] = df.groupby("id").apply(myfunc1)
df['new_col2'] = df.groupby("id").apply(myfunc2)

The expected output should have the following columns:

  • id
  • new_col1
  • new_col2
  • myfunc2 throws an error because the selection is empty for one of the groups, so there is no iloc[0] (id 2 has no col2 value below 1). Commented Jul 18, 2019 at 11:34
  • You could use a single function that returns multiple columns; see this post: stackoverflow.com/questions/46696807/… Commented Jul 18, 2019 at 11:36
  • I do not think you are accomplishing what you want to do. Because you group by "id", the index of the returned Series is also "id", with values [1, 2, 3]. When you assign it to a column the way you do now, pandas aligns it on the index of the original DataFrame, which is 0-8, so only rows 1, 2 and 3 will receive a value in your new columns. Please post the result you expect, and look at @ClemG.'s link! Commented Jul 18, 2019 at 11:38
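The index-alignment point in the last comment can be checked with a small sketch. A per-group result from groupby("id") is indexed by the group keys 1, 2, 3. The DataFrame, by contrast, has row labels 0 to 8, so a plain column assignment matches on those labels and not on id. Taking the first col1 value per group is only a stand-in aggregation for illustration. Series.map through the id column is one way to look up each row's own group value.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 1, 2, 2, 3, 3, 3],
                   'col1': [7, 6, 12, 1, 3, 6, 10, 11, 12],
                   'col2': [1.2, 0.8, 0.9, 1.1, 2.0, 1.8, 0.7, 0.9, 1.2]})

# One value per group (first col1 is just a stand-in), indexed by the keys 1, 2, 3
per_group = df.groupby("id")["col1"].first()

# Plain assignment aligns on df's row index (0..8), not on the 'id' column,
# so only rows 1, 2 and 3 get a value and every other row becomes NaN
df['misaligned'] = per_group

# Mapping through 'id' looks up each row's own group value instead
df['aligned'] = df['id'].map(per_group)

print(df)
```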

1 Answer


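apply can call only one function, so one possible solution is to create another function that calls both and returns the two results together. Both functions below also return NaN instead of raising an error when a group has no matching row: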

import numpy as np

def myfunc1(g):
    var1 = g['col1'].iloc[0]
    # first col1 value where col2 > 1, or NaN if there is no such row
    var2 = next(iter(g.loc[g['col2'] > 1, 'col1']), np.nan)

    return var1 / var2

def myfunc2(g):
    var1 = g['col1'].iloc[0]
    # first col1 value where col2 < 1, or NaN if there is no such row
    var2 = next(iter(g.loc[g['col2'] < 1, 'col1']), np.nan)

    return var2 - var1

# wrapper that calls both functions and returns their results as one row
def f(x):
    return pd.Series([myfunc1(x), myfunc2(x)], index=['new_col1','new_col2'])

df1 = df.groupby("id").apply(f)
print (df1)
    new_col1  new_col2
id                    
1   1.000000      -1.0
2   1.000000       NaN
3   0.833333       0.0
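Why .iloc[0] is replaced with next(iter(...), np.nan) above: for id 2 no row has col2 < 1, so the selection is empty and .iloc[0] raises IndexError. next with a default returns that default (here NaN) when the iterator is empty. A minimal comparison on that one group:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 1, 2, 2, 3, 3, 3],
                   'col1': [7, 6, 12, 1, 3, 6, 10, 11, 12],
                   'col2': [1.2, 0.8, 0.9, 1.1, 2.0, 1.8, 0.7, 0.9, 1.2]})

g = df[df['id'] == 2]                  # the group with no col2 value below 1
empty = g.loc[g['col2'] < 1, 'col1']   # empty Series

try:
    empty.iloc[0]
    iloc_error = None
except IndexError as exc:
    iloc_error = exc                   # positional indexing on an empty Series fails

# next() returns its default when the iterator is exhausted
first_or_nan = next(iter(empty), np.nan)
print(type(iloc_error).__name__, first_or_nan)  # IndexError nan
```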

Or combine both into a single new function:

def myfunc3(g):
    var1 = g['col1'].iloc[0]
    # first match for each condition, or NaN if there is none
    var2 = next(iter(g.loc[g['col2'] > 1, 'col1']), np.nan)
    var3 = next(iter(g.loc[g['col2'] < 1, 'col1']), np.nan)

    return pd.Series([var1 / var2, var3 - var1], index=['new_col1','new_col2'])


df1 = df.groupby("id").apply(myfunc3)
print (df1)
    new_col1  new_col2
id                    
1   1.000000      -1.0
2   1.000000       NaN
3   0.833333       0.0
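The result above has one row per id. If, as the question's list of expected columns suggests, the values are wanted on every original row, one option (a sketch, assuming that is the goal) is to join the per-id result back on the id column with DataFrame.join(..., on="id"). Selecting only col1 and col2 before apply keeps the grouping column out of each group. As far as I know, this also avoids the DeprecationWarning that pandas 2.2 and later emit when apply operates on the grouping columns.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 1, 2, 2, 3, 3, 3],
                   'col1': [7, 6, 12, 1, 3, 6, 10, 11, 12],
                   'col2': [1.2, 0.8, 0.9, 1.1, 2.0, 1.8, 0.7, 0.9, 1.2]})

def myfunc3(g):
    var1 = g['col1'].iloc[0]
    # first match for each condition, or NaN if there is none
    var2 = next(iter(g.loc[g['col2'] > 1, 'col1']), np.nan)
    var3 = next(iter(g.loc[g['col2'] < 1, 'col1']), np.nan)
    return pd.Series([var1 / var2, var3 - var1], index=['new_col1', 'new_col2'])

# One row per id, indexed by id
df1 = df.groupby("id")[['col1', 'col2']].apply(myfunc3)

# Broadcast the per-id values back onto every original row by matching 'id'
out = df.join(df1, on="id")
print(out[['id', 'new_col1', 'new_col2']])
```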

1 Comment

Cool! Thanks a lot:)
