After some research, I found this question (Apply different functions to different items in group object: Python pandas). It may be exactly what I want, but I can't make sense of the proposed answers. Let me explain what I want with a simple example:

import pandas as pd
import numpy as np

df = pd.DataFrame({'B': ['one', 'one', 'two', 'three',
                         'two', 'two', 'one', 'three'],
                   'C': np.random.randn(8),
                   'D': np.random.randn(8)})
grouped = df.groupby(['B'])

Let us say we have the simple data set built from the above that looks like this:

       B         C         D
0    one -1.758565 -1.544788
1    one -0.309472  2.289912
2    two -1.885911  0.384215
3  three  0.444186  0.551217
4    two -0.502636  2.125921
5    two -2.247551 -0.188705
6    one -0.575756  1.473056
7  three  0.640316 -0.410318

Grouping on column 'B' creates 3 groups:

  1. one
  2. two
  3. three

Now, how can I apply a different function to each of these groups while keeping the results in the same data frame? For example, I might want to check whether elements are < 0.5 in group 1, divisible by 2 in group 2, and negative in group 3. These functions are for illustration only; the point is that each group gets its own custom function, but the result should be something we can look at in one data frame. Any advice is appreciated.

  • Can you show what you mean exactly? Run the functions manually on the groups? Commented Aug 4, 2020 at 22:43
  • @MadPhysicist, I don't want to implement them manually. I just want different functions applied to different groups, with the result in one data frame rather than handling each group separately as its own data frame. Commented Aug 4, 2020 at 22:45
  • Kindly post your expected output. Commented Aug 4, 2020 at 22:46

1 Answer

You can use np.where to define whatever logic you want:

df['Flag'] = np.where((df['B'] == 'one') & (df['C'] < 0.5), True, False)
df['Flag'] = np.where((df['B'] == 'two') & (df['C'] >= 0.5), True, df['Flag'])
df['Flag'] = np.where((df['B'] == 'three') & (df['C'] < 0.5), True, df['Flag'])

Out[85]: 
       B         C         D   Flag
0    one -1.758565 -1.544788   True
1    one -0.309472  2.289912   True
2    two -1.885911  0.384215  False
3  three  0.444186  0.551217   True
4    two -0.502636  2.125921  False
5    two -2.247551 -0.188705  False
6    one -0.575756  1.473056   True
7  three  0.640316 -0.410318  False

From there, suppose you want to count the True values in each group:

# Assign to a new name so df keeps columns 'C' and 'D' for the code further down.
counts = df.groupby('B')['Flag'].sum().reset_index()

       B    Flag
0    one     3.0
1  three     1.0
2    two     0.0
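
As an aside, the three chained np.where calls above can be collapsed into a single np.select call. This is only a sketch of an alternative, not part of the original answer; it hard-codes the 'B' and 'C' values from the sample table in the question so the result is reproducible.

```python
import numpy as np
import pandas as pd

# Values copied from the sample table in the question (column 'D' omitted).
df = pd.DataFrame({
    'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
    'C': [-1.758565, -0.309472, -1.885911, 0.444186,
          -0.502636, -2.247551, -0.575756, 0.640316],
})

# One condition per group; rows that match none of them get the default.
conditions = [
    (df['B'] == 'one') & (df['C'] < 0.5),
    (df['B'] == 'two') & (df['C'] >= 0.5),
    (df['B'] == 'three') & (df['C'] < 0.5),
]
df['Flag'] = np.select(conditions, [True, True, True], default=False)
print(df)
```

The conditions are the same ones used in the np.where version, so the Flag column should match the table shown earlier.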

To implement this as an adjustable custom function (per the comment), you can do:

def flag(one, two, three):
    # Each argument is a boolean Series aligned with df; it only takes
    # effect on the rows of its own group. Modifies the global df in place.
    df['Flag'] = np.where((df['B'] == 'one') & (one), True, False)
    df['Flag'] = np.where((df['B'] == 'two') & (two), True, df['Flag'])
    df['Flag'] = np.where((df['B'] == 'three') & (three), True, df['Flag'])


flag(one=df['C'] < 0.5, two=df['C'] >= 0.5, three=df['C'] < 0.5)
df

       B         C         D   Flag
0    one -1.758565 -1.544788   True
1    one -0.309472  2.289912   True
2    two -1.885911  0.384215  False
3  three  0.444186  0.551217   True
4    two -0.502636  2.125921  False
5    two -2.247551 -0.188705  False
6    one -0.575756  1.473056   True
7  three  0.640316 -0.410318  False
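
If each group needs a genuinely different function rather than a different boolean mask, one possible approach is to keep the functions in a dictionary keyed by group name and apply them while iterating over the groupby. The sketch below is an assumption about what the question is after, not part of the original answer: the `checks` dictionary and its lambdas are hypothetical placeholders, and the data is copied from the question's sample table.

```python
import pandas as pd

# Values copied from the sample table in the question (column 'D' omitted).
df = pd.DataFrame({
    'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
    'C': [-1.758565, -0.309472, -1.885911, 0.444186,
          -0.502636, -2.247551, -0.575756, 0.640316],
})

# Hypothetical per-group functions: each takes a Series and returns a
# Series with the same index. Replace them with any custom logic.
checks = {
    'one': lambda s: s < 0.5,
    'two': lambda s: s >= 0.5,
    'three': lambda s: s < 0,
}

# Apply the matching function to each group's 'C' values, then stitch the
# pieces back together; assignment aligns on the original row index.
parts = [checks[name](values) for name, values in df.groupby('B')['C']]
df['Flag'] = pd.concat(parts)
print(df)
```

Grouping by the string 'B' rather than the list ['B'] keeps each group key a plain string, which is what the dictionary lookup relies on.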

2 Comments

David Erickson, can the same approach be used if a custom function needs to be applied instead of the simple logical check you have shown with np.where?
@UGuntupalli See my revised answer.
