Pandas: apply different custom functions to different columns when using groupby

Question

I want to use "groupby" on my pandas dataframe with a different custom function for each column. For example, if I have this as input:

annotator  event          interval_presence   duration
3          birds          [0,5]               5
3          birds          [7,9]               10
3          voices         [1,2]               10
3          traffic        [1,7]               7
5          voices         [4,7]               4
5          voices         [5,10]              6
5          traffic        [0,1]               4

Each item in "interval_presence" is a pandas Interval. When merging rows, I want to take the mean of the "duration" column and apply "pd.arrays.IntervalArray" followed by "piso.union" to the intervals in "interval_presence". The output would be:

annotator  event          interval_presence   duration
3          birds          [[0,5],[7,9]]       7.5
3          voices         [1,2]               10
3          traffic        [1,7]               7
5          voices         [4,10]              5
5          traffic        [0,1]               4

Right now, I know how to merge my intervals thanks to the answer in the post: Pandas: how to merge rows by union of intervals. So the solution for the intervals alone would be:

data = data.groupby(['annotator', 'event'])['interval_presence'] \
    .apply(pd.arrays.IntervalArray) \
    .apply(piso.union) \
    .reset_index()

But how can I simultaneously apply a "mean" function to "duration"?

Use groupby.agg with a {'colname': function} dictionary.
– mozway, Commented Jan 28, 2023 at 16:47
I've seen that I can use .agg, but what's the syntax when using custom functions? Something like df = df.groupby(['annotator', 'event'])['interval_presence'].agg({ 'interval_presence':'.apply(pd.arrays.IntervalArray).apply(piso.union)', 'duration':'mean'}).reset_index() isn't valid syntax.
– M.Tailleur, Commented Jan 28, 2023 at 16:51

Code Different · Accepted Answer · 2023-01-28 16:59:26Z

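Your agg call has two problems: it selects only the interval_presence column before aggregating, so duration is no longer available, and it passes a string of chained method calls where agg expects a callable or the name of a built-in aggregation. Call agg on the whole groupby object and use a lambda for the custom step: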

df.groupby(["annotator", "event"]).agg({
    "interval_presence": lambda s: piso.union(pd.arrays.IntervalArray(s)),
    "duration": "mean"
})

Within the lambda, s is a Series of pd.Interval objects holding the interval_presence values of one (annotator, event) group.
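To try the pattern without installing piso, here is a minimal sketch that rebuilds the question's sample data and passes one callable per column to agg. The helper merge_intervals is a hypothetical stand-in for piso.union written only for this illustration, and closed="both" is an assumption based on the [a,b] notation in the question. Unlike piso.union, the stand-in returns a plain list of intervals.

```python
import pandas as pd

# Sample data from the question. closed="both" is an assumption
# based on the [a,b] notation used there.
bounds = [(0, 5), (7, 9), (1, 2), (1, 7), (4, 7), (5, 10), (0, 1)]
df = pd.DataFrame({
    "annotator": [3, 3, 3, 3, 5, 5, 5],
    "event": ["birds", "birds", "voices", "traffic",
              "voices", "voices", "traffic"],
    "interval_presence": [pd.Interval(a, b, closed="both") for a, b in bounds],
    "duration": [5, 10, 10, 7, 4, 6, 4],
})


def merge_intervals(s):
    """Hypothetical stand-in for piso.union.

    Merges overlapping or touching closed intervals in a Series and
    returns them as a list of pd.Interval objects.
    """
    merged = []
    for iv in sorted(s, key=lambda i: i.left):
        if merged and iv.left <= merged[-1].right:
            last = merged[-1]
            merged[-1] = pd.Interval(
                last.left, max(last.right, iv.right), closed="both"
            )
        else:
            merged.append(iv)
    return merged


# One entry per column: a custom callable for the intervals,
# a built-in aggregation name for the durations.
result = df.groupby(["annotator", "event"]).agg({
    "interval_presence": merge_intervals,
    "duration": "mean",
}).reset_index()

print(result)
```

With piso installed, replacing merge_intervals with the lambda from the answer should give the same grouping, with the merged intervals returned as an interval array instead of a list.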
