I want to use "groupby" on my pandas DataFrame with a different custom function for each column. For example, given this input:
annotator  event    interval_presence  duration
3          birds    [0,5]              5
3          birds    [7,9]              10
3          voices   [1,2]              10
3          traffic  [1,7]              7
5          voices   [4,7]              4
5          voices   [5,10]             6
5          traffic  [0,1]              4
Each item in "interval_presence" is a pandas Interval. When merging rows, I want to take the mean of the "duration" column and apply "pd.arrays.IntervalArray" and "piso.union" to the intervals in "interval_presence". The expected output would be:
annotator  event    interval_presence  duration
3          birds    [[0,5],[7,9]]      7.5
3          voices   [1,2]              10
3          traffic  [1,7]              7
5          voices   [4,10]             5
5          traffic  [0,1]              4
Right now, I know how to merge my intervals thanks to the answer in the post "Pandas: how to merge rows by union of intervals". For the intervals alone, the solution would be:
import pandas as pd
import piso

# Union the intervals within each (annotator, event) group
data = data.groupby(['annotator', 'event'])['interval_presence'] \
    .apply(pd.arrays.IntervalArray) \
    .apply(piso.union) \
    .reset_index()
But how can I apply a "mean" function to "duration" at the same time?
groupby.agg with a {'colname': function} dictionary.
df = df.groupby(['annotator', 'event'])['interval_presence'].agg({ 'interval_presence':'.apply(pd.arrays.IntervalArray).apply(piso.union)', 'duration':'mean'}).reset_index() isn't a good syntax
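The attempt above probably fails for two reasons: selecting ['interval_presence'] before .agg drops "duration" from the group, and agg expects a callable (or the name of a built-in aggregation such as 'mean'), not a string containing method calls. Here is a minimal sketch of the per-column approach, using named aggregation, which is equivalent to passing a {'colname': function} dictionary. To keep it runnable with pandas alone, union_intervals is a hypothetical stand-in for piso.union that returns a plain list of merged intervals. The sample frame and closed="both" are also assumptions based on the tables in the question.

```python
import pandas as pd

# Sample data mirroring the question; closed="both" is an assumption.
bounds = [(0, 5), (7, 9), (1, 2), (1, 7), (4, 7), (5, 10), (0, 1)]
data = pd.DataFrame({
    "annotator": [3, 3, 3, 3, 5, 5, 5],
    "event": ["birds", "birds", "voices", "traffic",
              "voices", "voices", "traffic"],
    "interval_presence": [pd.Interval(lo, hi, closed="both") for lo, hi in bounds],
    "duration": [5, 10, 10, 7, 4, 6, 4],
})


def union_intervals(intervals):
    """Hypothetical stand-in for piso.union: merge overlapping intervals.

    Takes the intervals of one group and returns a list of merged intervals.
    """
    merged = []
    for iv in sorted(intervals, key=lambda iv: iv.left):
        if merged and iv.left <= merged[-1].right:
            # Overlaps (or touches) the previous interval: extend it.
            last = merged[-1]
            merged[-1] = pd.Interval(
                last.left, max(last.right, iv.right), closed=last.closed
            )
        else:
            merged.append(iv)
    return merged


# One aggregation per column: a custom callable for the intervals,
# the built-in "mean" for the durations. sort=False keeps the groups
# in order of first appearance.
result = (
    data.groupby(["annotator", "event"], sort=False)
        .agg(
            interval_presence=("interval_presence", union_intervals),
            duration=("duration", "mean"),
        )
        .reset_index()
)
print(result)
```

If piso is installed, the helper could presumably be swapped for a function that calls piso.union on pd.arrays.IntervalArray(intervals), as in the chain from the question. That variant is not tested here.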