How to apply a function to pandas columns using groupby?

Question

I'm working on a data frame that contains segments (each with two endpoints). I have to find the midpoint of each segment and then insert a row with the midpoint coordinates between that segment's two endpoint rows.

Below is the df:

   id         x          y
0   1    0.8000       1.90
1   1    0.8833       2.00
2   2    1.0000       2.14
3   2    1.3000       2.50

Points with the same id are the end points of the same segment.

I have created the following simple function (it basically calculates the mean):

def find_mpt(x1, y1, x2, y2):
    x, y = (x1 + x2) / 2, (y1 + y2) / 2
    return x, y

I want to apply the function to the entire df and insert the resulting rows specifically between the endpoint rows, as follows:

   id         x          y
0   1    0.8000       1.90
1   1    0.8416       1.95  #new row 
2   1    0.8833       2.00
3   2    1.0000       2.14
4   2    1.1500       2.32  #new row       
5   2    1.3000       2.50

Maybe I can use df.groupby(['id']) and then apply the function, but I still have no idea how to insert the rows at those specific locations.
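Because find_mpt only does arithmetic, it can be called once on whole Series of first and last endpoints taken per id, instead of row by row. The sketch below assumes the sample df from the question, and it introduces a hypothetical helper column `pos` (not part of the question) that records each row's position so every midpoint can be slotted between its own two endpoint rows without relying on how x or y happen to be ordered.

```python
import pandas as pd


def find_mpt(x1, y1, x2, y2):
    # The question's midpoint function; plain arithmetic, so it also works on whole Series
    x, y = (x1 + x2) / 2, (y1 + y2) / 2
    return x, y


# Sample data from the question
df = pd.DataFrame({'id': [1, 1, 2, 2],
                   'x': [0.8, 0.8833, 1.0, 1.3],
                   'y': [1.90, 2.00, 2.14, 2.50]})

# Per id: the first and the last endpoint of each segment (indexed by id)
g = df.groupby('id')
first, last = g.first(), g.last()
mx, my = find_mpt(first['x'], first['y'], last['x'], last['y'])
mid = pd.DataFrame({'id': first.index, 'x': mx.to_numpy(), 'y': my.to_numpy()})

# Hypothetical helper column 'pos': endpoints keep their row positions, and each
# midpoint gets the average position of its two endpoints (e.g. 0.5 between rows 0 and 1)
ends = df.assign(pos=range(len(df)))
mid['pos'] = mid['id'].map(ends.groupby('id')['pos'].mean())

# Sort by position, then drop the helper column and renumber the rows
out = (pd.concat([ends, mid])
         .sort_values('pos')
         .drop(columns='pos')
         .reset_index(drop=True))
print(out)
```

Sorting on a position key rather than on x or y keeps each segment's endpoints in their original order, even if a segment's second endpoint has smaller coordinates than its first.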

Also, are you basically taking the mean, or is it a placeholder function?
– user2285236, commented Aug 9, 2017 at 23:06
I was able to get your desired values with df.apply(lambda x: find_mpt(x[0], x[2], x[1], x[3])), but I don't know how you would go about inserting them into the dataframe without creating a new dataframe.
– Cory Madden, commented Aug 9, 2017 at 23:27
I'd do pd.concat((df, df.groupby('id', as_index=False).mean())).sort_values(['id', 'x']) to get this output, but I am making a few assumptions: 1) you want the mean, 2) you want to sort the id, 3) you have no negative values (so when sorting by x the average will be in the middle).
– user2285236, commented Aug 9, 2017 at 23:30

Uvar · Accepted Answer · 2017-08-09 23:43:19Z

1

It is possible to specify your aggregation method. Based on the function you defined, I'll assume you want to add the mean of x and y to your df. Since each segment has only two endpoints, its midpoint is simply the per-id mean, which simplifies the procedure.

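import pandas as pd
# Per-id mean of x and y = midpoint of each two-endpoint segment
df2 = df.groupby('id').agg('mean').reset_index()
# Sort by y so each midpoint falls between its endpoints (assumes the endpoints' y values differ)
df_final = pd.concat((df, df2)).sort_values(['id', 'y']).reset_index(drop=True)
print(df_final)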

   id      x     y
0   1  0.80000  1.90
1   1  0.84165  1.95
2   1  0.88330  2.00
3   2  1.00000  2.14
4   2  1.15000  2.32
5   2  1.30000  2.50

And yes, this could also have been achieved in a one-liner.
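A possible one-liner version of the same approach is sketched below, assuming the sample df from the question. It relies on the same assumption as the answer: within a segment the two endpoints have different y values, so the midpoint's y sorts strictly between them. Note that the endpoints come out ordered by ascending y, not necessarily in their original order.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'id': [1, 1, 2, 2],
                   'x': [0.8, 0.8833, 1.0, 1.3],
                   'y': [1.90, 2.00, 2.14, 2.50]})

# Per-id means are the midpoints; sorting by (id, y) slots each one between its endpoints,
# and ignore_index=True renumbers the rows 0..n-1
df_final = (pd.concat((df, df.groupby('id', as_index=False).mean()))
              .sort_values(['id', 'y'], ignore_index=True))
print(df_final)
```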

answered Aug 9, 2017 at 23:43

Uvar


BENY · Accepted Answer · 2017-08-10 01:58:24Z

0

This uses a self-defined function:

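import pandas as pd

def find_mpt(x):
    # Column-wise mean of the group's rows (including 'id', which is why ids print as 1.0)
    return x.mean()

# Append one midpoint row per id, then sort so it lands between its endpoints
pd.concat([df, df.groupby('id', as_index=False).apply(find_mpt)], axis=0).sort_values(['id', 'y'])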


Out[26]: 
    id        x     y
0  1.0  0.80000  1.90
0  1.0  0.84165  1.95
1  1.0  0.88330  2.00
2  2.0  1.00000  2.14
1  2.0  1.15000  2.32
3  2.0  1.30000  2.50
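The id column shows 1.0 and 2.0 above because the group mean also averages the id column. A sketch of one way to keep id as an integer, assuming the sample df from the question, is to pass a mean-based function to agg on just the x and y columns; agg then calls it on one column's values for one group at a time.

```python
import numpy as np
import pandas as pd


def find_mpt(x):
    # Mean of the values passed in; via agg it receives one column of one group at a time
    return np.mean(x)


# Sample data from the question
df = pd.DataFrame({'id': [1, 1, 2, 2],
                   'x': [0.8, 0.8833, 1.0, 1.3],
                   'y': [1.90, 2.00, 2.14, 2.50]})

# Aggregate only x and y, so 'id' stays an integer key column instead of being averaged
mids = df.groupby('id', as_index=False)[['x', 'y']].agg(find_mpt)
out = pd.concat([df, mids], axis=0).sort_values(['id', 'y'])
print(out)
```

As in the answer, this keeps the original index labels (0, 0, 1, 2, 1, 3); adding reset_index(drop=True) would renumber them.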

answered Aug 10, 2017 at 1:58

BENY
