1

I am working on a data frame that contains segments, each with two endpoints. I have to find the midpoint of each segment and then insert a row with the midpoint coordinates into the df, between the two endpoint rows.

Below is the df:

   id         x          y
0   1    0.8000       1.90
1   1    0.8833       2.00
2   2    1.0000       2.14
3   2    1.3000       2.50

Points with the same id are the end points of the same segment.

I have created the following simple function (basically calculating the mean):

def find_mpt(x1, y1, x2, y2):
    x, y = ( x1 + x2) / 2 , (y1 + y2) / 2
    return x, y

I want to apply the function to the entire df and insert the resulting rows between the endpoint rows, as follows:

   id         x          y
0   1    0.8000       1.90
1   1    0.8416       1.95  #new row 
2   1    0.8833       2.00
3   2    1.0000       2.14
4   2    1.1500       2.32  #new row       
5   2    1.3000       2.50

Maybe I can use df.groupby(['id']) and then apply the function, but I still have no idea how to insert the rows at those specific locations.

7
  • Does each id always have 2 values ? Commented Aug 9, 2017 at 23:05
  • Also are you basically taking the mean or is it a placeholder function? Commented Aug 9, 2017 at 23:06
  • @ayhan Yes, each id has two values each for x and y Commented Aug 9, 2017 at 23:12
  • I was able to get your desired values with df.apply(lambda x: find_mpt(x[0], x[2], x[1], x[3])), but I don't know how you would go about inserting them into the dataframe without creating a new dataframe. Commented Aug 9, 2017 at 23:27
  • I'd do pd.concat((df, df.groupby('id', as_index=False).mean())).sort_values(['id', 'x']) to get this output but I am making a few assumptions: 1) you want the mean, 2) you want to sort the id, 3) you have no negative values (so when sorting by x the average will be in the middle) Commented Aug 9, 2017 at 23:30

2 Answers

1

You can specify the aggregation method directly. Based on the function you defined, I take it that you want to add the mean of x and y to your df. Because each segment has exactly two endpoints, the midpoint is simply the per-id mean, which simplifies the procedure.

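import pandas as pd
# Midpoint of each segment = mean of its two endpoints
df2 = df.groupby('id').agg('mean').reset_index()
# Sorting by y places each midpoint between its endpoints (assumes y differs between them)
df_final = pd.concat((df, df2)).sort_values(['id', 'y']).reset_index(drop=True)
print(df_final)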

   id      x     y
0   1  0.80000  1.90
1   1  0.84165  1.95
2   1  0.88330  2.00
3   2  1.00000  2.14
4   2  1.15000  2.32
5   2  1.30000  2.50

And yes, this could also have been achieved in a one-liner.
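
Sorting by y works here because the midpoint's y value lies between the two endpoint values. As a minimal sketch of an alternative that does not depend on sorting by the coordinates, the rows can be ordered by their position within each segment instead. The helper column pos is a hypothetical name introduced only for this illustration, and the sample df is rebuilt from the question's data:

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "x": [0.8, 0.8833, 1.0, 1.3],
    "y": [1.9, 2.0, 2.14, 2.5],
})

# Hypothetical helper column: 0.0 for the first endpoint of a segment, 1.0 for the second
df["pos"] = df.groupby("id").cumcount().astype(float)

# Per-id means give the midpoints; pos averages to 0.5, so each midpoint
# sorts between its two endpoints regardless of the x and y values
mid = df.groupby("id", as_index=False).mean()

result = (
    pd.concat([df, mid])
    .sort_values(["id", "pos"])
    .drop(columns="pos")
    .reset_index(drop=True)
)
print(result)
```

This relies on the assumption, confirmed in the question's comments, that every id has exactly two rows.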

0

This is calculated with a user-defined function:

import pandas as pd

def find_mpt(x):
    # Column-wise mean of one group's rows, i.e. the midpoint of its two endpoints
    return x.mean()

pd.concat([df, df.groupby('id', as_index=False).apply(find_mpt)], axis=0).sort_values(['id', 'y'])


Out[26]: 
    id        x     y
0  1.0  0.80000  1.90
0  1.0  0.84165  1.95
1  1.0  0.88330  2.00
2  2.0  1.00000  2.14
1  2.0  1.15000  2.32
3  2.0  1.30000  2.50
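
If the four-argument find_mpt from the question should be kept unchanged, one possible sketch is to pass it the first and last endpoint of each id as whole columns and then interleave the rows using an explicit ordering key. The column name order is a hypothetical helper introduced for this example, and the code assumes exactly two rows per id:

```python
import pandas as pd

def find_mpt(x1, y1, x2, y2):
    # Function from the question: midpoint of (x1, y1) and (x2, y2)
    x, y = (x1 + x2) / 2, (y1 + y2) / 2
    return x, y

# Sample data copied from the question
df = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "x": [0.8, 0.8833, 1.0, 1.3],
    "y": [1.9, 2.0, 2.14, 2.5],
})

# First and last endpoint of every segment, indexed by id
first = df.groupby("id").first()
last = df.groupby("id").last()

# find_mpt works element-wise on whole columns, so no explicit loop is needed
mx, my = find_mpt(first["x"], first["y"], last["x"], last["y"])
mid = pd.DataFrame({"id": mx.index, "x": mx.to_numpy(), "y": my.to_numpy()})

# Hypothetical ordering key: 0 = first endpoint, 1 = midpoint, 2 = second endpoint
start = df.groupby("id").head(1).assign(order=0)
end = df.groupby("id").tail(1).assign(order=2)
result = (
    pd.concat([start, mid.assign(order=1), end])
    .sort_values(["id", "order"])
    .drop(columns="order")
    .reset_index(drop=True)
)
print(result)
```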
