2

I have two dataframes. Dataframe A is:

[distance]      [measure]
17442.77000     32.792658
17442.95100     32.792658
17517.49200     37.648482
17518.29600     37.648482
17565.77600     38.287118
17565.88800     38.287118
17596.93700     41.203340
17597.29700     41.203340
17602.16400     41.477979
17602.83900     41.612774
17618.16400     42.479890
17618.71100     42.681591

and data frame B which is:

[mileage]      [Driver]
17442.8         name1
17517.5         name2
17565.8         name3
17597.2         name4
17602.5         name5
17618.4         name6

for every [mileage] row in dataframe B, I'd like to find the two rows from [distance] in dataframe A in which A.loc[(A['distance']>= milage_value) & A['distance']<= mileage_value]so I could have something like this:

17442.77000     32.792658
17442.8         name1
17442.95100     32.792658
17517.49200     37.648482
17517.5         name2
17518.29600     37.648482
.               .
.               .

So I could apply the following function in a rolling window size of 3:

def f(x):
    return df.iloc[0,1]+(df.iloc[2,1]-df.iloc[0,1])*((df.iloc[1,0]-df.iloc[0,0])/(df.iloc[2,0]-df.iloc[0,0]))
a = df.rolling(window=3, min_periods=1).apply(f)[::3].reset_index(drop=True)

So far I have been concatenating the two Dfs and sorting values to generate triplets like above, but the problem arise when two values from df B is within a distance range in A[distance]. Any hint/advice is much appreciated!

1 Answer 1

1

I think you can use the following with merge_asof using direction parameter and drop_duplicates:

df_before = pd.merge_asof(df_a, df_b, 
                 left_on='distance', 
                 right_on='mileage', 
                 direction='backward')\
              .drop_duplicates(['mileage','Driver'], keep='first')[['distance','measure']]

df_after = pd.merge_asof(df_a, df_b, 
                         left_on='distance', 
                         right_on='mileage', direction='forward')\
             .drop_duplicates(['mileage', 'Driver'], keep='last')[['distance','measure']]

df_middle = df_b.rename(columns={'Driver':'measure','mileage':'distance'})

pd.concat([df_before, df_middle, df_after]).sort_values('distance').drop_duplicates()

Output:

     distance  measure
0   17442.770  32.7927
0   17442.800    name1
1   17442.951  32.7927
2   17517.492  37.6485
1   17517.500    name2
3   17518.296  37.6485
4   17565.776  38.2871
2   17565.800    name3
5   17565.888  38.2871
6   17596.937  41.2033
3   17597.200    name4
7   17597.297  41.2033
8   17602.164   41.478
4   17602.500    name5
9   17602.839  41.6128
10  17618.164  42.4799
5   17618.400    name6
11  17618.711  42.6816
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.