
How can I update an array based on the nearest value in a pandas DataFrame column? For example, I'd like to replace each value in the following array with the "X" value from the DataFrame row whose "Time" is closest to it:

Input array:

a = np.array([
    [122.25, 225.00, 201.00],
    [125.00, 151.50, 160.62],
    [99.99, 142.25, 250.01],
])

Input DataFrame:

df = pd.DataFrame({
    'Time': [100, 125, 150, 175, 200, 225],
    'X': [26100, 26200, 26300, 26000, 25900, 25800],
})

Expected output array:

np.array([
    [26200, 25800, 25900],
    [26200, 26300, 26300],
    [26100, 26300, 25800],
])
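For reference, a brute-force way to state what "nearest" means here is to measure the distance from every array element to every Time value and take the closest row. This is a minimal NumPy sketch, not part of the original post. It assumes the DataFrame is small enough that an a.size x len(df) distance matrix fits in memory. When two Time values are equally close, argmin keeps the first matching row of df.

```python
import numpy as np
import pandas as pd

a = np.array([
    [122.25, 225.00, 201.00],
    [125.00, 151.50, 160.62],
    [99.99, 142.25, 250.01],
])
df = pd.DataFrame({
    'Time': [100, 125, 150, 175, 200, 225],
    'X': [26100, 26200, 26300, 26000, 25900, 25800],
})

times = df['Time'].to_numpy(dtype=float)
xs = df['X'].to_numpy()

# Distance from every element of a to every Time: shape a.shape + (len(df),)
dist = np.abs(a[..., np.newaxis] - times)

# Row index of the closest Time for each element (ties keep the first row)
nearest = dist.argmin(axis=-1)

# Look up X for each element; result has the same shape as a
result = xs[nearest]
print(result)
```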
  • This is an excellent answer and should be marked as accepted! Commented Feb 22, 2022 at 16:58

1 Answer


Use pd.merge_asof with direction='nearest':

import numpy as np
import pandas as pd

# Convert Time to float since your input array is float.
# merge_asof requires the join keys on both sides to have the same dtype
df['Time'] = df['Time'].astype('float')

# merge_asof also requires both data frames to be sorted by the join key (Time)
# So we need to flatten the input array and make note of the original order
# before going into the merge
a_ = np.ravel(a)
o_ = np.arange(len(a_))

tmp = pd.DataFrame({
    'Time': a_,
    'Order': o_
})

# Merge the two data frames and extract X in the original order
result = (
    pd.merge_asof(tmp.sort_values('Time'), df.sort_values('Time'), on='Time', direction='nearest')
        .sort_values('Order')
        ['X'].to_numpy()
        .reshape(a.shape)
)
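An alternative that avoids the merge and the Order bookkeeping is np.searchsorted on the sorted Time values. For each element, you then pick whichever of the two neighbouring times is closer. This is a sketch of a different technique from the merge_asof answer above. It assumes df has at least two rows. Ties are resolved toward the smaller Time, which may not match merge_asof's tie-breaking.

```python
import numpy as np
import pandas as pd

a = np.array([
    [122.25, 225.00, 201.00],
    [125.00, 151.50, 160.62],
    [99.99, 142.25, 250.01],
])
df = pd.DataFrame({
    'Time': [100, 125, 150, 175, 200, 225],
    'X': [26100, 26200, 26300, 26000, 25900, 25800],
})

# searchsorted needs an ascending array, so sort df by Time once
srt = df.sort_values('Time')
times = srt['Time'].to_numpy(dtype=float)
xs = srt['X'].to_numpy()

# Insertion index of each element: times[idx - 1] < value <= times[idx]
idx = np.searchsorted(times, a)
# Clamp so that both neighbours idx - 1 and idx are valid positions
idx = np.clip(idx, 1, len(times) - 1)

left = times[idx - 1]
right = times[idx]

# Step back to the left neighbour when it is at least as close as the right one
idx = idx - ((a - left) <= (right - a)).astype(np.intp)

# Look up X for each element; result has the same shape as a
result = xs[idx]
print(result)
```

Like merge_asof, this only sorts df once and never builds an a.size x len(df) distance matrix.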

1 Comment

I have literally spent hours on this, and your code just worked perfectly! I can't tell you how much I appreciate it. Thank you!
