Compare values from a DataFrame and replace with closest values, given a list

Question

I have a DataFrame called 'Dataex', and an ascending list called 'steps'.

import pandas as pd
import numpy as np

if __name__ == "__main__":

    # Sample data: one row per [Lx, A] pair
    Dataex = [[0.6, 0.36],
              [0.6, 0.36],
              [0.9, 0.81],
              [0.8, 0.64],
              [1.0, 1.00],
              [1.0, 1.00],
              [0.9, 0.81],
              [1.2, 1.44],
              [1.0, 1.00],
              [1.0, 1.00],
              [1.2, 1.44],
              [1.1, 1.21]]

    Dataex = pd.DataFrame(data=Dataex, columns=['Lx', 'A'])

    # Ascending list of allowed values
    steps = [0, 0.75, 1, 1.25, 1.5, 1.75, 2, 2.25, 2.4, 2.5, 2.75, 3,
             3.25, 3.5, 3.75, 4, 4.25, 4.5, 4.75, 5, 5.25, 5.5, 5.75, 6]

    # steps = np.array(steps)
    # Dataex["Lx_new"] = steps[np.searchsorted(steps, Dataex["Lx"])]

What I am looking for is this: each value of Dataex['Lx'] should be compared with the values in 'steps' and replaced by the closest one, whether that is the limit to its left or to its right. Some examples:

Example case 1: Dataex['Lx'] = 0.8. Compared with the 'steps' list, it falls in the interval [0.75, 1] and is closer to the lower limit, so the new value must be 0.75.

Example case 2: Dataex['Lx'] = 1.2. Compared with the 'steps' list, it falls in the interval [1, 1.25] and is closer to the upper limit, so the new value must be 1.25.

Example case 3: Dataex['Lx'] = 1. The 'steps' list already contains a value equal to 1, so the new value must stay the same, that is, 1.

In short, I should have something like this:

 Lx     A   Lx_new
0.6  0.36     0.75
0.6  0.36     0.75
0.9  0.81        1
0.8  0.64     0.75
  1     1        1
  1     1        1
0.9  0.81        1
1.2  1.44     1.25
  1     1        1
  1     1        1
1.2  1.44     1.25
1.1  1.21        1

Interesting. The numbers in steps appear to be consistent quarters, e.g., 1, 1.25, 1.5, 1.75, 2, etc., except for 2, where there is a 2.4 in the middle, and then the numbers proceed in quarters like normal. Is that intentional?
– user17242583, Commented Nov 9, 2021 at 21:36
Yes, it is intentional and comes from a previous calculation @user17242583
– DaniV, Commented Nov 9, 2021 at 22:26

Capybara · Accepted Answer · 2022-06-06 03:04:51Z

This can be accomplished with apply and a lambda function that finds the index of the closest value in steps (the position where the absolute difference is smallest) and returns the step at that index.

steps = np.array(steps)
Dataex["Lx_new"] = Dataex["Lx"].apply(lambda x: steps[np.argmin(np.abs(x-steps))])
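
As a possible variation (not part of the original answer), the per-row apply can be replaced with NumPy broadcasting, which computes every distance in one step. This is a minimal sketch that rebuilds the question's data so it runs on its own, and it assumes steps is short enough that an (n_rows x n_steps) distance matrix fits comfortably in memory:

```python
import numpy as np
import pandas as pd

# Data copied from the question
steps = np.array([0, 0.75, 1, 1.25, 1.5, 1.75, 2, 2.25, 2.4, 2.5, 2.75, 3,
                  3.25, 3.5, 3.75, 4, 4.25, 4.5, 4.75, 5, 5.25, 5.5, 5.75, 6])
Dataex = pd.DataFrame({
    "Lx": [0.6, 0.6, 0.9, 0.8, 1.0, 1.0, 0.9, 1.2, 1.0, 1.0, 1.2, 1.1],
    "A": [0.36, 0.36, 0.81, 0.64, 1.00, 1.00, 0.81, 1.44, 1.00, 1.00, 1.44, 1.21],
})

x = Dataex["Lx"].to_numpy()
# dist[i, j] is the absolute distance between row i's Lx and steps[j]
dist = np.abs(x[:, None] - steps[None, :])
# For each row, pick the step with the smallest distance
# (argmin returns the first minimum, so exact ties go to the lower step)
Dataex["Lx_new"] = steps[dist.argmin(axis=1)]

print(Dataex)
```

The result should match the apply version row for row, since both select the step with the smallest absolute difference.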

edited Jun 6, 2022 at 3:04

answered Nov 9, 2021 at 21:41

Capybara

4 Comments

user17242583 Over a year ago

Brilliant! Ingenious!

user17242583 Over a year ago

It could be faster if you first converted steps to a numpy array, i.e. steps = np.array(steps) and then used x - steps instead of [x-s for s in steps].

Capybara Over a year ago

Thanks for the suggestion @user17242583. I updated my answer to reflect it.

DaniV Over a year ago

Thank you very much for your contributions, an excellent day.
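
A further note on speed, following the comments above: because steps is ascending, the np.searchsorted idea that is commented out in the question can probably be adapted to return the nearest value rather than the upper limit. The sketch below is one way to do that, not something from the original answer. The helper name nearest_step is hypothetical, and it assumes steps is sorted ascending and has at least two entries:

```python
import numpy as np

def nearest_step(values, steps):
    """Return, for each value, the closest entry of the ascending array `steps`.

    Hypothetical helper; assumes `steps` is sorted and has at least 2 entries.
    """
    values = np.asarray(values, dtype=float)
    steps = np.asarray(steps, dtype=float)
    # Index of the first step >= value, clipped so both neighbours exist
    idx = np.clip(np.searchsorted(steps, values), 1, len(steps) - 1)
    left, right = steps[idx - 1], steps[idx]
    # Keep the lower neighbour when it is at least as close (ties go down)
    return np.where(values - left <= right - values, left, right)

steps = [0, 0.75, 1, 1.25, 1.5, 1.75, 2, 2.25, 2.4, 2.5, 2.75, 3,
         3.25, 3.5, 3.75, 4, 4.25, 4.5, 4.75, 5, 5.25, 5.5, 5.75, 6]
lx = [0.6, 0.8, 0.9, 1.0, 1.1, 1.2]
lx_new = nearest_step(lx, steps)
print(lx_new)
```

With the question's DataFrame this would be used as Dataex["Lx_new"] = nearest_step(Dataex["Lx"], steps). It only compares each value against its two neighbours, so it avoids building a full distance matrix.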
