0

I wish to update and replace values based on the dates within my dataframe, while removing data in other specific columns.

Data

id  date        location    status  value1  value2
CC  1/1/2022    ny          new     12      1
CC  4/1/2022    ny          new     1       1
CC  7/1/2022    ny          new     1       1
CC  10/1/2022   ny          new     1       2
CC  1/1/2023    ny          ok      1       2
CC  4/1/2023    ny          ok      1       2
CC  7/1/2023    ny          ok      1       3
CC  10/1/2023   ny          ok      1       3
BB  1/1/2022    ca          new     1       3
BB  4/1/2022    ca          new     1       3
BB  7/1/2022    ca          new     1       3
BB  10/1/2022   ca          new     12      3
BB  1/1/2023    ca          new     2       3
BB  4/1/2023    ca          new     2       3
BB  7/1/2023    ca          new     2       3
BB  10/1/2023   ca          new     2       3

Desired

id  date        location    status  value1  value2
CC  1/1/2022    ny          open        
CC  4/1/2022    ny          open        
CC  7/1/2022    ny          open        
CC  10/1/2022   ny          new     1       2
CC  1/1/2023    ny          ok      1       2
CC  4/1/2023    ny          ok      1       2
CC  7/1/2023    ny          ok      1       3
CC  10/1/2023   ny          ok      1       3
BB  1/1/2022    ca          new     1       3
BB  4/1/2022    ca          new     1       3
BB  7/1/2022    ca          new     1       3
BB  10/1/2022   ca          new     12      3
BB  1/1/2023    ca          new     2       3
BB  4/1/2023    ca          new     2       3
BB  7/1/2023    ca          new     2       3
BB  10/1/2023   ca          new     2       3

Doing

df.loc[(df.id == 'cc') & (df.date <= '07/01/2022'), 'status']= 'open'

This labels all of the dates as open and does not remove the values in the other columns. Any suggestion is appreciated. Thank you for any suggestions.

1 Answer 1

2

Unfortunately, snapping a cell out of existence does not seem to work with Pandas. Similarly, Pandas expects a value for each cell of every column when setting up a dataframe.

Therefore, nan (not a number) seems to be the exact placeholder appropriate for your case. In turn, consider, importing numpy as np and adding the line to set the respective entry to np.nan

df.loc[(df.id == 'cc') & (df.date <= '07/01/2022'), 'value1']= np.nan

Fortunately,

df.fillna("")

prints the Pandas frame without showing those annoying NAN entries but leaves the cells 'empty' as you seem to desire.

In addition, NumPy enables the use of aggregate functions to ignore nan values such as np.nanmean() as can be found here to not break computation on such tables.

Sign up to request clarification or add additional context in comments.

2 Comments

ok thank you - is this also replacing the 'new' value with 'open'? I didn't see this in the script
No, that's why wrote "adding the line" meaning [to your existing code].

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.