I have a DataFrame like this:

         date  unique_id  order_id
0  2022-09-20        111      NULL
1  2022-09-10        111      NULL
2  2022-08-10        111  2660a139
3  2022-08-08        111      NULL
4  2022-08-07        111      NULL
5  2022-08-04        222      NULL
6  2022-07-31        222  e61d1e7d
7  2022-07-20        222      NULL

Is it possible to clean the data so that, within each unique_id, rows with a null order_id that are dated after the non-null value are dropped? The outcome should look like this:

         date  unique_id  order_id
0  2022-08-10        111  2660a139
1  2022-08-08        111      NULL
2  2022-08-07        111      NULL
3  2022-07-31        222  e61d1e7d
4  2022-07-20        222      NULL

Thanks!

1 Answer

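Use GroupBy.cummax with Series.notna. The notna mask is True on the row that holds an order_id, and the cumulative maximum per unique_id keeps it True for every later row of that group, so only the leading nulls are filtered out: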

df = df[df['order_id'].notna().groupby(df['unique_id']).cummax()]
# alternative, if the missing values are stored as the string 'NULL':
#df = df[df['order_id'].ne('NULL').groupby(df['unique_id']).cummax()]
print(df)
         date  unique_id  order_id
2  2022-08-10        111  2660a139
3  2022-08-08        111       NaN
4  2022-08-07        111       NaN
6  2022-07-31        222  e61d1e7d
7  2022-07-20        222       NaN
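
For anyone who wants to try this end to end, here is a minimal self-contained sketch. It rebuilds the sample frame from the question and assumes the nulls are real missing values (NaN) and that rows are already sorted newest first within each unique_id, as in the question. The names has_order, keep and result are only illustrative, and the astype(bool) call is a defensive cast rather than part of the original one-liner.

```python
import numpy as np
import pandas as pd

# Sample data from the question, with NULL represented as NaN (assumption).
df = pd.DataFrame({
    "date": ["2022-09-20", "2022-09-10", "2022-08-10", "2022-08-08",
             "2022-08-07", "2022-08-04", "2022-07-31", "2022-07-20"],
    "unique_id": [111, 111, 111, 111, 111, 222, 222, 222],
    "order_id": [np.nan, np.nan, "2660a139", np.nan,
                 np.nan, np.nan, "e61d1e7d", np.nan],
})

# True only on rows that actually carry an order_id.
has_order = df["order_id"].notna()

# Cumulative max per unique_id: once a True is seen, every later row in
# that group stays True. The cast guards against a non-boolean result.
keep = has_order.groupby(df["unique_id"]).cummax().astype(bool)

# Filter and renumber the index to match the desired outcome.
result = df[keep].reset_index(drop=True)
print(result)
```

If the frame is not already ordered by date, sorting it first (for example with sort_values on unique_id and date, descending) would be needed, because cummax depends on row order.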