I am trying to remove duplicates based on multiple criteria:

  1. Find duplicates in column df['A']

  2. Check column df['Status'] and prioritize OK over Open, and Open over Close

  3. If duplicates have the same status, pick the latest one based on df['Col_1']

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': ['11', '11', '12', np.nan, '13', '13', '14', '14', '15'],
    'Status': ['OK', 'Close', 'Close', 'OK', 'OK', 'Open', 'Open', 'Open', np.nan],
    'Col_1': [2000, 2001, 2000, 2000, 2000, 2002, 2000, 2004, 2000],
})
df

Expected output:

[Image of the expected output; not available]

I have tried different solutions, like the one linked below (using map or loc), but I am unable to find the correct approach:

Pandas : remove SOME duplicate values based on conditions

1 Answer

Create an ordered categorical to prioritize Status, sort by all three columns, then drop duplicates on column A (keeping the first row of each group) and restore the original row order with sort_index:

c = ['OK','Open','Close']
df['Status'] = pd.Categorical(df['Status'], ordered=True, categories=c)

df = df.sort_values(['A','Status','Col_1']).drop_duplicates('A').sort_index()
print (df)
     A Status  Col_1
0   11     OK   2000
2   12  Close   2000
3  NaN     OK   2000
4   13     OK   2000
6   14   Open   2000
8   15    NaN   2000
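
Note that the sort above is ascending on Col_1, so when two duplicates share the same status (A = '14' here) the earliest row is kept. If the latest one is wanted, as the question asks, one possible variation is to sort Col_1 in descending order. This is a minimal sketch, assuming a larger Col_1 value means "later"; the variable name latest is only for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': ['11', '11', '12', np.nan, '13', '13', '14', '14', '15'],
    'Status': ['OK', 'Close', 'Close', 'OK', 'OK', 'Open', 'Open', 'Open', np.nan],
    'Col_1': [2000, 2001, 2000, 2000, 2000, 2002, 2000, 2004, 2000],
})

# Priority order: OK before Open before Close.
df['Status'] = pd.Categorical(df['Status'], categories=['OK', 'Open', 'Close'], ordered=True)

# Ascending on A and Status, descending on Col_1 (assumption: larger = later),
# so the first row of each A group is the best status with the latest Col_1.
latest = (
    df.sort_values(['A', 'Status', 'Col_1'], ascending=[True, True, False])
      .drop_duplicates('A')
      .sort_index()
)
print(latest)
```

With this ordering, the row kept for A = '14' should be the one with Col_1 = 2004 rather than 2000.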

EDIT: If rows where A is NaN should not be removed, add a helper column:

df['test'] = df['A'].isna().cumsum()

c = ['OK','Open','Close']
df['Status'] = pd.Categorical(df['Status'], ordered=True, categories=c)

df = (df.sort_values(['A','Status','Col_1', 'test'])
        .drop_duplicates(['A', 'test'])
        .sort_index())
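
One caveat with the helper column: it counts NaNs cumulatively, so two rows with the same A that sit on either side of a NaN row get different test values and are both kept. A possible alternative, sketched here under that concern, is to deduplicate only the rows where A is present and then add the NaN rows back. The sample data and the names missing, deduped and result are hypothetical, chosen only to show the case; the sort keys are the same as above:

```python
import numpy as np
import pandas as pd

# Hypothetical sample: the '11' rows are separated by a NaN row.
df = pd.DataFrame({
    'A': ['11', '11', np.nan, '11', np.nan],
    'Status': ['Close', 'Open', 'OK', 'OK', 'Close'],
    'Col_1': [2000, 2001, 2000, 2002, 2003],
})
df['Status'] = pd.Categorical(df['Status'], categories=['OK', 'Open', 'Close'], ordered=True)

missing = df['A'].isna()

# Deduplicate only the rows that have a value in A.
deduped = (
    df[~missing]
    .sort_values(['A', 'Status', 'Col_1'])
    .drop_duplicates('A')
)

# Put every NaN row back untouched and restore the original order.
result = pd.concat([deduped, df[missing]]).sort_index()
print(result)
```

Here both NaN rows survive and only one '11' row (the OK one) remains.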
6 Comments

Thank you very much. Is there a way to keep all NaNs in column A if we have multiple NaNs?
I just realized that the code works fine apart from the dates: the latest one is not selected.
@Caiotru - Can you explain more?
@Caiotru - One idea - are dates strings? Or datetimes? Or numbers?
For example, if I have 2 OKs with the same code, I would like the latest date to be picked. I would like to use keep='last' if they are in order.