I have a large dataframe:

import pandas as pd
df = pd.read_csv('data.csv')

df.head()
ID  Year    status
223725  1991    No
223725  1992    No
223725  1993    No
223725  1994    No
223725  1995    No

I have many unique IDs and I want to remove duplicate rows based on the columns ID and status.

  1. If an ID has a value of Yes in status, then only that row is retained; all other rows with a status value of No are removed for that specific ID.

  2. If an ID has No in every observation in status, then retain any one row for that ID.

For example, in the DataFrame below, only the row where 68084329 has a value of Yes in status (the last row) should be kept; all other rows with No are dropped.

 ID         Year    status
68084329    1991    No
68084329    1992    No
68084329    1993    No
68084329    1994    No
68084329    1995    No
68084329    1996    No
68084329    1997    No
68084329    1998    No
68084329    1999    No
68084329    2000    No
68084329    2001    No
68084329    2002    No
68084329    2003    No
68084329    2004    No
68084329    2005    No
68084329    2006    No
68084329    2007    No
68084329    2008    No
68084329    2010    No
68084329    2011    No
68084329    2012    Yes

How do I drop duplicate rows according to the above conditions?

    Please get used to providing a sample df as a callable line of code. You could create a dummy df or get it from your original data with df.head(10).to_dict('list'). Commented Sep 3, 2020 at 16:38
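As a rough illustration of what the comment is asking for, the sketch below builds a small dummy frame and dumps it with to_dict('list'), which gives a dict literal that can be pasted into a question and turned back into a DataFrame. The rows are made-up sample values that only mirror the column layout shown in the question.

```python
import pandas as pd

# Hypothetical sample rows; only the column names come from the question
df = pd.DataFrame({
    "ID": [223725, 223725, 68084329, 68084329, 68084329],
    "Year": [1994, 1995, 2010, 2011, 2012],
    "status": ["No", "No", "No", "No", "Yes"],
})

# Dict of column -> list of values, easy to paste into a post
as_dict = df.head(10).to_dict("list")
print(as_dict)

# Anyone can rebuild the same frame from the pasted dict
rebuilt = pd.DataFrame(as_dict)
```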

1 Answer

I think you can do:

# sort by status so that No comes before Yes
df = df.sort_values('status')

# pick the last row, it will either be Yes or No
df = df.groupby('ID').last()
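
A minimal, self-contained sketch of the same idea on made-up data (the rows below are hypothetical, not the asker's file). It relies on "No" sorting before "Yes" as plain strings, and it passes as_index=False so that ID stays a regular column instead of becoming the index.

```python
import pandas as pd

# Hypothetical sample: one ID with only No, one ID with a single Yes
df = pd.DataFrame({
    "ID": [223725, 223725, 68084329, 68084329, 68084329],
    "Year": [1994, 1995, 2010, 2012, 2011],
    "status": ["No", "No", "No", "Yes", "No"],
})

# "No" < "Yes" alphabetically, so a Yes row sorts last within its ID
ordered = df.sort_values(["ID", "status"])

# Take the last row per ID; as_index=False keeps ID as a column
result = ordered.groupby("ID", as_index=False).last()
print(result)
```

Note that this only works as intended while status holds exactly the strings "No" and "Yes" with no missing values, since last() skips nulls column by column.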

2 Comments

Beat me to it! Just a warning for @MIMA: if there are two rows with Yes for the same ID, this will only keep one of them.
Thank you both for the input. Luckily there are no rows with Yes for the same ID.
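
For readers whose data can have more than one Yes per ID, the case the first comment warns about, here is one possible variant. It is a sketch on hypothetical data, not the answerer's method: it keeps every Yes row and, for IDs with no Yes at all, keeps the first row.

```python
import pandas as pd

# Hypothetical sample: ID 1 has two Yes rows, ID 2 has none
df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2],
    "Year": [2001, 2002, 2003, 2001, 2002],
    "status": ["Yes", "No", "Yes", "No", "No"],
})

is_yes = df["status"].eq("Yes")

# True on every row of an ID that has at least one Yes
has_yes = is_yes.groupby(df["ID"]).transform("any")

# First row of each ID that has no Yes anywhere
first_of_all_no = ~has_yes & ~df.duplicated("ID")

# Keep all Yes rows plus one row per all-No ID
result = df[is_yes | first_of_all_no]
print(result)
```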
