Dropping duplicates based on other column values (Python)

Question

I have a dataframe with 3 columns. I would like to drop duplicates in column A based on the values in the other columns. I have searched extensively and can't find a solution like this.

example:

A	B	C
Family1	nan	nan
Family1	nan	1234
Family1	1245	nan
Family1	3456	78787
Family2	nan	nan
Family3	nan	nan

Basically, I want to drop a duplicate ONLY IF the other two columns are both nan. Otherwise, the duplicate can stay.

desired output:

A	B	C
Family1	nan	1234
Family1	1245	nan
Family1	3456	78787
Family2	nan	nan
Family3	nan	nan

Family2 and Family3 remain in the df because they don't have duplicates, even though both columns are nan.

Can you include the code that creates a dataframe of the source table?
– Paul H, Commented Jan 21, 2021 at 23:05
df = pd.DataFrame({'A':['Family1','Family1','Family1','Family1','Family2','Family3'],'B':[np.nan,np.nan,1245,3456,np.nan,np.nan],'C':[np.nan,1234,np.nan,78787,np.nan,np.nan]})
– brunoff, Commented Jan 21, 2021 at 23:09

wwnde · Accepted Answer · 2021-01-21 23:07:01Z

3

You were not very clear. I suspect you want to drop any duplicates in column A if both columns B and C are NaN. If so, please try:

df[~(df.A.duplicated(keep=False)&(df.B.isna()&df.C.isna()))]
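
For reference, here is a self-contained sketch of the same approach, assuming the frame is built to match the table in the question. The names `is_dup`, `all_nan` and `result` are only illustrative and do not come from the original post.

```python
import numpy as np
import pandas as pd

# Sample frame matching the table in the question (an assumption about the real data).
df = pd.DataFrame({
    "A": ["Family1", "Family1", "Family1", "Family1", "Family2", "Family3"],
    "B": [np.nan, np.nan, 1245, 3456, np.nan, np.nan],
    "C": [np.nan, 1234, np.nan, 78787, np.nan, np.nan],
})

# True for every row whose A value occurs more than once
# (keep=False marks all copies, not just the later ones).
is_dup = df["A"].duplicated(keep=False)

# True where both B and C are missing.
all_nan = df["B"].isna() & df["C"].isna()

# Keep everything except rows that are duplicated in A and empty in B and C.
result = df[~(is_dup & all_nan)]
print(result)
```

One thing to be aware of: because `keep=False` flags every copy, a value of A that appears several times with B and C empty in all of its rows would be removed entirely. The question does not say whether that case can occur.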

Umar.H · Accepted Answer · 2021-01-21 23:11:13Z

3

Try combining two boolean masks: the first is True for every row whose value in column A is duplicated, and the second is True for rows where every column after A is null. Rows that meet both conditions are excluded using the ~ operator, which inverts a boolean mask.

df[~(df.duplicated(subset=['A'],keep=False) & df.iloc[:,1:].isna().all(1))]

          A     B        C
1  Family1    NaN     1234
2  Family1   1245      NaN
3  Family1   3456    78787
4  Family2    NaN      NaN
5  Family3    NaN      NaN
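
If the key column is not guaranteed to come first, the positional `iloc[:, 1:]` slice can be swapped for a name-based selection. The sketch below wraps that idea in a hypothetical helper, `drop_empty_duplicates` (not part of pandas), and assumes the sample frame matches the table in the question.

```python
import numpy as np
import pandas as pd


def drop_empty_duplicates(frame, key):
    """Drop rows whose `key` value is duplicated and whose other columns are all NaN.

    Hypothetical helper for illustration; `frame` is a DataFrame, `key` a column name.
    """
    # Mark every row whose key value appears more than once.
    is_dup = frame.duplicated(subset=[key], keep=False)
    # Select the non-key columns by name, then check that all of them are missing.
    others_empty = frame.drop(columns=key).isna().all(axis=1)
    return frame[~(is_dup & others_empty)]


# Sample frame matching the table in the question (an assumption about the real data).
df = pd.DataFrame({
    "A": ["Family1", "Family1", "Family1", "Family1", "Family2", "Family3"],
    "B": [np.nan, np.nan, 1245, 3456, np.nan, np.nan],
    "C": [np.nan, 1234, np.nan, 78787, np.nan, np.nan],
})

result = drop_empty_duplicates(df, "A")
print(result)

# The same rows are kept even when the key column is not the first one.
reordered = drop_empty_duplicates(df[["B", "A", "C"]], "A")
```

Selecting by name keeps the check independent of column order, at the cost of one extra `drop` call.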
