How to remove rows with duplicates in pandas dataframe?

Question

I have a DataFrame that contains duplicate values across two columns (A and B):

I want to remove the rows containing duplicates so that only unique values remain:

This command does not do what I want:

df.drop_duplicates(subset=['A','B'], keep='first')

Any idea how to do this?

jezrael · Accepted Answer · 2016-09-27 14:20:10Z

2

You can use stack with unstack:

print (df.stack().drop_duplicates().unstack().dropna().astype(int))
   A  B
0  1  2
2  4  5
3  7  6

Solution with boolean indexing:

print (df[~df.stack().duplicated().unstack().any(axis=1)])
   A  B
0  1  2
2  4  5
3  7  6
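
Both solutions can be traced step by step with a minimal sketch. The question's original frame is not shown, so the input values below are an assumption, chosen only so that the result matches the output above:

```python
import pandas as pd

# Hypothetical input: the original frame is not shown in the question.
# The value 1 appears in row 0 (column A) and again in row 1 (column B).
df = pd.DataFrame({"A": [1, 3, 4, 7], "B": [2, 1, 5, 6]})

# stack() flattens the frame row by row into one Series indexed by (row, column).
stacked = df.stack()

# Solution 1: drop repeated values, reshape back, then drop rows left incomplete.
via_unstack = stacked.drop_duplicates().unstack().dropna().astype(int)

# Solution 2: flag repeated values, reshape the flags, and drop any flagged row.
has_repeat = stacked.duplicated().unstack().any(axis=1)
via_mask = df[~has_repeat]

print(via_unstack)
print(via_mask)
```

With this input, both results keep rows 0, 2 and 3. Row 1 is removed because it repeats a value already seen in an earlier row. The boolean mask filters the original frame directly, so it avoids the round trip through NaN and float that makes astype(int) necessary in the first solution.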

2 Comments

Joe Over a year ago

Thanks! It works, but it does not accept a restriction to particular columns. For example, this command doesn't work: df.stack().drop_duplicates(subset=['A', 'C'], keep=False).unstack().dropna()

jezrael Over a year ago

You need to use a subset of the data. The simplest way is the second solution: print (df[~df[['A','C']].stack().duplicated().unstack().any(axis=1)])
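
The command in the comment fails because df.stack() returns a Series, and Series.drop_duplicates has no subset parameter. Selecting the columns before stacking gets around that. A small sketch, assuming a hypothetical frame with a third column C (the values are made up for illustration):

```python
import pandas as pd

# Hypothetical frame: duplicates are checked only across columns A and C.
# Column B repeats the same value everywhere and must not affect the result.
df = pd.DataFrame({
    "A": [1, 3, 4, 7],
    "B": [2, 2, 2, 2],
    "C": [5, 1, 6, 8],
})

# Build the mask from the selected columns only, then apply it to the full frame.
has_repeat = df[["A", "C"]].stack().duplicated().unstack().any(axis=1)
result = df[~has_repeat]

print(result)
```

Here row 1 is dropped because its C value repeats the A value of row 0, while the repeated values in B are ignored. All three columns are kept in the result.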
