I have a DataFrame in which some values are repeated across two columns (A and B):

A B
1 2
2 3
4 5
7 6
5 8

I want to drop every row that contains a value already seen in an earlier row, in either column, so that only unique values remain:

A B
1 2
4 5
7 6

This command does not give me what I want; since every (A, B) pair is already unique, it returns all five rows:

df.drop_duplicates(subset=['A','B'], keep='first')

Any idea how to do this?

1 Answer

You can use stack with unstack:

print(df.stack().drop_duplicates().unstack().dropna().astype(int))
   A  B
0  1  2
2  4  5
3  7  6

Solution with boolean indexing:

print(df[~df.stack().duplicated().unstack().any(axis=1)])
   A  B
0  1  2
2  4  5
3  7  6
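
For anyone who wants to run this end to end, here is a minimal sketch of the boolean-indexing approach split into steps. The frame is rebuilt from the sample data in the question; the variable names (stacked, is_repeat, rows_with_repeat) are only illustrative, and axis is passed by keyword because newer pandas releases do not accept it positionally.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"A": [1, 2, 4, 7, 5], "B": [2, 3, 5, 6, 8]})

# Flatten the frame into one long Series, walking it row by row (A, then B)
stacked = df.stack()

# True for every value that already appeared earlier in the flattened Series
is_repeat = stacked.duplicated()

# Restore the original shape; a row is flagged if any of its cells is a repeat
rows_with_repeat = is_repeat.unstack().any(axis=1)

# Keep only the rows without a repeated value
result = df[~rows_with_repeat]
print(result)
```

Rows 1 and 4 are dropped because their A values (2 and 5) already occur in column B of earlier rows.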

2 Comments

Thanks! It works, but if I want to apply it only to particular columns, it is not accepted. For example, this command doesn't work: df.stack().drop_duplicates(subset=['A', 'C'], keep=False).unstack().dropna()
You need to use a subset of the data. The simplest way is the second solution: print(df[~df[['A','C']].stack().duplicated().unstack().any(axis=1)])
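
The subset variant could be wrapped in a small helper along these lines. This is only a sketch: the column C values and the function name drop_rows_with_repeats are hypothetical, made up for illustration. It also assumes the selected columns contain no missing values and that the index is unique. The keep argument is passed straight to Series.duplicated, so keep=False flags every occurrence of a repeated value instead of only the later ones.

```python
import pandas as pd


def drop_rows_with_repeats(df, cols, keep="first"):
    """Drop rows holding a value repeated anywhere in the given columns.

    Hypothetical helper for illustration; assumes no NaN in `cols`
    and a unique index.
    """
    # Only the chosen columns take part in the duplicate check
    flagged = df[cols].stack().duplicated(keep=keep).unstack().any(axis=1)
    # The mask is applied to the full frame, so other columns are kept
    return df[~flagged]


# Columns A and B come from the question; column C is made-up sample data
df = pd.DataFrame({
    "A": [1, 2, 4, 7, 5],
    "B": [2, 3, 5, 6, 8],
    "C": [9, 1, 12, 10, 11],
})

# Row 1 is dropped: its C value (1) already appeared in A of row 0
first_kept = drop_rows_with_repeats(df, ["A", "C"])

# With keep=False both rows sharing the value 1 are dropped
none_kept = drop_rows_with_repeats(df, ["A", "C"], keep=False)

print(first_kept)
print(none_kept)
```

Column B is ignored by the check here, so the 2 shared by B in row 0 and A in row 1 does not count as a repeat.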
