How do I filter a dataframe to show only the rows that are duplicated across multiple columns?

Example dataframe:

col1 col2 col3
A1    B1   C1
A1    B1   C1
A1    B1   C2
A2    B2   C2

Expected output:

col1 col2 col3
A1    B1   C1
A1    B1   C1

My attempt:

df[df.duplicated(['col1', 'col2', 'col3'], keep=False)]

But this does not give the expected output.

  • Seems to work for me. (Commented Mar 3, 2018 at 3:01)

1 Answer

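Your attempt, df[df.duplicated(['col1', 'col2', 'col3'], keep=False)], works in my testing. When the listed columns are all the columns in the dataframe, as in your example, you can leave out the column names, since duplicated() compares every column by default: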

df[df.duplicated(keep=False)]
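
To check this on your side, here is a minimal sketch that rebuilds the example dataframe from the question and applies the same filter. The variable name dupes is only illustrative, and the data is typed in by hand from the question, so your real data may differ.

```python
import pandas as pd

# Example data copied from the question
df = pd.DataFrame({
    "col1": ["A1", "A1", "A1", "A2"],
    "col2": ["B1", "B1", "B1", "B2"],
    "col3": ["C1", "C1", "C2", "C2"],
})

# keep=False flags every row that belongs to a duplicated group,
# rather than only the second and later occurrences
dupes = df[df.duplicated(keep=False)]
print(dupes)
```

With keep='first' (the default), only the second of the two identical rows would be flagged, so keep=False is what makes both rows show up.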