Checking the duplicate values of multiple columns in a row in a dataframe.

Question

I want to check whether a dataframe has rows with duplicate values in multiple columns. For instance, in this dataset I want to count the entries that have duplicates of 'STUDY_ID' and 'VISITCODE'. I tried the following but got a syntax error, and I don't know why.

bp[(bp.duplicated('STUDY_ID') == True) && (bp.duplicated('VISITCODE') == True)]

Isn't it possible to implement what I want this way? If not, what would be a better way?

jezrael · Accepted Answer · 2018-02-14 12:55:48Z

3

Change && to &, the element-wise (bitwise) AND operator, and drop the redundant == True:

bp[bp.duplicated('STUDY_ID') & bp.duplicated('VISITCODE')]
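As a minimal sketch of why & is needed here, the example below builds a small frame whose values are made up for illustration (only the column names come from the question). It combines two boolean masks element by element with &, and shows that the keyword and raises a ValueError when applied to a Series.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
bp = pd.DataFrame({
    "STUDY_ID": [1, 1, 2, 2, 3],
    "VISITCODE": ["a", "b", "a", "b", "c"],
})

mask_id = bp.duplicated("STUDY_ID")      # True where STUDY_ID appeared in an earlier row
mask_visit = bp.duplicated("VISITCODE")  # True where VISITCODE appeared in an earlier row

# & combines the two masks element by element.
both = mask_id & mask_visit
rows = bp[both]

# The keyword `and` needs one truth value, which a multi-element Series cannot give.
try:
    mask_id and mask_visit
    and_failed = False
except ValueError:
    and_failed = True
```

Note that the one row selected here (STUDY_ID 2, VISITCODE "b") is not a repeated pair: each value has merely appeared earlier in its own column.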

To check for duplicates across several columns at once, pass a list:

bp[bp.duplicated(['STUDY_ID', 'VISITCODE'])]
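The two expressions above are not equivalent. A rough sketch on hypothetical data: combining per-column masks flags a row whenever each value has appeared earlier in its own column, while passing a list treats the pair of values as a unit. The keep=False argument of duplicated marks every member of a duplicate group rather than only the later repeats, and sum() on the mask gives the count the question asks about.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
bp = pd.DataFrame({
    "STUDY_ID": [1, 1, 2, 2, 1],
    "VISITCODE": ["a", "b", "a", "b", "a"],
})

# Each column checked separately, then combined element by element.
per_column = bp.duplicated("STUDY_ID") & bp.duplicated("VISITCODE")

# The (STUDY_ID, VISITCODE) pair checked as a unit.
as_pair = bp.duplicated(["STUDY_ID", "VISITCODE"])

# keep=False marks the first occurrence as well as the repeats.
all_members = bp.duplicated(["STUDY_ID", "VISITCODE"], keep=False)

n_repeats = int(as_pair.sum())        # rows that repeat an earlier pair
n_in_groups = int(all_members.sum())  # rows belonging to any duplicated pair
```

Here per_column flags the last two rows, but only the last row actually repeats an earlier pair, so the list form is the one that answers the original question.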

edited Feb 14, 2018 at 12:55

answered Feb 14, 2018 at 12:41


7 Comments

Huzo Over a year ago

Wow, that worked. But why doesn't && work? Does Python not support that?

jezrael Over a year ago

No. Python has no && operator: it uses the keyword and for logical AND on scalars, and & for element-wise AND on arrays.

Huzo Over a year ago

I see, so && is never used? Or only in some situations? I ask because && lights up green in the compiler.

jezrael Over a year ago

Hard question, I have no idea why && lights up green in the compiler.

jezrael Over a year ago

Do you need bp[bp.duplicated(['STUDY_ID', 'VISITCODE'])]?
