
This might be quite an easy problem, but I couldn't solve it properly and didn't find the exact answer here. Say we have a pandas DataFrame like this:

df:

ID  a  b   c   d
0   1  3   4   9
1   2  8   8   3
2   1  3  10  12
3   0  1   3   0

I want to remove every row that contains the same value in more than one column; in other words, I only want to keep rows whose values are all unique. For the example above, the desired output is:

ID  a  b   c   d
0   1  3   4   9
2   1  3  10  12

(I kept the original ID values on purpose to make the comparison easier.) Please let me know if you have any ideas. Thanks!

1 Answer
You can compare the length of each row's set of values with the number of columns:

lc = len(df.columns)

df1 = df[df.apply(lambda x: len(set(x)) == lc, axis=1)]
print(df1)
    a  b   c   d
ID              
0   1  3   4   9
2   1  3  10  12

Or test with Series.duplicated and Series.any:

df1 = df[~df.apply(lambda x: x.duplicated().any(), axis=1)]

Or DataFrame.nunique:

df1 = df[df.nunique(axis=1).eq(lc)]
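One subtlety worth noting (not covered above, so treat this as an assumption about your data): `nunique` ignores NaN by default, so a row containing NaN is dropped even if its remaining values are unique. Passing `dropna=False` counts NaN as a value. A minimal sketch with a hypothetical one-row frame `df2`:

```python
import numpy as np
import pandas as pd

# Hypothetical row with one missing value but otherwise distinct values.
df2 = pd.DataFrame({"a": [1.0], "b": [2.0], "c": [np.nan], "d": [9.0]})
lc2 = len(df2.columns)

# Default dropna=True: NaN is not counted, so nunique is 3 and the row fails.
print(df2.nunique(axis=1).eq(lc2))

# dropna=False: NaN counts as one distinct value, so the row passes.
print(df2.nunique(axis=1, dropna=False).eq(lc2))
```

Which behavior you want depends on whether NaN should be treated as a value in its own right.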

Or:

df1 = df[[len(set(x)) == lc for x in df.to_numpy()]]
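For completeness, a self-contained sketch that rebuilds the question's frame (with `ID` as the index, as in the output shown above) and checks that all four approaches produce the same mask:

```python
import pandas as pd

# The frame from the question; ID is used as the index.
df = pd.DataFrame(
    {"a": [1, 2, 1, 0], "b": [3, 8, 3, 1], "c": [4, 8, 10, 12], "d": [9, 3, 12, 0]},
    index=pd.Index([0, 1, 2, 3], name="ID"),
)
df.loc[2, "d"] = 12
df.loc[3, "c"] = 3

lc = len(df.columns)

# The four approaches from the answer, each producing a boolean mask per row.
m1 = df.apply(lambda x: len(set(x)) == lc, axis=1)
m2 = ~df.apply(lambda x: x.duplicated().any(), axis=1)
m3 = df.nunique(axis=1).eq(lc)
m4 = pd.Series([len(set(x)) == lc for x in df.to_numpy()], index=df.index)

# All masks agree; rows 0 and 2 survive.
assert m1.equals(m2) and m2.equals(m3) and m3.equals(m4)
print(df[m1])
```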
