
This might be quite an easy problem, but I couldn't solve it properly and didn't find the exact answer here. Say we have a pandas DataFrame like this:

df:

ID  a  b   c   d
0   1  3   4   9
1   2  8   8   3
2   1  3  10  12
3   0  1   3   0

I want to remove every row that contains the same value in more than one column; in other words, I only want to keep rows whose values are all unique. For the example above, the desired output is:

ID  a  b   c   d
0   1  3   4   9
2   1  3  10  12

(I kept the original ID values on purpose to make the comparison easier.) Please let me know if you have any ideas. Thanks!

1 Answer
You can compare the length of each row's set of values with the number of columns:

lc = len(df.columns)

df1 = df[df.apply(lambda x: len(set(x)) == lc, axis=1)]
print(df1)
    a  b   c   d
ID              
0   1  3   4   9
2   1  3  10  12

Or test with Series.duplicated and Series.any:

df1 = df[~df.apply(lambda x: x.duplicated().any(), axis=1)]

Or DataFrame.nunique:

df1 = df[df.nunique(axis=1).eq(lc)]
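One subtlety worth noting (not covered above, so treat this as an assumption about your data): `nunique` ignores NaN by default, so a row containing NaN is dropped even if its remaining values are unique. Passing `dropna=False` counts NaN as a value. A minimal sketch with a hypothetical one-row frame `df2`:

```python
import numpy as np
import pandas as pd

# Hypothetical row with one missing value but otherwise distinct values.
df2 = pd.DataFrame({"a": [1.0], "b": [2.0], "c": [np.nan], "d": [9.0]})
lc2 = len(df2.columns)

# Default dropna=True: NaN is not counted, so nunique is 3 and the row fails.
print(df2.nunique(axis=1).eq(lc2))

# dropna=False: NaN counts as one distinct value, so the row passes.
print(df2.nunique(axis=1, dropna=False).eq(lc2))
```

Which behavior you want depends on whether NaN should be treated as a value in its own right.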

Or:

df1 = df[[len(set(x)) == lc for x in df.to_numpy()]]
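For completeness, a self-contained sketch that rebuilds the question's frame (with `ID` as the index, as in the output shown above) and checks that all four approaches produce the same mask:

```python
import pandas as pd

# The frame from the question; ID is used as the index.
df = pd.DataFrame(
    {"a": [1, 2, 1, 0], "b": [3, 8, 3, 1], "c": [4, 8, 10, 12], "d": [9, 3, 12, 0]},
    index=pd.Index([0, 1, 2, 3], name="ID"),
)
df.loc[2, "d"] = 12
df.loc[3, "c"] = 3

lc = len(df.columns)

# The four approaches from the answer, each producing a boolean mask per row.
m1 = df.apply(lambda x: len(set(x)) == lc, axis=1)
m2 = ~df.apply(lambda x: x.duplicated().any(), axis=1)
m3 = df.nunique(axis=1).eq(lc)
m4 = pd.Series([len(set(x)) == lc for x in df.to_numpy()], index=df.index)

# All masks agree; rows 0 and 2 survive.
assert m1.equals(m2) and m2.equals(m3) and m3.equals(m4)
print(df[m1])
```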
