I have a dataframe in which the columns are supposed to be dummy columns (for each row only one column should be populated). However, the data has some 'noise' in it: some rows have more than one column populated. I want to drop these rows.

Suppose the DataFrame looks like the example below:

  a       b        c        d  
0 NaN     1        NaN      NaN
1 1       2        3        4  
2 1       1        NaN      NaN 
3 NaN     NaN      1        NaN
4 1       NaN      1        NaN

My expected result is that rows [1, 2, 4] get dropped. In other words, I only want to keep rows where the number of NaN values equals number_of_columns - 1.

Is there any way to do this in pandas?

2 Answers

3

Use:

df[(df.shape[1]-1)==(df.isna().sum(axis=1))]

    a    b    c   d
0 NaN  1.0  NaN NaN
3 NaN  NaN  1.0 NaN
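
For anyone who wants to try this end to end, here is a minimal sketch that rebuilds the example frame from the question and applies the same filter. The variable names `nan_per_row` and `clean` are only illustrative and do not come from the answer.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame(
    {
        "a": [np.nan, 1, 1, np.nan, 1],
        "b": [1, 2, 1, np.nan, np.nan],
        "c": [np.nan, 3, np.nan, 1, 1],
        "d": [np.nan, 4, np.nan, np.nan, np.nan],
    }
)

# Count NaN values per row (axis=1 sums across the columns).
nan_per_row = df.isna().sum(axis=1)

# Keep rows where all columns but one are NaN.
clean = df[nan_per_row == df.shape[1] - 1]
print(clean)
```

Here `df.shape[1]` is the number of columns, so the comparison keeps exactly the rows with a single populated column.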

2 Comments

Thanks for the above! And supposing my data has more columns and I want to drop rows based only on the above subset of columns, how would you go about it?
@Maciej If I understand correctly, you can create a copy, m = df.loc[:, interested_columns].copy() (where interested_columns is a list of column names), and then replace df in the code with m.
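
One possible way to do the subset check without making a copy is to build the boolean mask from the columns of interest and then index the full frame with it, so the other columns are kept in the result. This is only a sketch: the extra `id` column and the names `dummy_cols`, `mask`, and `result` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Example frame with one extra, non-dummy column ("id" is hypothetical).
df = pd.DataFrame(
    {
        "id": ["r0", "r1", "r2", "r3", "r4"],
        "a": [np.nan, 1, 1, np.nan, 1],
        "b": [1, 2, 1, np.nan, np.nan],
        "c": [np.nan, 3, np.nan, 1, 1],
        "d": [np.nan, 4, np.nan, np.nan, np.nan],
    }
)

# Only these columns take part in the check.
dummy_cols = ["a", "b", "c", "d"]

# True for rows where all but one of the dummy columns are NaN.
mask = df[dummy_cols].isna().sum(axis=1) == len(dummy_cols) - 1

# Index the full frame with the mask so "id" is preserved.
result = df[mask]
print(result)
```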
1

This one will get you there. You just count the number of non-null values in each row and keep the rows that have at most one.

df[df.notna().sum(axis=1) <= 1]
    a    b    c   d
0 NaN  1.0  NaN NaN
3 NaN  NaN  1.0 NaN
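
One detail worth checking against your own data: `<= 1` also keeps rows that are entirely NaN, whereas the question asks for exactly one populated column. On the question's example the two give the same result because no row is all NaN. The sketch below uses a small made-up frame with an all-NaN row to show the difference; the names `at_most_one` and `exactly_one` are illustrative.

```python
import numpy as np
import pandas as pd

# Small made-up frame: row 2 has no populated column at all.
df = pd.DataFrame(
    {
        "a": [np.nan, 1, np.nan],
        "b": [1, 1, np.nan],
        "c": [np.nan, np.nan, np.nan],
        "d": [np.nan, np.nan, np.nan],
    }
)

# Number of populated (non-null) columns per row.
populated = df.notna().sum(axis=1)

# "<= 1" keeps the all-NaN row as well.
at_most_one = df[populated <= 1]

# "== 1" requires exactly one populated column.
exactly_one = df[populated == 1]

print(at_most_one)
print(exactly_one)
```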

1 Comment

Thanks, this is a perfect solution that also allows you to drop rows based on a subset of columns from your dataframe!
