Pandas. Selecting rows with missing values in multiple columns

Question

Suppose we have a dataframe with the columns 'Race', 'Age', 'Name'. I want to create two DataFrames:
1) Without missing values in columns 'Race' and 'Age'
2) Only with missing values in columns 'Race' and 'Age'

I wrote the following code

first_df = df[df[columns].notnull()]
second_df= df[df[columns].isnull()]

However, this code does not work. I solved the problem with this code:

first_df = df[df['Race'].notnull() & df['Age'].notnull()]
second_df = df[df['Race'].isnull() & df['Age'].isnull()]

But what if there are 10 columns? Is there a way to write this without chaining logical operators, using only a list of columns?

jezrael · Accepted Answer · 2020-04-19 07:24:06Z

5

Selecting multiple columns and calling notnull or isnull returns a boolean DataFrame, not a boolean Series. To filter rows, reduce it to one value per row: use DataFrame.all(axis=1) to test whether all selected columns are True, or DataFrame.any(axis=1) to test whether at least one is True:

first_df = df[df[columns].notnull().all(axis=1)]
second_df = df[df[columns].isnull().all(axis=1)]
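As a rough illustration, here is the same idea on a small frame. The sample data and the contents of the `columns` list are assumptions made up for this sketch; only the column names come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "Race": ["A", np.nan, np.nan, "B"],
    "Age": [30, np.nan, 25, np.nan],
    "Name": ["w", "x", "y", "z"],
})
columns = ["Race", "Age"]  # assumed list of columns to check

# Rows where every listed column is present.
first_df = df[df[columns].notnull().all(axis=1)]
# Rows where every listed column is missing.
second_df = df[df[columns].isnull().all(axis=1)]
# Rows where at least one listed column is missing.
any_missing_df = df[df[columns].isnull().any(axis=1)]

print(first_df.index.tolist())        # [0]
print(second_df.index.tolist())       # [1]
print(any_missing_df.index.tolist())  # [1, 2, 3]
```

Rows 2 and 3 have exactly one of the two values missing, so they land in neither first_df nor second_df. Whether they belong in the second frame depends on whether "missing" means all of the columns or any of them.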

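You can also invert the mask with ~. Note that ~mask selects the rows where at least one of the columns is missing, which is not the same as the rows where all of them are missing:

mask = df[columns].notnull().all(axis=1)
first_df = df[mask]
second_df = df[~mask]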
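A short sketch comparing the inverted mask with the all-missing selection. The sample data, the `columns` list, and the variable names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "Race": ["A", np.nan, np.nan, "B"],
    "Age": [30, np.nan, 25, np.nan],
    "Name": ["w", "x", "y", "z"],
})
columns = ["Race", "Age"]  # assumed list of columns to check

mask = df[columns].notnull().all(axis=1)
complete_df = df[mask]     # every listed column present
incomplete_df = df[~mask]  # at least one listed column missing

# For comparison: rows where every listed column is missing.
all_missing_df = df[df[columns].isnull().all(axis=1)]

print(complete_df.index.tolist())     # [0]
print(incomplete_df.index.tolist())   # [1, 2, 3]
print(all_missing_df.index.tolist())  # [1]
```

The mask and its inverse always split the frame into two parts with no rows left over, which is convenient when every row has to end up in exactly one of the two DataFrames.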



2 Comments

Rustem Sadykov Over a year ago

What is the difference between .all() and .any()? By the way, thank you very much. Your solution works!

jezrael Over a year ago

@RustemSadykov - all is like chaining the conditions with & (AND); any is like chaining them with | (OR).
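To make the comparison concrete, this sketch checks that the row-wise reductions give the same masks as the hand-chained operators. The sample data and variable names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "Race": ["A", np.nan, np.nan, "B"],
    "Age": [30, np.nan, 25, np.nan],
    "Name": ["w", "x", "y", "z"],
})
missing = df[["Race", "Age"]].isnull()

# AND: chained & versus all(axis=1)
chained_and = df["Race"].isnull() & df["Age"].isnull()
reduced_and = missing.all(axis=1)

# OR: chained | versus any(axis=1)
chained_or = df["Race"].isnull() | df["Age"].isnull()
reduced_or = missing.any(axis=1)

print(chained_and.tolist() == reduced_and.tolist())  # True
print(chained_or.tolist() == reduced_or.tolist())    # True
```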

Amit Chauhan · Accepted Answer · 2020-04-19 07:57:26Z

1

Step 1: Create a new DataFrame with the missing data (NaN, pd.NaT, None) dropped, which filters out the incomplete rows. DataFrame.dropna drops every row that contains at least one field with missing data.

Call the new DataFrame DF_updated and the original DF_Original.

Step 2: The solution is then the difference between the two DataFrames, which can be found with pd.concat([DF_Original, DF_updated]).drop_duplicates(keep=False).
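The two steps might look like the sketch below. The sample data is hypothetical. Restricting dropna to the two columns with `subset` is an assumption added here to match the question; the answer itself calls dropna on the whole frame. The index-based variant at the end is also an addition, not part of the answer.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df_original = pd.DataFrame({
    "Race": ["A", np.nan, np.nan, "B"],
    "Age": [30, np.nan, 25, np.nan],
    "Name": ["w", "x", "y", "z"],
})
columns = ["Race", "Age"]  # assumed list of columns to check

# Step 1: drop rows with a missing value in any of the listed columns.
df_updated = df_original.dropna(subset=columns)

# Step 2: rows that appear in the original but not in the cleaned frame.
df_missing = pd.concat([df_original, df_updated]).drop_duplicates(keep=False)

# Assumed alternative: remove the kept rows by index label instead of by value.
df_missing_by_index = df_original.drop(index=df_updated.index)

print(df_updated.index.tolist())           # [0]
print(df_missing.index.tolist())           # [1, 2, 3]
print(df_missing_by_index.index.tolist())  # [1, 2, 3]
```

One caveat with the concat approach: drop_duplicates(keep=False) compares row values, so incomplete rows that are exact duplicates of each other in the original frame are removed as well. The index-based variant avoids that, provided the index labels are unique.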

