6

Suppose we have a dataframe with the columns 'Race', 'Age', 'Name'. I want to create two 2 DF's:
1) Without missing values in columns 'Race' and 'Age'
2) Only with missing values in columns 'Race' and 'Age'

I wrote the following code

first_df = df[df[columns].notnull()]
second_df= df[df[columns].isnull()]

However this code does not work. I solved this problem using this code

first_df= df[df['Race'].isnull() & df['Age'].isnull()]
second_df = df[df['Race'].isnull() & df['Age'].isnull()]

But what if there are 10 columns ? Is there a way to write this code without logical operators, using only columns list ?

2 Answers 2

5

If select multiple columns get boolean DataFrame, then is necessary test if all columns are Trues by DataFrame.all or test if at least one True per rows by DataFrame.any:

first_df = df[df[columns].notnull().all(axis=1)]
second_df= df[df[columns].isnull().all(axis=1)]

You can also use ~ for invert mask:

mask = df[columns].notnull().all(axis=1)
first_df = df[mask]
second_df= df[~mask]
Sign up to request clarification or add additional context in comments.

2 Comments

What is the difference betwen .all() and .any() ? Btw, thank you very much. Your solution works!
@RustemSadykov - all id like chain by & for AND, any is like chain by | for OR.
1

Step 1 : Make a new dataframe having dropped the missing data (NaN, pd.NaT, None) you can filter out incomplete rows. DataFrame.dropna drops all rows containing at least one field with missing data

Assume new df as DF_updated and earlier as DF_Original

Step 2 : Now our solution DF will be difference between two DFs. It can be found by pd.concat([DF_Original,DF_updated]).drop_duplicates(keep=False)

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.