I have a DataFrame df like this:

ID,A,B,C,D,E,F,G
1,123,30,3G,1,123,30,3G
2,456,40,4G,NaN,NaN,NaN,4G
3,789,35,5G,NaN,NaN,NaN,NaN

I also have a list containing a subset of df's column names:

header_list = ["D","E","F","G"]

Now I'd like to get the rows of df that contain null values in ALL of the columns named in header_list.

Expected Output:

ID,A,B,C,D,E,F,G
3,789,35,5G,NaN,NaN,NaN,NaN

I tried new_df = df[df[header_list].isnull()], but this throws an error: ValueError: Boolean array expected for the condition, not float64

I know I can do something like this:

new_df = df[(df['D'].isnull()) & (df['E'].isnull()) & (df['F'].isnull()) & (df['G'].isnull())]

But I don't want to hard-code the column names like this. Is there a better way of doing this?

1 Answer

You can filter this with:

df[df[header_list].isnull().all(axis=1)]

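For each row, this checks whether every value in the header_list columns is null: .isnull() marks the null cells with True, and .all(axis=1) requires all of those marks in the row to be True.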

For the given sample input, this gives the expected output:

>>> df[df[header_list].isnull().all(axis=1)]
     A   B   C   D   E   F    G
3  789  35  5G NaN NaN NaN  NaN
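
To reproduce this end to end, here is a minimal sketch. It assumes the sample data is read from CSV text with ID as the index (which matches the output shown above, but the question does not say how df was actually built); the names csv_text, mask and new_df are only illustrative.

```python
import io

import pandas as pd

# Sample data from the question; read_csv parses the literal "NaN" as missing.
csv_text = """ID,A,B,C,D,E,F,G
1,123,30,3G,1,123,30,3G
2,456,40,4G,NaN,NaN,NaN,4G
3,789,35,5G,NaN,NaN,NaN,NaN
"""

# Assumption: ID is used as the index, as the answer's output suggests.
df = pd.read_csv(io.StringIO(csv_text), index_col="ID")
header_list = ["D", "E", "F", "G"]

# One boolean per row: True only if every header_list column is null.
mask = df[header_list].isnull().all(axis=1)
new_df = df[mask]
print(new_df)
```

Because mask is a boolean Series aligned with the index of df, it can be passed straight to df[...], which is what the attempt in the question was missing.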

The .all(axis=1) [pandas-doc] call returns True for a row if every column in that row is True, and False otherwise. For the given sample input, the intermediate results are:

>>> df[header_list]
     D      E     F    G
1  1.0  123.0  30.0   3G
2  NaN    NaN   NaN   4G
3  NaN    NaN   NaN  NaN
>>> df[header_list].isnull()
       D      E      F      G
1  False  False  False  False
2   True   True   True  False
3   True   True   True   True
>>> df[header_list].isnull().all(axis=1)
1    False
2    False
3     True
dtype: bool
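
As a sanity check, the sketch below compares this mask with the hard-coded condition from the question. The DataFrame is rebuilt by hand here, again assuming ID is the index; hard_coded, generic and any_null are illustrative names.

```python
import numpy as np
import pandas as pd

# Hand-built copy of the sample data (assumption: ID is the index).
df = pd.DataFrame(
    {
        "A": [123, 456, 789],
        "B": [30, 40, 35],
        "C": ["3G", "4G", "5G"],
        "D": [1.0, np.nan, np.nan],
        "E": [123.0, np.nan, np.nan],
        "F": [30.0, np.nan, np.nan],
        "G": ["3G", "4G", np.nan],
    },
    index=pd.Index([1, 2, 3], name="ID"),
)
header_list = ["D", "E", "F", "G"]

# The hard-coded version from the question.
hard_coded = (
    df["D"].isnull() & df["E"].isnull() & df["F"].isnull() & df["G"].isnull()
)

# The generic version: AND across the selected columns, row by row.
generic = df[header_list].isnull().all(axis=1)
same = bool((hard_coded == generic).all())
print(same)

# For contrast: .any(axis=1) keeps rows with at least one null in header_list.
any_null = df[header_list].isnull().any(axis=1)
print(df[any_null].index.tolist())
```

Both masks select only the row with ID 3, whereas the .any(axis=1) variant would also keep the row with ID 2, since only its G column is filled.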