
I have a DataFrame with two columns in which I want to count the total number of "null" values, separately for each column.

I tried this code:

    chck_nulls = df['account_id'].isnull().sum() | df['customer_id'].isnull().sum()
    print(df[chck_nulls])

I get this error:

    ---------------------------------------------------------------------------
    KeyError                                  Traceback (most recent call last)
    /anaconda/envs/azureml_py38/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
       3079             try:
    -> 3080                 return self._engine.get_loc(casted_key)
       3081             except KeyError as err:
    KeyError: 28671

Sample data:

Customer Name   account_id  customer_id
Adam            null        null
Michael         null        null
Jenkins         null        null

Expected result:

customer_id        3
account_id         3
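For anyone reproducing this, the sample data can be rebuilt as a DataFrame. This is a sketch under an assumption: that "null" in the sample means a genuinely missing value (None/NaN). If the cells instead hold the literal text "null", isnull() will not flag them, so the hypothetical df_text variant compares against the string explicitly. The names missing_counts, df_text, text_isnull_counts and text_null_counts are illustrative only.

```python
import pandas as pd

# Assumption: "null" in the sample means a genuinely missing value (None/NaN).
df = pd.DataFrame({
    "Customer Name": ["Adam", "Michael", "Jenkins"],
    "account_id": [None, None, None],
    "customer_id": [None, None, None],
})
# Count missing values per column, in the order of the expected result.
missing_counts = df[["customer_id", "account_id"]].isnull().sum()
print(missing_counts)

# Hypothetical variant: the cells contain the literal string "null".
# isnull() treats that text as an ordinary value, not as missing.
df_text = pd.DataFrame({
    "Customer Name": ["Adam", "Michael", "Jenkins"],
    "account_id": ["null", "null", "null"],
    "customer_id": ["null", "null", "null"],
})
text_isnull_counts = df_text[["customer_id", "account_id"]].isnull().sum()  # all 0
# Comparing against the string counts those cells instead.
text_null_counts = (df_text[["customer_id", "account_id"]] == "null").sum()
print(text_null_counts)
```

If the first count comes back as zeros on the real data, checking for the literal string is a reasonable next step.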

Any help would be highly appreciated!

Thanks

  • Try "isna()" in place of "isnull()" Commented Nov 23, 2021 at 3:47
  • Or "count_nan = len(df[column]) - df[column].count()" Commented Nov 23, 2021 at 3:49
  • @Wilian tried isna(), same error Commented Nov 23, 2021 at 3:50
  • Didn't work either Commented Nov 23, 2021 at 3:55
  • My bad, I just posted the solution; try that! I mistakenly pasted your print statement without checking it. Your print statement fails because chck_nulls is an integer (derived from the null counts of your two columns), not a boolean mask. Commented Nov 23, 2021 at 4:33
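The diagnosis in the last comment can be checked with a small sketch. It assumes the missing values are real None/NaN values (not the string "null") and uses only the two ID columns. It shows that the question's expression yields a plain integer, that indexing a DataFrame with that integer is a column-label lookup (hence the KeyError), and how the len()-minus-count() suggestion from the comments gives per-column counts. The names raised_key_error and count_nan are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "account_id": [None, None, None],
    "customer_id": [None, None, None],
})

# The expression from the question evaluates to a single integer, not a mask.
chck_nulls = df["account_id"].isnull().sum() | df["customer_id"].isnull().sum()
print(type(chck_nulls), chck_nulls)

# df[<integer>] is treated as a column-label lookup; no column is named 3.
raised_key_error = False
try:
    df[chck_nulls]
except KeyError:
    raised_key_error = True
print("KeyError raised:", raised_key_error)

# Count-based alternative from the comments: total rows minus non-null values.
count_nan = {
    column: len(df[column]) - df[column].count()
    for column in ["account_id", "customer_id"]
}
print(count_nan)
```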

1 Answer

    # Count the null values in each column separately
    chck_nulls_account_id = df['account_id'].isnull().sum()
    chck_nulls_customer_id = df['customer_id'].isnull().sum()

    print(f'customer_id\t{chck_nulls_customer_id}')
    print(f'account_id\t{chck_nulls_account_id}')

    # Rows where the given column is null
    print(df[df['account_id'].isnull()])
    print(df[df['customer_id'].isnull()])
    # Rows where at least one of the two columns is null (| is element-wise OR)
    print(df[df['account_id'].isnull() | df['customer_id'].isnull()])
    # Rows where both columns are null (& is element-wise AND)
    print(df[df['account_id'].isnull() & df['customer_id'].isnull()])
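A more compact variant, offered as a sketch rather than as the answer's own method: select both columns at once and call isnull().sum(), which returns one count per column as a Series indexed by column name, close to the layout in the expected result. It assumes the nulls are real None/NaN values. It also confirms that isna() behaves the same as isnull(), since pandas treats them as aliases, which is why swapping one for the other did not change anything.

```python
import pandas as pd

df = pd.DataFrame({
    "Customer Name": ["Adam", "Michael", "Jenkins"],
    "account_id": [None, None, None],
    "customer_id": [None, None, None],
})

columns = ["customer_id", "account_id"]   # order matches the expected result
null_counts = df[columns].isnull().sum()  # one count per column, as a Series
print(null_counts.to_string())

# isna() is an alias of isnull(), so both produce the same mask.
same_with_isna = df[columns].isna().equals(df[columns].isnull())
```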


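In your original code, chck_nulls is a single integer, not a boolean mask, so df[chck_nulls] looks it up as a column label and raises the KeyError. Also, | between two integers is a bitwise OR, not addition (3 | 3 is 3, not 6). To get the total number of null entries across the account_id and customer_id columns, use + instead (chck_nulls = df['account_id'].isnull().sum() + df['customer_id'].isnull().sum()); print(chck_nulls) then gives the correct total.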
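The difference between | and + on the two counts can be seen with the sample data, again assuming the nulls are real None/NaN values. The sketch also shows a one-line alternative, summing the per-column counts a second time, which gives the same total as adding them. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "account_id": [None, None, None],
    "customer_id": [None, None, None],
})

a = df["account_id"].isnull().sum()   # 3
b = df["customer_id"].isnull().sum()  # 3

bitwise_or = a | b  # combines the bits of 3 and 3, giving 3
total = a + b       # arithmetic sum: 6
# Same total in one expression: per-column counts, then summed again.
total_alt = df[["account_id", "customer_id"]].isnull().sum().sum()
print(bitwise_or, total, total_alt)
```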
