Removing None values from DataFrame in Python

Question

I have the following dataframe:

name	aaa	bbb
Mick	None	None
Ivan	A	C
Ivan-Peter	1	None
Juli	1	P

I want to get two dataframes.

One with the rows where aaa and/or bbb is None, named filter_nulls in my code.
One with the rows that contain no None at all, named df_out in my code.

This is what I have tried and it does not produce the required dataframes.

import pandas as pd

df_out = {
    'name': [ 'Mick', 'Ivan', 'Ivan-Peter', 'Juli'],
    'aaa': [None, 'A', '1', '1'],
    'bbb': [None, 'C', None, 'P'],
}
print(df_out)

filter_nulls = df_out[df_out['aaa'].isnull()|(df_out['bbb'] is None)]
print(filter_nulls)

df_out = df_out.loc[filter_nulls].reset_index(level=0, drop=True)
print(df_out)

jezrael · Accepted Answer · 2022-10-25 10:40:48Z


Use:

#DataFrame from sample data
df_out = pd.DataFrame(df_out)

#select the columns by list and test, row by row, if at least one of them is NaN or None
m = df_out[['aaa','bbb']].isna().any(axis=1)

#OR test both columns separately
m = df_out['aaa'].isna() | df_out['bbb'].isna()


#filter matched and not matched rows
df1 = df_out[m].reset_index(drop=True)
df2 = df_out[~m].reset_index(drop=True)
print (df1)
         name   aaa   bbb
0        Mick  None  None
1  Ivan-Peter     1  None

print (df2)
   name aaa bbb
0  Ivan   A   C
1  Juli   1   P
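For convenience, here is the same approach as one self-contained sketch that starts from the dictionary in the question. The names data, df and df_clean are chosen here for illustration and are not from the original post.

```python
import pandas as pd

# Sample data from the question, still a plain dict at this point
data = {
    'name': ['Mick', 'Ivan', 'Ivan-Peter', 'Juli'],
    'aaa': [None, 'A', '1', '1'],
    'bbb': [None, 'C', None, 'P'],
}

# Convert to a DataFrame first; the mask below does not work on a dict
df = pd.DataFrame(data)

# True for every row with a missing value in aaa and/or bbb
m = df[['aaa', 'bbb']].isna().any(axis=1)

# Rows with at least one missing value, and rows with none
filter_nulls = df[m].reset_index(drop=True)
df_clean = df[~m].reset_index(drop=True)

print(filter_nulls)
print(df_clean)
```

The mask is computed once and reused, inverted with ~, for the second dataframe.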

Another idea is to use DataFrame.dropna and then select the indices that do not exist in df2:

df2 = df_out.dropna()
df1 = df_out.loc[df_out.index.difference(df2.index)].reset_index(drop=True)
df2 = df2.reset_index(drop=True)
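One thing to keep in mind with the dropna variant: called without arguments, it drops a row when any column is missing, not only aaa or bbb. The sketch below uses dropna's subset parameter to restrict the check to those two columns. The sample data is modified here (a missing name in the second row) purely to make the difference visible; that change is an assumption for illustration, not part of the question.

```python
import pandas as pd

# Modified sample: the second row has a missing name (illustrative assumption)
df = pd.DataFrame({
    'name': ['Mick', None, 'Ivan-Peter', 'Juli'],
    'aaa': [None, 'A', '1', '1'],
    'bbb': [None, 'C', None, 'P'],
})

# Checks every column, so the row with the missing name is dropped too
complete_all = df.dropna()

# Checks only aaa and bbb
complete_subset = df.dropna(subset=['aaa', 'bbb'])

# Rows that were dropped by the subset check
with_nulls = df.loc[df.index.difference(complete_subset.index)]

print(complete_all.index.tolist())
print(complete_subset.index.tolist())
print(with_nulls.index.tolist())
```

With the original data from the question both calls give the same result, because the name column has no missing values there.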

edited Oct 25, 2022 at 10:40

answered Oct 25, 2022 at 10:32


5 Comments

Vityata Over a year ago

Tried with this solution m = df_out[['aaa','bbb']].isna().any(axis=1), got an error on that line saying TypeError: unhashable type: 'list'. Still researching.

jezrael Over a year ago

@Vityata - missing df_out = pd.DataFrame(df_out) in first line of code

Vityata Over a year ago

yup, that is it... And I thought that I had changed my input data too much.
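A small sketch of why the error in this comment exchange appears, assuming df_out is still the plain dict from the question: a dict lookup hashes its key, and a list cannot be hashed. The names raw and message are illustrative only.

```python
import pandas as pd

# The question's data before conversion: a plain Python dict
raw = {
    'name': ['Mick', 'Ivan', 'Ivan-Peter', 'Juli'],
    'aaa': [None, 'A', '1', '1'],
    'bbb': [None, 'C', None, 'P'],
}

# Indexing a dict with a list fails, because the list is used as a key
try:
    raw[['aaa', 'bbb']]
except TypeError as exc:
    message = str(exc)
    print(message)

# After conversion, the same expression selects two columns
df = pd.DataFrame(raw)
m = df[['aaa', 'bbb']].isna().any(axis=1)
print(m.tolist())
```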

Vityata Over a year ago

Just a question on execution - if the DF is about 1M rows, is this df2 = df_out[~m].reset_index(drop=True) going to be a bit slow?

jezrael Over a year ago

@Vityata - I think not; reusing ~m is better than calling df_out['aaa'].notnull() & df_out['bbb'].notnull() again
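To try this on a larger frame, the following sketch builds synthetic data and checks that reusing the inverted mask gives the same rows as recomputing the notnull test. It is not a benchmark. The row count, the random values and the names big, reused and recomputed are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic data: each cell is 'A', '1' or None, picked at random
rng = np.random.default_rng(0)
n = 100_000
values = np.array(['A', '1', None], dtype=object)
big = pd.DataFrame({
    'name': np.arange(n).astype(str),
    'aaa': values[rng.integers(0, 3, size=n)],
    'bbb': values[rng.integers(0, 3, size=n)],
})

# Compute the mask once
m = big[['aaa', 'bbb']].isna().any(axis=1)

# Option 1: reuse the existing mask, inverted
reused = big[~m].reset_index(drop=True)

# Option 2: scan both columns again with notnull
recomputed = big[big['aaa'].notnull() & big['bbb'].notnull()].reset_index(drop=True)

print(reused.equals(recomputed))
```

Wrapping either option in time.perf_counter() calls, or using timeit, would show how long it takes on a given machine.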

Gonçalo Peres · Accepted Answer · 2022-10-25 10:53:25Z

First of all, one needs to convert df_out to a dataframe with pandas.DataFrame, as follows:

df_out = pd.DataFrame(df_out)

[Out]:

         name   aaa   bbb
0        Mick  None  None
1        Ivan     A     C
2  Ivan-Peter     1  None
3        Juli     1     P

Then one can use, for both cases, pandas.Series.notnull.

With values, where we have None in columns aaa and/or bbb, named filter_nulls in my code

df1 = df_out[~df_out['aaa'].notnull() | ~df_out['bbb'].notnull()]

[Out]:

         name   aaa   bbb
0        Mick  None  None
2  Ivan-Peter     1  None

Where we do not have None at all. df_out in my code.

df2 = df_out[df_out['aaa'].notnull() & df_out['bbb'].notnull()]

[Out]:

   name aaa bbb
1  Ivan   A   C
3  Juli   1   P

Notes:

If needed one can use pandas.DataFrame.reset_index to get the following

df_new = df_out[~df_out['aaa'].notnull() | ~df_out['bbb'].notnull()].reset_index(drop=True)

[Out]:

         name   aaa   bbb
0        Mick  None  None
1  Ivan-Peter     1  None
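As a side note on the attempt in the question, the expression df_out['bbb'] is None does not test the values of the column. It is a Python identity check on the Series object itself, so it evaluates to a single False. The sketch below shows this, and also that ~notnull() gives the same mask as isna(). The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Mick', 'Ivan', 'Ivan-Peter', 'Juli'],
    'aaa': [None, 'A', '1', '1'],
    'bbb': [None, 'C', None, 'P'],
})

# Identity check on the Series object: one plain bool, not a mask
identity_test = df['bbb'] is None

# Element-wise check: one bool per row
elementwise = df['bbb'].isna()

# Negating notnull() is equivalent to isna()
same = (~df['aaa'].notnull()).equals(df['aaa'].isna())

print(identity_test)
print(elementwise.tolist())
print(same)
```

Because the identity check is always False here, combining it with | leaves only the aaa test in effect, which is why the row for Ivan-Peter would be missed.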
