0

Having the following dataframe:

name aaa bbb
Mick None None
Ivan A C
Ivan-Peter 1 None
Juli 1 P

I want to get two dataframes.

  • One with values, where we have None in columns aaa and/or bbb, named filter_nulls in my code
  • One where we do not have None at all. df_out in my code.

This is what I have tried and it does not produce the required dataframes.

import pandas as pd

df_out = {
    'name': [ 'Mick', 'Ivan', 'Ivan-Peter', 'Juli'],
    'aaa': [None, 'A', '1', '1'],
    'bbb': [None, 'C', None, 'P'],
}
print(df_out)

filter_nulls = df_out[df_out['aaa'].isnull()|(df_out['bbb'] is None)]
print(filter_nulls)

df_out = df_out.loc[filter_nulls].reset_index(level=0, drop=True)
print(df_out)

2 Answers 2

1

Use:

#DataFrame from sample data
df_out = pd.DataFrame(df_out)

#filter columns names by list and test if NaN or None at least in one row
m = df_out[['aaa','bbb']].isna().any(axis=1)

#OR test both columns separately
m = df_out['aaa'].isna() | df_out['bbb'].isna()


#filter matched and not matched rows
df1 = df_out[m].reset_index(drop=True)
df2 = df_out[~m].reset_index(drop=True)
print (df1)
         name   aaa   bbb
0        Mick  None  None
1  Ivan-Peter     1  None

print (df2)
   name aaa bbb
0  Ivan   A   C
1  Juli   1   P

Another idea with DataFrame.dropna and filter indices not exist in df2:

df2 = df_out.dropna()
df1 = df_out.loc[df_out.index.difference(df2.index)].reset_index(drop=True)
df2 = df2.reset_index(drop=True)
Sign up to request clarification or add additional context in comments.

5 Comments

Tried with this solution m = df_out[['aaa','bbb']].isna().any(axis=1), got an error on that line saying TypeError: unhashable type: 'list'. Still researching.
@Vityata - missing df_out = pd.DataFrame(df_out) in first line of code
yup, that is it... And I though that I have changed my input data too much.
Just a question on execution - if the DF is about 1M rows, is this df2 = df_out[~m].reset_index(drop=True) going to be a bit slow?
@Vityata - I think not, it is better like again call df_out['aaa'].notnull() & df_out['bbb'].notnull() vs ~m
1

First of all one needs to convert df_out to a dataframe with pandas.DataFrame as follows

df_out = pd.DataFrame(df_out)

[Out]:

         name   aaa   bbb
0        Mick  None  None
1        Ivan     A     C
2  Ivan-Peter     1  None
3        Juli     1     P

Then one can use, for both cases, pandas.Series.notnull.

With values, where we have None in columns aaa and/or bbb, named filter_nulls in my code

df1 = df_out[~df_out['aaa'].notnull() | ~df_out['bbb'].notnull()]

[Out]:

         name   aaa   bbb
0        Mick  None  None
2  Ivan-Peter     1  None

Where we do not have None at all. df_out in my code.

df2 = df_out[df_out['aaa'].notnull() & df_out['bbb'].notnull()]

[Out]:

   name aaa bbb
1  Ivan   A   C
3  Juli   1   P

Notes:

  • If needed one can use pandas.DataFrame.reset_index to get the following

    df_new = df_out[~df_out['aaa'].notnull() | ~df_out['bbb'].notnull()].reset_index(drop=True)
    
    [Out]:
    
             name   aaa   bbb
    0        Mick  None  None
    1  Ivan-Peter     1  None
    

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.