Split pandas dataframe into multiple dataframes based on null columns

Question

I have a pandas dataframe as follows:

        a       b       c
    0   1.0     NaN     NaN
    1   NaN     7.0     5.0
    2   3.0     8.0     3.0
    3   4.0     9.0     2.0
    4   5.0     0.0     NaN

Is there a simple way to split the dataframe into multiple dataframes, one per combination of non-null columns, keeping only those columns? Something like this:

        a   
    0   1.0     

         b      c
    1    7.0    5.0

        a       b       c
    2   3.0     8.0     3.0
    3   4.0     9.0     2.0

        a       b      
    4   5.0     0.0

BENY · Accepted Answer · 2018-09-25 16:30:25Z

17

Using groupby with dropna

for _, x in df.groupby(df.isnull().dot(df.columns)):
    print(x.dropna(axis=1))

     a    b    c
2  3.0  8.0  3.0
3  4.0  9.0  2.0
     b    c
1  7.0  5.0
     a
0  1.0
     a    b
4  5.0  0.0
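If you want to run the loop above as a whole script, here is a self-contained sketch. It rebuilds the frame from the question by hand (the float dtypes are an assumption) and passes axis=1 by keyword, because recent pandas versions only accept the arguments of dropna as keywords. The list name parts is just illustrative.

```python
import numpy as np
import pandas as pd

# The frame from the question
df = pd.DataFrame(
    {"a": [1.0, np.nan, 3.0, 4.0, 5.0],
     "b": [np.nan, 7.0, 8.0, 9.0, 0.0],
     "c": [np.nan, 5.0, 3.0, 2.0, np.nan]}
)

# One string per row naming its null columns, e.g. "bc" for row 0
null_key = df.isnull().dot(df.columns)

parts = []
for _, group in df.groupby(null_key):
    # All rows in a group share the same null columns, so dropping every
    # column that contains a NaN removes exactly those columns
    parts.append(group.dropna(axis=1))

for part in parts:
    print(part)
```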

We can also save them in a dict, keyed by the null-column string:

d = {y: x.dropna(axis=1) for y, x in df.groupby(df.isnull().dot(df.columns))}
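Each piece in that dict is looked up by its null-column string, and the empty string holds the fully populated rows. A small sketch, again assuming the frame from the question:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"a": [1.0, np.nan, 3.0, 4.0, 5.0],
     "b": [np.nan, 7.0, 8.0, 9.0, 0.0],
     "c": [np.nan, 5.0, 3.0, 2.0, np.nan]}
)

# Keys are the null-column strings produced by dot
d = {y: x.dropna(axis=1) for y, x in df.groupby(df.isnull().dot(df.columns))}

print(sorted(d))  # ['', 'a', 'bc', 'c']
print(d[""])      # rows 2 and 3 with all three columns
print(d["bc"])    # row 0 with only column a
```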

More info: dot builds, for each row, a string made of the names of its null columns. Rows with the same string share the same null pattern, so groupby puts them together:

df.isnull().dot(df.columns)
Out[1250]: 
0    bc
1     a
2      
3      
4     c
dtype: object
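Why dot produces those strings: multiplying a Python bool by a string gives the string itself for True and '' for False, and dot sums the products, which for strings means concatenating them. The sketch below spells the same key out with an explicit loop; null_columns_key is a made-up helper name used only for this illustration.

```python
import numpy as np
import pandas as pd

def null_columns_key(frame):
    # Hypothetical helper: for each row, join the names of the columns that are null
    mask = frame.isnull().to_numpy()
    keys = ["".join(col for col, is_null in zip(frame.columns, row) if is_null)
            for row in mask]
    return pd.Series(keys, index=frame.index)

df = pd.DataFrame(
    {"a": [1.0, np.nan, 3.0, 4.0, 5.0],
     "b": [np.nan, 7.0, 8.0, 9.0, 0.0],
     "c": [np.nan, 5.0, 3.0, 2.0, np.nan]}
)

print(True * "a", repr(False * "a"))  # a ''

key_loop = null_columns_key(df)
key_dot = df.isnull().dot(df.columns)
print(key_loop.tolist())                      # ['bc', 'a', '', '', 'c']
print(key_loop.tolist() == key_dot.tolist())  # True
```

One caveat with this key: when column names are longer than one character, two different null patterns can concatenate to the same string (columns "a" and "b" both null gives "ab", as does a single null column named "ab"). Adding a separator, e.g. df.isnull().dot(df.columns + ','), keeps the keys distinct.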

edited Sep 25, 2018 at 16:30

answered Sep 25, 2018 at 15:50

BENY


4 Comments

piRSquared Over a year ago

Brilliant use of the dot result for the subsequent groupby. I'm still wrapping my mind around what you did in order to more appropriately sing your praises (-:

Yen Over a year ago

Thanks! This is exactly what I was looking for.

BENY Over a year ago

@Yen You're welcome :-) Happy coding.

DEs Over a year ago

@BENY Is there a way to order the sub-dataframes so that those with more columns come first?
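Regarding the comment above about putting the sub-dataframes with more columns first: one possible approach (not from the answer itself) is to collect the pieces in a list and sort it with Python's built-in sorted, using the column count as the key.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"a": [1.0, np.nan, 3.0, 4.0, 5.0],
     "b": [np.nan, 7.0, 8.0, 9.0, 0.0],
     "c": [np.nan, 5.0, 3.0, 2.0, np.nan]}
)

parts = [group.dropna(axis=1)
         for _, group in df.groupby(df.isnull().dot(df.columns))]

# Widest sub-frames first; sorted() is stable even with reverse=True,
# so frames with the same column count keep the groupby order
parts = sorted(parts, key=lambda part: part.shape[1], reverse=True)

for part in parts:
    print(part)
```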

Traxes · Accepted Answer · 2018-09-25 16:24:38Z

Here is a possible solution:

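import numpy as np
import pandas as pd

def getMap(some_list):
    # Encode a row's null pattern as "0"/"1" flags, e.g. [1.0, nan, nan] -> "011"
    return "".join(["1" if np.isnan(x) else "0" for x in some_list])

# np.nan rather than np.NaN, which NumPy 2.0 removed
df = pd.DataFrame([[1, np.nan, np.nan], [np.nan, 7, 5], [3, 8, 3], [4, 9, 2], [5, 0, np.nan]],
                  columns=["a", "b", "c"])
print(df.head())

# One plain list of floats per row
x = df.values.tolist()

# Collect (row label, non-null values) per null pattern; dicts keep first-seen order
some_dict = {}
for idx, y in zip(df.index, x):
    some_dict.setdefault(getMap(y), []).append((idx, [z for z in y if not np.isnan(z)]))

# Rebuild each group with its original row labels and non-null column names
dfs = []
for key, rows in some_dict.items():
    cols = [c for c, flag in zip(df.columns, key) if flag == "0"]
    dfs.append(pd.DataFrame([vals for _, vals in rows], index=[i for i, _ in rows], columns=cols))

for sub_df in dfs:
    print(sub_df)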

This gives the same groups as the expected output in your question. :)

    a   
    1.0     

     b      c
     7.0    5.0

    a       b       c
    3.0     8.0     3.0
    4.0     9.0     2.0

    a       b      
    5.0     0.0
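The "0"/"1" null map that getMap builds row by row can also be computed for the whole frame at once and passed straight to groupby, which avoids the manual dictionary. This is a sketch of that variant, not part of the original answer; it assumes the frame from the question with columns a, b, c.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"a": [1.0, np.nan, 3.0, 4.0, 5.0],
     "b": [np.nan, 7.0, 8.0, 9.0, 0.0],
     "c": [np.nan, 5.0, 3.0, 2.0, np.nan]}
)

# Same encoding as getMap: "1" for a null cell, "0" otherwise, e.g. "011" for row 0
null_map = df.isnull().astype(int).astype(str).agg("".join, axis=1)

# Group rows sharing a pattern, then drop the columns that are null in that group
dfs = [group.dropna(axis=1) for _, group in df.groupby(null_map)]
for sub_df in dfs:
    print(sub_df)
```

groupby sorts the keys, so the groups come out in the order "000", "001", "011", "100" rather than in row order.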
