
I have a pandas dataframe as follows:

        a       b       c
    0   1.0     NaN     NaN
    1   NaN     7.0     5.0
    2   3.0     8.0     3.0
    3   4.0     9.0     2.0
    4   5.0     0.0     NaN

Is there a simple way to split the dataframe into multiple dataframes based on non-null values?

        a   
    0   1.0     

         b      c
    1    7.0    5.0

        a       b       c
    2   3.0     8.0     3.0
    3   4.0     9.0     2.0

        a       b      
    4   5.0     0.0

2 Answers

Using groupby with dropna

for _, x in df.groupby(df.isnull().dot(df.columns)):
    # axis must be passed by keyword in pandas 2.0 and later
    print(x.dropna(axis=1))

     a    b    c
2  3.0  8.0  3.0
3  4.0  9.0  2.0
     b    c
1  7.0  5.0
     a
0  1.0
     a    b
4  5.0  0.0

We can also store them in a dict, keyed by the null pattern:

d = {y: x.dropna(axis=1) for y, x in df.groupby(df.isnull().dot(df.columns))}
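
As a sketch of how that dict might be used, the snippet below rebuilds the question's frame (the construction of `df` is an assumption based on the table in the question) and then orders the sub-frames so that those with more columns come first. The names `ordered` and `frame` are illustrative only.

```python
import numpy as np
import pandas as pd

# Frame reconstructed from the table in the question
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, 4.0, 5.0],
    "b": [np.nan, 7.0, 8.0, 9.0, 0.0],
    "c": [np.nan, 5.0, 3.0, 2.0, np.nan],
})

# One sub-frame per null pattern, with the all-NaN columns dropped
d = {k: g.dropna(axis=1) for k, g in df.groupby(df.isnull().dot(df.columns))}

# Widest sub-frames first; sorted() is stable, so ties keep the dict order
ordered = sorted(d.values(), key=lambda frame: frame.shape[1], reverse=True)
for frame in ordered:
    print(frame)
```

Sorting on `frame.shape[1]` is one simple choice; sorting on the key length would give the same ordering only if every column name has the same length.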

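More info: the dot product of the null mask with the column names gives, for each row, a string naming the columns that are null. Rows with the same string have the same null pattern, so they are grouped together.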

df.isnull().dot(df.columns)
Out[1250]: 
0    bc
1     a
2      
3      
4     c
dtype: object
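
To make the mechanics of that key more concrete, here is a minimal sketch that compares the dot product with a hand-written equivalent. The frame is reconstructed from the question; `mask`, `key` and `manual` are illustrative names. It assumes the column labels are strings, since the trick relies on `True * "b" == "b"` and `False * "b" == ""`.

```python
import numpy as np
import pandas as pd

# Frame reconstructed from the table in the question
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, 4.0, 5.0],
    "b": [np.nan, 7.0, 8.0, 9.0, 0.0],
    "c": [np.nan, 5.0, 3.0, 2.0, np.nan],
})

mask = df.isnull()

# Each row's products are "summed", which for strings means concatenation
key = mask.dot(df.columns)

# The same key written out by hand: join the names of the null columns
manual = [
    "".join(col for col in df.columns if mask.loc[i, col])
    for i in df.index
]

print(key.tolist())
print(manual)
```

Because the names are concatenated without a separator, column names that are prefixes of one another (for example `a`, `b` and `ab`) could produce the same key for different null patterns.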

4 Comments

Brilliant use of the dot result for the subsequent groupby. I'm still wrapping my mind around what you did in order to more appropriately sing your praises (-:
Thanks! This is exactly what I was looking for.
@Yen Yw :-) happy coding
@Beny Is there a way to order by the sub dataframes with more number of columns first?

So here is a possible solution

import numpy as np
import pandas as pd

def getMap(some_list):
    # Encode a row's null pattern as a string: "1" for NaN, "0" otherwise
    return "".join(["1" if np.isnan(x) else "0" for x in some_list])

# np.nan is used because the np.NaN alias was removed in NumPy 2.0
df = pd.DataFrame([[1, np.nan, np.nan], [np.nan, 7, 5], [3, 8, 3], [4, 9, 2], [5, 0, np.nan]])
print(df.head())

# Rows as plain Python lists
x = df[[0, 1, 2]].values.tolist()

# One empty bucket per distinct null pattern
nullMap = [getMap(y) for y in x]
nullSet = set(nullMap)
some_dict = {y: [] for y in nullSet}

# Append each row's non-null values to the bucket for its pattern
for y in x:
    some_dict[getMap(y)].append([z for z in y if not np.isnan(z)])

dfs = [pd.DataFrame(y) for y in some_dict.values()]
for sub_df in dfs:
    print(sub_df)

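This gives the same groups of values as the output you asked for. Note that the frames built this way carry default integer labels rather than a/b/c, and the order of the groups may differ. :)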

    a   
    1.0     

     b      c
     7.0    5.0

    a       b       c
    3.0     8.0     3.0
    4.0     9.0     2.0

    a       b      
    5.0     0.0
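
The same null-map idea can also be passed to groupby directly, which keeps the original column labels and index. This is a sketch of that variant rather than the code above: the column names `a`, `b`, `c` are taken from the question, and `null_map` is an illustrative name.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[1, np.nan, np.nan], [np.nan, 7, 5], [3, 8, 3], [4, 9, 2], [5, 0, np.nan]],
    columns=["a", "b", "c"],
)

# Per-row string with "1" for a missing value and "0" for a present one
null_map = df.isna().astype(int).astype(str).apply("".join, axis=1)

# Group rows sharing a pattern, then drop the columns that are entirely NaN
dfs = [group.dropna(axis=1) for _, group in df.groupby(null_map)]
for sub_df in dfs:
    print(sub_df)
```

Here groupby sorts the pattern strings, so the sub-frames come out in the order of their keys rather than in order of first appearance.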
