Can I split a pandas dataframe based on row values?

Question

I have a pandas dataframe that effectively contains several different datasets. Between each dataset is a row full of NaN. Can I split the dataframe on the NaN row to make two dataframes? Thanks in advance.

@PaulH typo? It should be isnull(). I think OP also meant all instead of any.
– Ehsan, Commented Jul 13, 2020 at 4:30
@Ehsan yeah, isnull(). From the OP's description it sounds like either might work, but all would be the safer bet.
– Paul H, Commented Jul 13, 2020 at 4:35
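
To illustrate the any versus all distinction raised in these comments, here is a minimal sketch. The sample frame and the variable names `has_any_nan` and `is_all_nan` are made up for illustration and are not from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: row 1 is partly NaN, row 2 is entirely NaN.
df = pd.DataFrame({"a": [1.0, np.nan, np.nan, 4.0],
                   "b": [1.0, 2.0, np.nan, 4.0]})

# any(axis=1): True where at least one value in the row is NaN
has_any_nan = df.isnull().any(axis=1)
# all(axis=1): True only where every value in the row is NaN
is_all_nan = df.isnull().all(axis=1)

print(has_any_nan.tolist())  # [False, True, True, False]
print(is_all_nan.tolist())   # [False, False, True, False]
```

If the datasets themselves can contain scattered NaN values, any would also flag ordinary data rows as separators, which is why all is the safer choice here.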

Ehsan · Accepted Answer · 2020-07-13 04:28:01Z

2

You can use this to split the data frame into many data frames at the all-NaN rows:

# index of all-NaN rows (+ beginning and end of df)
# note: index labels are passed to iloc, so this assumes the default RangeIndex (0, 1, 2, ...)
idx = [0] + df.index[df.isnull().all(axis=1)].tolist() + [df.shape[0]]
# list of data frames split at the all-NaN indices (each later chunk starts with its NaN row)
list_of_dfs = [df.iloc[idx[n]:idx[n+1]] for n in range(len(idx)-1)]

And if you want to exclude the NaN rows from the split data frames:

idx = [-1] + df.index[df.isnull().all(axis=1)].tolist() + [df.shape[0]]
list_of_dfs = [df.iloc[idx[n]+1:idx[n+1]] for n in range(len(idx)-1)]

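Example (using the second variant, which excludes the NaN rows; the two consecutive all-NaN rows 3 and 4 produce an empty data frame between them):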

df:

     0    1
0  1.0  1.0
1  NaN  1.0
2  1.0  NaN
3  NaN  NaN
4  NaN  NaN
5  1.0  1.0
6  1.0  1.0
7  NaN  1.0
8  1.0  NaN
9  1.0  NaN

list_of_dfs:

[     0    1
0  1.0  1.0
1  NaN  1.0
2  1.0  NaN, 

Empty DataFrame
Columns: [0, 1]
Index: [],   

     0    1
5  1.0  1.0
6  1.0  1.0
7  NaN  1.0
8  1.0  NaN
9  1.0  NaN]
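
A possible variation on the same idea, sketched under a few assumptions: it uses integer positions from `np.flatnonzero` instead of index labels, so it should also work when the index is not the default 0..n-1, and it filters out the empty frames that consecutive NaN rows produce. The function name `split_on_nan_rows` and the letter index are hypothetical, chosen only for this illustration.

```python
import numpy as np
import pandas as pd

def split_on_nan_rows(df):
    """Split df at rows that are entirely NaN; separator rows and empty chunks are dropped."""
    # Integer positions (not labels) of the all-NaN rows, so iloc is safe for any index
    nan_pos = np.flatnonzero(df.isnull().all(axis=1).to_numpy())
    # Sentinels: -1 before the first row, len(df) after the last row
    bounds = [-1, *nan_pos.tolist(), len(df)]
    chunks = [df.iloc[start + 1:stop] for start, stop in zip(bounds[:-1], bounds[1:])]
    return [chunk for chunk in chunks if not chunk.empty]

# Same values as the example above, but with a non-default index
nan = np.nan
df = pd.DataFrame(
    {0: [1, nan, 1, nan, nan, 1, 1, nan, 1, 1],
     1: [1, 1, nan, nan, nan, 1, 1, 1, nan, nan]},
    index=list("abcdefghij"),
)
parts = split_on_nan_rows(df)
print([part.index.tolist() for part in parts])
# [['a', 'b', 'c'], ['f', 'g', 'h', 'i', 'j']]
```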

edited Jul 13, 2020 at 4:28

answered Jul 13, 2020 at 4:21


Varsha Kishore · Accepted Answer · 2020-07-13 04:12:05Z

0

Use df[df[COLUMN_NAME].isnull()].index.tolist() to get a list of indices corresponding to the NaN rows. You can then split the dataframe into multiple dataframes by using the indices.
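
As a rough sketch of what "using the indices" could look like for a single separator row: the frame and the column name `"value"` (standing in for COLUMN_NAME) are hypothetical. Note that this checks one column only, so a data row that happens to have NaN in that column would also be treated as a separator.

```python
import numpy as np
import pandas as pd

# Hypothetical data; "value" plays the role of COLUMN_NAME
df = pd.DataFrame({"value": [1.0, 2.0, np.nan, 3.0, 4.0],
                   "other": [5.0, 6.0, np.nan, 7.0, 8.0]})

# Index labels of the rows where the chosen column is NaN
nan_labels = df[df["value"].isnull()].index.tolist()

# Split at the first such label. loc slicing is label-based and inclusive,
# so the separator row is trimmed off with iloc on each side.
first = nan_labels[0]
before = df.loc[:first].iloc[:-1]   # rows above the separator
after = df.loc[first:].iloc[1:]     # rows below the separator

print(before.index.tolist(), after.index.tolist())  # [0, 1] [3, 4]
```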

answered Jul 13, 2020 at 4:12


Valdi_Bo · Accepted Answer · 2020-07-13 04:56:21Z

My solution splits your DataFrame into any number of chunks, at each row full of NaNs.

Assume that the input DataFrame contains:

       A    B     C
0   10.0  Abc  20.0
1   11.0  NaN  21.0
2   12.0  Ghi   NaN
3    NaN  NaN   NaN
4    NaN  Hkx  30.0
5   21.0  Jkl  32.0
6   22.0  Mno  33.0
7    NaN  NaN   NaN
8   30.0  Pqr  40.0
9    NaN  Stu   NaN
10  32.0  Vwx  44.0

so that "split points" are rows with indices 3 and 7.

To do your task:

Generate the grouping criterion Series:

 grp = (df.isnull().sum(axis=1) == df.shape[1]).cumsum()

Drop rows full of NaN and group the result by the above criterion:
```
 gr = df.dropna(axis=0, thresh=1).groupby(grp)
```
thresh=1 means that a row is kept in the result if it has at least 1 non-NaN value.

Perform the actual split, as a list comprehension:

 result = [ gr.get_group(key) for key in gr.groups ]

To print the result, you can run:

for i, chunk in enumerate(result):
    print(f'Chunk {i}:')
    print(chunk, end='\n\n')

getting:

Chunk 0:
      A    B     C
0  10.0  Abc  20.0
1  11.0  NaN  21.0
2  12.0  Ghi   NaN

Chunk 1:
      A    B     C
4   NaN  Hkx  30.0
5  21.0  Jkl  32.0
6  22.0  Mno  33.0

Chunk 2:
       A    B     C
8   30.0  Pqr  40.0
9    NaN  Stu   NaN
10  32.0  Vwx  44.0
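
For convenience, the steps above could be wrapped into one self-contained function along these lines. This is only a sketch: the name `split_by_cumsum` is made up, and it uses `isnull().all(axis=1)` and direct iteration over the groupby object in place of the `sum(axis=1) == df.shape[1]` test and `get_group` calls shown above, which should give the same chunks.

```python
import numpy as np
import pandas as pd

def split_by_cumsum(df):
    """Split df into chunks separated by all-NaN rows, using a cumulative-sum group key."""
    is_sep = df.isnull().all(axis=1)      # True on separator rows
    grp = is_sep.cumsum()                 # key increases by 1 at each separator
    # Drop separators, then group the remaining rows by the key
    return [chunk for _, chunk in df[~is_sep].groupby(grp[~is_sep])]

# The input DataFrame from this answer
nan = np.nan
df = pd.DataFrame({
    "A": [10, 11, 12, nan, nan, 21, 22, nan, 30, nan, 32],
    "B": ["Abc", nan, "Ghi", nan, "Hkx", "Jkl", "Mno", nan, "Pqr", "Stu", "Vwx"],
    "C": [20, 21, nan, nan, 30, 32, 33, nan, 40, nan, 44],
})
result = split_by_cumsum(df)
print([chunk.index.tolist() for chunk in result])
# [[0, 1, 2], [4, 5, 6], [8, 9, 10]]
```

Because rows between two adjacent separators never receive a group key of their own, consecutive NaN rows do not produce empty chunks with this approach.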
