Splitting Pandas dataframe into multiple dataframes based on condition in column

Works like a charm, but how do I access each of the resulting dataframes?

@Ash What do you mean by "access"? The function returns a list that contains all the data frames so you can access that list. Note that the indices are retained within each of the sub-data frames.

np.split(df, np.where(df.BOOL == 1)[0] + 1) does not work. Also, you split the dataframe into 3; I think he needs 0 to n (where n is the index of a row with BOOL == 1).

@Wen-Ben Why not? It does work for the given example and it won't raise an error even if the indices run out of range; in that case you just get empty data frames.

@a_guest I think his expected output needs two dataframes (rows 0-1 and rows 0-3), and you return 3, with lengths 2, 2 and 1. Am I right?
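For reference, the "list of pieces" reading discussed in these comments can be sketched on a small frame. This is only an illustration: the sample data is reconstructed from the outputs shown on this page, and the fifth row is an assumption. The sketch splits row positions rather than passing the data frame to np.split directly, and it shows that each piece is reached by ordinary list indexing.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the outputs on this page; the last row is assumed.
df = pd.DataFrame({
    "BOOL": [0, 1, 0, 1, 0],
    "USER_ID": ["001"] * 5,
    "VALUE": [1, 2, 3, 4, 5],
})

# Positions just after each row where BOOL == 1.
cuts = np.where(df["BOOL"].to_numpy() == 1)[0] + 1

# Split the row positions at those cut points, then select rows by position.
chunks = np.split(np.arange(len(df)), cuts)
pieces = [df.iloc[chunk] for chunk in chunks]

# The result is a plain list, so each piece is reached by list indexing.
first = pieces[0]
piece_lengths = [len(p) for p in pieces]
```

With this assumed data the pieces are consecutive, non-overlapping chunks, which is a different result from the cumulative frames (rows 0-1 and rows 0-3) that the question asks for.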


BENY · Accepted Answer · 2019-02-03 01:40:45Z

3

I think using a for loop is better here.

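# Positions of rows where BOOL is nonzero; Series.nonzero() was removed in pandas 1.0.
idx = df.BOOL.to_numpy().nonzero()[0]

d = {x: df.iloc[:y + 1, :] for x, y in enumerate(idx)}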
d[0]
   BOOL USER_ID  VALUE
0     0     001      1
1     1     001      2

edited Feb 3, 2019 at 1:40

answered Feb 3, 2019 at 0:53

BENY


5 Comments

Really good approach - which works on the sample dataset. But for some cryptic reason does not work on my actual dataframe. It returns n dataframes - all of the original size.

@Ash Anyway, I just followed your expected output (the two pictures above).

@Wen-Ben You mix index and iloc; that's probably the reason why it doesn't work for the other data frame (in case the indices there are not a simple enumeration).

@Wen-Ben But now for non-numeric indices the +1 will fail. So you should probably stick to iloc and use the positions of the index.

@a_guest check nonzero
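The suggestion to stick to iloc and use positions can be sketched as follows. The frame and its string index are hypothetical, chosen only so that label arithmetic such as +1 is not possible; np.flatnonzero is used here as one way to get integer positions, not necessarily what the answer's author had in mind.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a non-numeric index, so "label + 1" is not meaningful.
df = pd.DataFrame(
    {"BOOL": [0, 1, 0, 1], "VALUE": [1, 2, 3, 4]},
    index=["a", "b", "c", "d"],
)

# Integer positions (not labels) of the rows where BOOL is nonzero.
positions = np.flatnonzero(df["BOOL"].to_numpy())

# One cumulative prefix per flagged row, sliced purely by position.
prefixes = {k: df.iloc[: pos + 1] for k, pos in enumerate(positions)}
```

Because both the lookup and the slicing use positions, the result does not depend on what the index labels happen to be.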

U13-Forward · Accepted Answer · 2019-02-03 00:58:30Z

2

Why not a list comprehension? Like:

>>> l=[df.iloc[:i+1] for i in df.index[df['BOOL']==1]]
>>> l[0]
   BOOL USER_ID  VALUE
0     0     001      1
1     1     001      2
>>> l[1]
   BOOL USER_ID  VALUE
0     0     001      1
1     1     001      2
2     0     001      3
3     1     001      4
>>>

answered Feb 3, 2019 at 0:58

U13-Forward


4 Comments

Simplifies @Wen-Ben's approach to 1 line - but I still have the same issue. Works on the sample dataset. But not on my actual dataframe. This returns n dataframes - all of the original size.

You mix index and iloc; that's probably the reason why it doesn't work for the other data frame (in case the indices there are not a simple enumeration).

@a_guest Sorry mate, can you explain what you mean? Not quite sure I understand what you mean by mix index and iloc?
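One way to see what "mixing index and iloc" means is a frame whose index labels are not 0..n-1. The sketch below uses a hypothetical index of 10, 20, 30, 40 (the asker's real data is not shown, so this is only a guess at the cause). df.index[...] returns labels, while iloc expects positions, so feeding one into the other yields slices that cover the whole frame.

```python
import pandas as pd

# Hypothetical frame whose index labels are not 0..n-1 (for example after filtering).
df = pd.DataFrame(
    {"BOOL": [0, 1, 0, 1], "VALUE": [1, 2, 3, 4]},
    index=[10, 20, 30, 40],
)

# Labels of the flagged rows: 20 and 40.
labels = df.index[df["BOOL"] == 1]

# Mixing: labels used as positions. iloc[:21] and iloc[:41] both cover all 4 rows.
mixed = [df.iloc[: label + 1] for label in labels]

# Consistent label-based alternative: loc slices by label and includes the end label.
by_label = [df.loc[:label] for label in labels]
```

Here every frame in mixed has the original size, which matches the symptom described above, while by_label gives prefixes of 2 and 4 rows. The loc variant assumes a sorted index with unique labels; purely positional slicing with iloc avoids that assumption.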