Break a dataframe into multiple dataframes based on repetitive column values for all columns

Question

I am new to Pandas. I have a dataframe and would like to split df -

The output should look like this -

df1 -

df2 -

Basically, the dataframe must be split wherever all the values in a row are 0. A for-loop approach would be appreciated, as my dataframe has many rows and many rows of 0 values. Any help would be really appreciated.

What are the conditions to split the dfs?

– Luis Alejandro Vargas Ramos, Commented Aug 16, 2022 at 3:02

mozway · Accepted Answer · 2022-09-22 11:53:13Z

1

You can check if all values in a row are 0, and use this to construct a custom group for splitting:

out = [g for _, g in df.groupby(df.eq(0).all(axis=1).cumsum())]

output (list of DataFrames):

[   Time  Temperature
 0     0            0
 1     1           15
 2     2           14,
    Time  Temperature
 3     0            0
 4     1           27]

intermediates:

   Time  Temperature  .eq(0).all(axis=1)  cumsum
0     0            0           True       1
1     1           15          False       1
2     2           14          False       1
3     0            0           True       2
4     1           27          False       2
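For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same approach. The sample frame is an assumption reconstructed from the output shown above, and the names is_start and group_id are introduced only for illustration.

```python
import pandas as pd

# Sample data reconstructed from the output above (an assumption about
# the asker's frame, which is not shown in the question).
df = pd.DataFrame({"Time": [0, 1, 2, 0, 1],
                   "Temperature": [0, 15, 14, 0, 27]})

# True on every row whose values are all 0; each such row starts a new block.
is_start = df.eq(0).all(axis=1)

# The running count of start rows gives each block its own group id.
group_id = is_start.cumsum()

# One DataFrame per group id, in order of appearance.
out = [g for _, g in df.groupby(group_id)]

for i, part in enumerate(out, start=1):
    print(f"df{i}")
    print(part)
```

Note that any rows before the first all-zero row get group id 0 and end up in a block of their own.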

edited Sep 22, 2022 at 11:53

answered Aug 16, 2022 at 6:23


1 Comment

mozway Over a year ago

out[0], out[1]...

dat-t-le · Accepted Answer · 2022-08-16 03:11:35Z

You can't split your original dataframe into your examples for df1, df2, or df3, because each of those outputs has at least one row that doesn't exist in the original. If you also want to perform some calculations in addition to the splitting, that isn't clear to me from the question.

I'm guessing you're referring to indexing and selecting data (the Pandas documentation lists a lot of options). I'll assume you want all the columns and just want to filter out certain rows. Given your original dataframe:

>>> import pandas as pd
>>> df = pd.DataFrame({'Time': [0, 1, 2, 0, 0, 1], 'Temperature': [10, 15, 14, 15, 17, 18]})
>>> df

   Time  Temperature
0     0           10
1     1           15
2     2           14
3     0           15
4     0           17
5     1           18

You could use iloc for integer-position-based slicing; the end position is excluded, so 0:2 returns the first two rows:

>>> df.iloc[0:2]

   Time  Temperature
0     0           10
1     1           15

You could use loc, which differs from iloc in that it slices by label and includes the end label:

>>> df.loc[0:2, ['Time', 'Temperature']]

   Time  Temperature
0     0           10
1     1           15
2     2           14

If your index held string labels, loc would work just as well. Here's an example from the Pandas documentation:

>>> df = pd.DataFrame([[1, 2], [4, 5], [7, 8]],
...                   index=['cobra', 'viper', 'sidewinder'],
...                   columns=['max_speed', 'shield'])

>>> df
            max_speed  shield
cobra               1       2
viper               4       5
sidewinder          7       8

>>> df.loc[['viper', 'sidewinder']]
            max_speed  shield
viper               4       5
sidewinder          7       8

You could also use boolean indexing. A comparison operator applied to a column returns a Pandas Series of booleans. Pass that Series to the dataframe in square brackets and it returns only the rows where the value is True (shown here on the original Time/Temperature dataframe):

>>> df['Time'] == 0

0     True
1    False
2    False
3     True
4     True
5    False
Name: Time, dtype: bool

>>> df[df['Time'] == 0]

   Time  Temperature
0     0           10
3     0           15
4     0           17
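The same idea extends from one column to a whole row, which is closer to what the question asks for. The sketch below assumes a sample frame that actually contains all-zero rows (the one above does not), and the names all_zero and zero_rows are only illustrative.

```python
import pandas as pd

# Assumed sample data with two all-zero rows (positions 0 and 3).
df = pd.DataFrame({"Time": [0, 1, 2, 0, 1],
                   "Temperature": [0, 15, 14, 0, 27]})

# Compare every cell with 0, then require the condition across each row.
all_zero = (df == 0).all(axis=1)

# Boolean indexing keeps only the rows where the mask is True.
zero_rows = df[all_zero]
print(zero_rows.index.tolist())
```

The index labels of zero_rows mark where each new piece of the dataframe would begin.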

Anyway, there are a lot of different options, and these are just a few. Definitely check out the documentation and see what would work best for your application.

Based on your updated question, you could instead loop through the rows of your dataframe with a combo of enumerate() and iterrows() and make note of the index where all row values are 0. Then you could use pandas iloc to make slices of the dataframe accordingly.
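A rough sketch of that loop, assuming a small sample frame with two all-zero rows; the names starts, ends and parts are hypothetical and chosen only for this example:

```python
import pandas as pd

# Assumed sample data with all-zero rows at positions 0 and 3.
df = pd.DataFrame({"Time": [0, 1, 2, 0, 1],
                   "Temperature": [0, 15, 14, 0, 27]})

# Positions (not labels) of the rows in which every value is 0.
starts = []
for pos, (_, row) in enumerate(df.iterrows()):
    if (row == 0).all():
        starts.append(pos)

# Keep any rows that come before the first all-zero row as their own piece.
if not starts or starts[0] != 0:
    starts.insert(0, 0)

# Slice from each start position up to the next one (or the end of the frame).
ends = starts[1:] + [len(df)]
parts = [df.iloc[start:end] for start, end in zip(starts, ends)]

for i, part in enumerate(parts, start=1):
    print(f"df{i}")
    print(part)
```

Using enumerate() for the position keeps the slicing correct even when the index labels are not 0, 1, 2, ..., since iloc works on positions. On large frames iterrows() is slow, so a vectorised mask is likely to be faster.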
