Python pandas dataframe splitting

Question

I have this kind of DataFrame, which I would like to split into separate DataFrames:

A B C Mark
3 5 6 T
4 5 2 T
3 4 5 B
5 6 7 B
3 4 5 T
2 5 2 T

For instance, the table above should be split into three pandas DataFrames: the first two rows with Mark "T" as the first, the next two rows with Mark "B" as the second, and the last two rows with Mark "T" as the third.

df1

A B C Mark
3 5 6 T
4 5 2 T

df2

A B C Mark
3 4 5 B
5 6 7 B

df3

A B C Mark
3 4 5 T
2 5 2 T
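To make the example reproducible, the sample table could be built as follows. This is only a sketch: the variable name df is an assumption (it matches the name used in the answers), and the values are copied from the table above.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "A": [3, 4, 3, 5, 3, 2],
    "B": [5, 5, 4, 6, 4, 5],
    "C": [6, 2, 5, 7, 5, 2],
    "Mark": ["T", "T", "B", "B", "T", "T"],
})
print(df)
```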

anky · Accepted Answer · 2019-02-14 11:47:18Z

1

Create a dictionary as below:

frames = {}
# Start a new group id every time Mark differs from the previous row.
key = df.Mark.ne(df.Mark.shift()).cumsum()
for i, grp in df.groupby(key):
    frames['df_' + str(i)] = grp

{'df_1':    A  B  C Mark
 0  3  5  6    T
 1  4  5  2    T, 'df_2':    A  B  C Mark
 2  3  4  5    B
 3  5  6  7    B, 'df_3':    A  B  C Mark
 4  3  4  5    T
 5  2  5  2    T}

You can then check the result by printing any of the DataFrames, for example:

print(frames['df_1'])

   A  B  C Mark
0  3  5  6    T
1  4  5  2    T
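If more descriptive keys are wanted, the same loop can be written as a dict comprehension that also puts the Mark value of each run into the key. This is a sketch only: the key pattern 'df_<n>_<mark>' is a hypothetical naming scheme, not something from the question, and the sample data is rebuilt here so the snippet runs on its own.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "A": [3, 4, 3, 5, 3, 2],
    "B": [5, 5, 4, 6, 4, 5],
    "C": [6, 2, 5, 7, 5, 2],
    "Mark": ["T", "T", "B", "B", "T", "T"],
})

# Same run counter as in the loop above.
run_id = df["Mark"].ne(df["Mark"].shift()).cumsum()

# Hypothetical key scheme: running number plus the Mark value of the run.
frames = {
    f"df_{i}_{grp['Mark'].iloc[0]}": grp
    for i, grp in df.groupby(run_id)
}
print(list(frames))  # ['df_1_T', 'df_2_B', 'df_3_T']
```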

answered Feb 14, 2019 at 11:47

anky


2 Comments

Abbos Over a year ago

Thank you! I liked your answer. Here I can name each dictionary entry differently, e.g. 'df_T_to_B'. Actually, that was what I wanted as well.

anky Over a year ago

Pleasure @Abbos :)

jezrael · Accepted Answer · 2019-02-14 11:47:06Z

1

Create a dictionary of DataFrames keyed by a counter of consecutive runs, built with shift and cumsum. Convert the groupby object to tuples and then to a dictionary:

dfs = dict(tuple(df.groupby(df['Mark'].ne(df['Mark'].shift()).cumsum())))
print(dfs)
{1:    A  B  C Mark
0  3  5  6    T
1  4  5  2    T, 2:    A  B  C Mark
2  3  4  5    B
3  5  6  7    B, 3:    A  B  C Mark
4  3  4  5    T
5  2  5  2    T}

Select each DataFrame:

print(dfs[1])
print(dfs[2])
print(dfs[3])
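For readers who find the one-liner dense, here is a step-by-step sketch of what the grouping key does. The intermediate names changed, run_id and steps are introduced only for illustration, and the sample data is rebuilt so the snippet runs on its own.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "A": [3, 4, 3, 5, 3, 2],
    "B": [5, 5, 4, 6, 4, 5],
    "C": [6, 2, 5, 7, 5, 2],
    "Mark": ["T", "T", "B", "B", "T", "T"],
})

# shift() moves Mark down one row, so each row sees the previous row's value.
previous = df["Mark"].shift()

# ne() is True wherever Mark differs from the previous row
# (the first row has no predecessor, so it also counts as a change).
changed = df["Mark"].ne(previous)

# cumsum() turns the True flags into a running counter: one id per run.
run_id = changed.cumsum()

# Side-by-side view of the intermediate steps.
steps = pd.DataFrame({"Mark": df["Mark"], "previous": previous,
                      "changed": changed, "run_id": run_id})
print(steps)

# Grouping by the counter yields one DataFrame per consecutive run.
dfs = dict(tuple(df.groupby(run_id)))
```

Grouping by df['Mark'] directly would merge the first and last "T" blocks into one group; the counter is what keeps non-adjacent runs apart.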

answered Feb 14, 2019 at 11:47

jezrael


2 Comments

Abbos Over a year ago

Thank you for your answer. It is a really nice and compact script, and it is exactly what I wanted. However, there is some advanced stuff in it that makes it a bit hard to understand.

jezrael Over a year ago

@Abbos - OK, just ask and I will try to explain more.

Karn Kumar · Accepted Answer · 2019-02-14 13:30:35Z

0

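Another way to do this for the given post is np.array_split, which returns a list of DataFrames that you can index or loop over. Note that it splits by position into (nearly) equal chunks rather than by the Mark column, so it matches the desired result here only because every run has exactly two rows.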

Outcome:

>>> np.array_split(df, 3)
[   A  B  C Mark
0  3  5  6    T
1  4  5  2    T,    A  B  C Mark
2  3  4  5    B
3  5  6  7    B,    A  B  C Mark
4  3  4  5    T
5  2  5  2    T]

Listing them as individual DataFrames (np.array_split returns a list, stored here as dfs):

>>> dfs = np.array_split(df, 3)
>>> dfs[0]
   A  B  C Mark
0  3  5  6    T
1  4  5  2    T

>>> dfs[1]
   A  B  C Mark
2  3  4  5    B
3  5  6  7    B

>>> dfs[2]
   A  B  C Mark
4  3  4  5    T
5  2  5  2    T

Or you can assign them names:

df1 = dfs[0]
df2 = dfs[1]
df3 = dfs[2]

edited Feb 14, 2019 at 13:30

answered Feb 14, 2019 at 13:25

Karn Kumar


1 Comment

Abbos Over a year ago

Thank you for your answer. Your suggested script works well in this particular case, as in the example. But what if I have hundreds of rows and don't know exactly how often the mark changes?
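For the situation raised in this comment (many rows, an unknown number of changes, runs of uneven length), one possible sketch is to build the list from the shift/cumsum counter used in the other answers instead of a fixed number of chunks. The helper name split_on_change and the uneven sample data below are made up for illustration.

```python
import pandas as pd


def split_on_change(df, column):
    """Return a list of sub-DataFrames, one per run of equal consecutive values."""
    # New run id whenever the value differs from the previous row.
    run_id = df[column].ne(df[column].shift()).cumsum()
    # groupby iterates in ascending run id, so the original row order is kept.
    return [grp for _, grp in df.groupby(run_id)]


# Made-up data with runs of different lengths: TTT, B, TT, B.
df = pd.DataFrame({"A": range(7), "Mark": list("TTTBTTB")})

parts = split_on_change(df, "Mark")
print([len(p) for p in parts])  # [3, 1, 2, 1]
```

The number of resulting DataFrames follows from the data, so nothing has to be known in advance about how often the mark changes.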
