2

I have a DataFrame like this, which I would like to split into separate DataFrames:

A B C Mark
3 5 6 T
4 5 2 T
3 4 5 B
5 6 7 B
3 4 5 T
2 5 2 T

For instance, the table above should be split into three pandas DataFrames: the first holds the first two rows with Mark "T", the second the next two rows with Mark "B", and the third the last two rows with Mark "T".

df1

A B C Mark
3 5 6 T
4 5 2 T

df2

A B C Mark
3 4 5 B
5 6 7 B

df3

A B C Mark
3 4 5 T
2 5 2 T

3 Answers

1

Create a dictionary as below:

frames = {}
# Number each run of consecutive equal Mark values (1, 2, 3, ...) and group by that number
for i, grp in df.groupby(df.Mark.ne(df.Mark.shift()).cumsum()):
    frames['df_' + str(i)] = grp

{'df_1':    A  B  C Mark
 0  3  5  6    T
 1  4  5  2    T, 'df_2':    A  B  C Mark
 2  3  4  5    B
 3  5  6  7    B, 'df_3':    A  B  C Mark
 4  3  4  5    T
 5  2  5  2    T}

You can then check the result by printing any of the DataFrames, for example:

print(frames['df_1'])

   A  B  C Mark
0  3  5  6    T
1  4  5  2    T
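
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The construction of df from the question's table and the key format that appends the Mark value (df_1_T and so on) are my own assumptions for illustration, not part of the original answer.

```python
import pandas as pd

# Sample data reconstructed from the table in the question
df = pd.DataFrame({
    "A": [3, 4, 3, 5, 3, 2],
    "B": [5, 5, 4, 6, 4, 5],
    "C": [6, 2, 5, 7, 5, 2],
    "Mark": ["T", "T", "B", "B", "T", "T"],
})

# Run counter: increases by one every time Mark differs from the previous row
run_id = df["Mark"].ne(df["Mark"].shift()).cumsum()

# Hypothetical naming scheme: run number plus the Mark value of that run
frames = {
    f"df_{i}_{grp['Mark'].iloc[0]}": grp
    for i, grp in df.groupby(run_id)
}

print(list(frames))  # ['df_1_T', 'df_2_B', 'df_3_T']
```

Any other key format works the same way, since the dictionary key is just a string built inside the comprehension.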

2 Comments

Thank you! I liked your answer. Here I can name each dictionary entry differently, e.g. 'df_T_to_B'. Actually, that was what I wanted as well.
Pleasure @Abbos :)
1

Create a dictionary of DataFrames keyed by a consecutive-run counter: compare Mark with its shifted values using ne, take the cumsum, group by the result, then convert the groupby object to tuples and finally to a dictionary:

dfs = dict(tuple(df.groupby(df['Mark'].ne(df['Mark'].shift()).cumsum())))
print (dfs)
{1:    A  B  C Mark
0  3  5  6    T
1  4  5  2    T, 2:    A  B  C Mark
2  3  4  5    B
3  5  6  7    B, 3:    A  B  C Mark
4  3  4  5    T
5  2  5  2    T}

Select each DataFrame:

print (dfs[1])
print (dfs[2])
print (dfs[3])
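
To make the grouping key less opaque, the one-liner can be broken into its intermediate steps. This is only an illustrative sketch: the variable names previous, changed and run_id are mine, and the Series is rebuilt from the Mark column in the question.

```python
import pandas as pd

mark = pd.Series(["T", "T", "B", "B", "T", "T"], name="Mark")

# Step 1: shift moves every value down one row, so each row sees its predecessor
previous = mark.shift()        # NaN, T, T, B, B, T

# Step 2: ne is an element-wise "not equal"; True marks the first row of each run
changed = mark.ne(previous)    # True, False, True, False, True, False

# Step 3: cumsum counts the True values so far, giving one number per run
run_id = changed.cumsum()      # 1, 1, 2, 2, 3, 3

print(run_id.tolist())
```

Grouping by run_id rather than by Mark itself is what keeps the two separate "T" runs apart.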

2 Comments

Thank you for your answer. It is a really nice and compact script, and it is exactly what I wanted. However, it uses some advanced features that make it a bit hard to understand.
@Abbos - OK, just ask and I'll try to explain more.
0

Another approach for this particular example is np.array_split, which returns a list of DataFrames that you can index or loop over. Note that it splits by position into chunks of equal size, so it matches the Mark groups here only because every run has exactly two rows.

Outcome:

>>> np.array_split(df, 3)
[   A  B  C Mark
0  3  5  6    T
1  4  5  2    T,    A  B  C Mark
2  3  4  5    B
3  5  6  7    B,    A  B  C Mark
4  3  4  5    T
5  2  5  2    T]

Listing them as individual dfs, after storing the result in a list:

>>> dfs = np.array_split(df, 3)
>>> dfs[0]
   A  B  C Mark
0  3  5  6    T
1  4  5  2    T

>>> dfs[1]
   A  B  C Mark
2  3  4  5    B
3  5  6  7    B

>>> dfs[2]
   A  B  C Mark
4  3  4  5    T
5  2  5  2    T

Or you can assign them names:

df1 = dfs[0]
df2 = dfs[1]
df3 = dfs[2]
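
When the runs do not all have the same length, a positional split no longer lines up with the Mark changes. A possible alternative that still yields a plain list is to group by a run counter, as sketched below. The df_uneven data is made up for illustration, and the shift/cumsum key is borrowed from the other answers rather than from this one.

```python
import pandas as pd

# Hypothetical data with runs of unequal length: T, then B x3, then T x2
df_uneven = pd.DataFrame({
    "A": [1, 2, 3, 4, 5, 6],
    "Mark": ["T", "B", "B", "B", "T", "T"],
})

# Run counter that increases whenever Mark differs from the previous row
run_id = df_uneven["Mark"].ne(df_uneven["Mark"].shift()).cumsum()

# One DataFrame per run, in order of appearance
parts = [grp for _, grp in df_uneven.groupby(run_id)]

print([len(p) for p in parts])  # [1, 3, 2]
```

The number of runs does not need to be known in advance, since the list simply gets one entry per change in Mark.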

1 Comment

Thank you for your answer. Your suggested script works well in this particular case, as in the example. But what if I had hundreds of rows and didn't know exactly how often Mark changes?
