92

I want to split the following dataframe based on column ZZ

df = 
        N0_YLDF  ZZ        MAT
    0  6.286333   2  11.669069
    1  6.317000   6  11.669069
    2  6.324889   6  11.516454
    3  6.320667   5  11.516454
    4  6.325556   5  11.516454
    5  6.359000   6  11.516454
    6  6.359000   6  11.516454
    7  6.361111   7  11.516454
    8  6.360778   7  11.516454
    9  6.361111   6  11.516454

As output, I want a new DataFrame with the N0_YLDF column split into 4, one new column for each unique value of ZZ. How do I go about this? I can do groupby, but do not know what to do with the grouped object.

6 Answers

183
gb = df.groupby('ZZ')    
[gb.get_group(x) for x in gb.groups]
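For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the question's frame by hand (the constructor call is my own transcription of the table, not something the asker posted) and collects one sub-frame per ZZ value:

```python
import pandas as pd

# Data transcribed from the question's table.
df = pd.DataFrame({
    "N0_YLDF": [6.286333, 6.317000, 6.324889, 6.320667, 6.325556,
                6.359000, 6.359000, 6.361111, 6.360778, 6.361111],
    "ZZ": [2, 6, 6, 5, 5, 6, 6, 7, 7, 6],
    "MAT": [11.669069, 11.669069] + [11.516454] * 8,
})

gb = df.groupby("ZZ")

# gb.groups maps each distinct ZZ value to the row labels in that group;
# get_group(x) returns the matching rows as an ordinary DataFrame.
frames = [gb.get_group(x) for x in gb.groups]

for frame in frames:
    print(frame, end="\n\n")
```

Each element of frames keeps the original row labels, so the pieces can be traced back to df.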

2 Comments

Great answer! How do we extract the respective dataframes from gb?
The method get_group(x) returns a new DataFrame object containing only the rows where column ZZ == x
43

Another alternative: iterating over a groupby object yields (key, frame) tuples, so a list comprehension can simply keep the second element of each tuple (the frame).

dfs = [x for _, x in df.groupby('ZZ')]
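If the group key is needed as well, the same iteration can keep it instead of discarding it. The sketch below assumes the question's data (rebuilt by hand here) and computes one statistic per group; the name mean_by_zz is only illustrative:

```python
import pandas as pd

# Data transcribed from the question's table.
df = pd.DataFrame({
    "N0_YLDF": [6.286333, 6.317000, 6.324889, 6.320667, 6.325556,
                6.359000, 6.359000, 6.361111, 6.360778, 6.361111],
    "ZZ": [2, 6, 6, 5, 5, 6, 6, 7, 7, 6],
    "MAT": [11.669069, 11.669069] + [11.516454] * 8,
})

# Each iteration step gives (key, frame); keep the key and
# aggregate the frame however is needed.
mean_by_zz = {key: frame["N0_YLDF"].mean() for key, frame in df.groupby("ZZ")}

print(mean_by_zz)
```

For a plain aggregation like this, df.groupby('ZZ')['N0_YLDF'].mean() gives the same numbers without an explicit loop; the loop form is mainly useful when each frame needs custom handling.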

2 Comments

would this one liner work if I'm looking to make specific aggregations to every data frame?
This one-liner simply stores the DataFrames in a list. What you do next is up to you. Maybe have a look at ALollz's answer to access the keys.
12

R has a function called split that breaks a data frame into pieces by a grouping variable. This is an equivalent for all the R users out there:

def split(df, group):
     gb = df.groupby(group)
     return [gb.get_group(x) for x in gb.groups]
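A usage sketch, assuming the question's data rebuilt by hand. Because the group argument is passed straight to groupby, it can also be a list of columns, in which case the group keys are tuples:

```python
import pandas as pd

def split(df, group):
    gb = df.groupby(group)
    return [gb.get_group(x) for x in gb.groups]

# Data transcribed from the question's table.
df = pd.DataFrame({
    "N0_YLDF": [6.286333, 6.317000, 6.324889, 6.320667, 6.325556,
                6.359000, 6.359000, 6.361111, 6.360778, 6.361111],
    "ZZ": [2, 6, 6, 5, 5, 6, 6, 7, 7, 6],
    "MAT": [11.669069, 11.669069] + [11.516454] * 8,
})

# Split on every distinct (ZZ, MAT) combination.
parts = split(df, ["ZZ", "MAT"])

for part in parts:
    print(part, end="\n\n")
```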

5 Comments

shouldn't you put it all into a series? ending with pd.Series(...)
This is amazing. Is there an easy way to get the key that identifies each group, so I can return a list of tuples, like [(key, gb.get_group(key)) for key in gb.groups]?
I found this, which makes this easy: stackoverflow.com/questions/42513049/…
Just to provide an answer to the comment (which is explained in more detail in the link): [(key, gb.get_group(key)) for key in gb.groups]
The same solution, but as a generator:

def split(df, group):
    gb = df.groupby(group)
    for g in gb.groups:
        yield gb.get_group(g)
9

Store them in a dict, which allows you access to the group DataFrames based on the group keys.

d = dict(tuple(df.groupby('ZZ')))
d[6]

#    N0_YLDF  ZZ        MAT
#1  6.317000   6  11.669069
#2  6.324889   6  11.516454
#5  6.359000   6  11.516454
#6  6.359000   6  11.516454
#9  6.361111   6  11.516454

If you need only a subset of the DataFrame, in this case just the 'N0_YLDF' Series, you can build the dict from a generator expression that selects the column.

d = dict((idx, gp['N0_YLDF']) for idx, gp in df.groupby('ZZ'))
d[6]
#1    6.317000
#2    6.324889
#5    6.359000
#6    6.359000
#9    6.361111
#Name: N0_YLDF, dtype: float64
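The question literally asks for a single frame with one N0_YLDF column per ZZ value. None of the approaches above return that directly, so here is one possible sketch using DataFrame.pivot; treat it as a suggestion rather than the accepted approach. The data is rebuilt by hand from the question, and the name wide is only illustrative:

```python
import pandas as pd

# Data transcribed from the question's table.
df = pd.DataFrame({
    "N0_YLDF": [6.286333, 6.317000, 6.324889, 6.320667, 6.325556,
                6.359000, 6.359000, 6.361111, 6.360778, 6.361111],
    "ZZ": [2, 6, 6, 5, 5, 6, 6, 7, 7, 6],
    "MAT": [11.669069, 11.669069] + [11.516454] * 8,
})

# With no index argument, pivot keeps the existing row labels.
# Each distinct ZZ value becomes a column holding N0_YLDF;
# rows that belong to a different ZZ value are NaN in that column.
wide = df.pivot(columns="ZZ", values="N0_YLDF")

print(wide)
```

The result has the same ten rows as df and four columns (2, 5, 6, 7), with exactly one non-missing value per row.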


0

You can iterate over unique values and get groups using loc or query:

[df.loc[df['ZZ'] == i] for i in df['ZZ'].unique()]

or

[df.query('ZZ == @i') for i in df['ZZ'].unique()]
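A runnable sketch of both variants on the question's data (rebuilt by hand). One difference from groupby worth knowing: unique() returns values in order of first appearance, not sorted. The query variant is written as a plain loop here only to keep the @i lookup in an ordinary scope:

```python
import pandas as pd

# Data transcribed from the question's table.
df = pd.DataFrame({
    "N0_YLDF": [6.286333, 6.317000, 6.324889, 6.320667, 6.325556,
                6.359000, 6.359000, 6.361111, 6.360778, 6.361111],
    "ZZ": [2, 6, 6, 5, 5, 6, 6, 7, 7, 6],
    "MAT": [11.669069, 11.669069] + [11.516454] * 8,
})

keys = df["ZZ"].unique()  # order of first appearance: 2, 6, 5, 7

# Boolean-mask selection with loc.
by_loc = [df.loc[df["ZZ"] == i] for i in keys]

# The same selection with query; @i refers to the Python variable i.
by_query = []
for i in keys:
    by_query.append(df.query("ZZ == @i"))

for key, frame in zip(keys, by_loc):
    print(key, len(frame))
```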


0

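Adding to user qwwqwwq's answer: you can pull out individual groups by key. The key must match the dtype of ZZ, which holds integers here, so pass 6 rather than "6".

gb = df.groupby('ZZ')
df_six = gb.get_group(6)    # new DataFrame with the rows where ZZ == 6
df_seven = gb.get_group(7)  # new DataFrame with the rows where ZZ == 7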
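A small check of how get_group treats key types, assuming the question's data (rebuilt by hand) where ZZ is an integer column. The flag string_key_found is only illustrative:

```python
import pandas as pd

# Data transcribed from the question's table.
df = pd.DataFrame({
    "N0_YLDF": [6.286333, 6.317000, 6.324889, 6.320667, 6.325556,
                6.359000, 6.359000, 6.361111, 6.360778, 6.361111],
    "ZZ": [2, 6, 6, 5, 5, 6, 6, 7, 7, 6],
    "MAT": [11.669069, 11.669069] + [11.516454] * 8,
})

gb = df.groupby("ZZ")

# An integer key matches the integer values in ZZ.
df_six = gb.get_group(6)
print(df_six)

# A string key does not match any integer group, so pandas raises KeyError.
try:
    gb.get_group("6")
    string_key_found = True
except KeyError:
    string_key_found = False

print("string key found:", string_key_found)
```

If ZZ had been read in as text (for example from a CSV with dtype=str), the string form would be the right one instead, so checking df['ZZ'].dtype first is a cheap safeguard.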
