Pandas split one dataframe into multiple dataframes

Question

I have one pandas DataFrame that I need to split into multiple DataFrames. The number of DataFrames depends on how many months of data I have, i.e., I need to create a new DataFrame for every month. For example, df:

MONTH   NAME INCOME
201801   A     100$
201801   B      20$
201802   A      30$

So here I need to create 2 DataFrames. The problem is that I don't know in advance how many months of data I will have. How do I do that?

Vaishali · Accepted Answer · 2019-01-04 22:05:14Z

8

You can use groupby to create a dictionary of DataFrames:

df['MONTH'] = pd.to_datetime(df['MONTH'], format='%Y%m')
# Keys of the dictionary are the month numbers (1-12)
dfs = dict(tuple(df.groupby(df['MONTH'].dt.month)))
dfs[1]

       MONTH NAME INCOME
0 2018-01-01    A   100$
1 2018-01-01    B    20$

If your data spans multiple years, you will need to include the year in the grouping so that the same month from different years is not merged:

dfs = dict(tuple(df.groupby([df['MONTH'].dt.year, df['MONTH'].dt.month])))
dfs[(2018, 1)]

       MONTH NAME INCOME
0 2018-01-01    A   100$
1 2018-01-01    B    20$
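
For reference, here is a self-contained sketch of the year-and-month grouping, run on the sample data from the question. The `rename` calls and the dict comprehension are choices made for this sketch (they give the two grouping keys distinct names); they are not part of the original answer.

```python
import pandas as pd

# Sample data from the question; INCOME is kept as strings with the "$" suffix.
df = pd.DataFrame({
    "MONTH": [201801, 201801, 201802],
    "NAME": ["A", "B", "A"],
    "INCOME": ["100$", "20$", "30$"],
})

# Parse the YYYYMM values; casting to str first keeps the format unambiguous.
df["MONTH"] = pd.to_datetime(df["MONTH"].astype(str), format="%Y%m")

# Build the two grouping keys with distinct names (an assumption of this sketch).
year = df["MONTH"].dt.year.rename("year")
month = df["MONTH"].dt.month.rename("month")

# One DataFrame per (year, month) pair; keys are tuples such as (2018, 1).
dfs = {key: group for key, group in df.groupby([year, month])}

print(len(dfs))
print(dfs[(2018, 1)])
```

The number of entries in `dfs` follows from the data, so nothing has to be known about the month count in advance.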


3 Comments

Scott Boston Over a year ago

I like your dict(tuple(groupby...). +1 I'm adding that to my toolbox.

Victor Over a year ago

Thank you. Despite specifying the format = '%Y%m', why did MONTH change from 201801 to 2018-01-01?

Vaishali Over a year ago

@Victor, pandas stores dates as full year-month-day timestamps; if the day component is missing, it fills in 01 as the day.
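
To illustrate the point about the day defaulting to 01, here is a minimal sketch. The variable names `months`, `parsed` and `labels` are illustrative and not from the thread. It also shows one way to get YYYYMM labels back with `dt.strftime`, should the original form be preferred as a key.

```python
import pandas as pd

months = pd.Series([201801, 201801, 201802], name="MONTH")

# Parsing with only year and month still yields full timestamps.
parsed = pd.to_datetime(months.astype(str), format="%Y%m")
print(parsed.iloc[0])  # 2018-01-01 00:00:00

# Format back to YYYYMM strings to get labels without a day component.
labels = parsed.dt.strftime("%Y%m")
print(labels.tolist())  # ['201801', '201801', '201802']
```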

Scott Boston · Accepted Answer · 2020-04-26 19:33:47Z

3

You can use groupby to split a DataFrame into a list of DataFrames or a dictionary of DataFrames:

Dictionary of dataframes:

dict_of_dfs = {}
for n, g in df.groupby(df['MONTH']):
    dict_of_dfs[n] = g

List of dataframes:

list_of_dfs = []
for _, g in df.groupby(df['MONTH']):
    list_of_dfs.append(g)

Or, as @BenMares suggests, use a comprehension:

dict_of_dfs = {
    month: group_df
    for month, group_df in df.groupby('MONTH')
}

list_of_dfs = [
    group_df
    for _, group_df in df.groupby('MONTH')
]
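
As a quick check, the comprehension can be run on the sample data from the question. This is only a sketch; the DataFrame construction is reconstructed from the table in the question.

```python
import pandas as pd

# Sample data from the question, with MONTH left as YYYYMM integers.
df = pd.DataFrame({
    "MONTH": [201801, 201801, 201802],
    "NAME": ["A", "B", "A"],
    "INCOME": ["100$", "20$", "30$"],
})

# groupby discovers the distinct months at run time,
# so the number of resulting DataFrames need not be known up front.
dict_of_dfs = {month: group_df for month, group_df in df.groupby("MONTH")}
list_of_dfs = list(dict_of_dfs.values())

for month, group_df in dict_of_dfs.items():
    print(month, len(group_df))
```

Here the dictionary keys are the original MONTH values, so `dict_of_dfs[201801]` holds the two January rows.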

edited Apr 26, 2020 at 19:33

answered Jan 4, 2019 at 22:13


2 Comments

Ben Mares Over a year ago

It would be much more elegant to use a comprehension! {index: group_df for index, group_df in df.groupby('MONTH')}

Scott Boston Over a year ago

Agreed dictionary comprehension. Nice, @BenMares.

cors · Accepted Answer · 2019-01-04 22:26:50Z

2

You can also use the namespace dictionary returned by vars() to create one variable per month:

for m in df['MONTH'].unique():
    temp = 'df_{}'.format(m)    
    vars()[temp] = df[df['MONTH']==m]

Each DataFrame is created under the name df_<month>. For example:

df_201801

    MONTH NAME INCOME
0  201801    A   100$
1  201801    B    20$
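
One caveat: assigning through vars() creates variables only at module level. Inside a function, writes to the local namespace dictionary do not create usable local variables. Below is a sketch of the same naming scheme using an ordinary dictionary, which behaves the same in any scope. The helper name `split_by_month` is hypothetical and not part of the answer.

```python
import pandas as pd

def split_by_month(df):
    """Return a dict mapping names like 'df_201801' to that month's rows."""
    return {
        "df_{}".format(m): df[df["MONTH"] == m]
        for m in df["MONTH"].unique()
    }

# Sample data from the question.
df = pd.DataFrame({
    "MONTH": [201801, 201801, 201802],
    "NAME": ["A", "B", "A"],
    "INCOME": ["100$", "20$", "30$"],
})

frames = split_by_month(df)
print(list(frames))  # ['df_201801', 'df_201802']
print(frames["df_201801"])
```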

