5

I have one pandas DataFrame that I need to split into multiple DataFrames. The number of DataFrames depends on how many months of data I have, i.e. I need to create a new DataFrame for every month. For example, given df:

MONTH   NAME INCOME
201801   A     100$
201801   B      20$
201802   A      30$

So here I need to create 2 DataFrames. The problem is that I don't know in advance how many months of data I will have. How do I do that?

3 Answers

8

You can use groupby to create a dictionary of DataFrames:

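# Parse the YYYYMM values as dates (the day defaults to 01)
df['MONTH'] = pd.to_datetime(df['MONTH'], format='%Y%m')
# Map each month number to the sub-DataFrame for that month
dfs = dict(tuple(df.groupby(df['MONTH'].dt.month)))
dfs[1]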


       MONTH NAME INCOME
0 2018-01-01    A   100$
1 2018-01-01    B    20$

If your data spans multiple years, you will need to include the year in the grouping:

dfs = dict(tuple(df.groupby([df['MONTH'].dt.year,df['MONTH'].dt.month])))
dfs[(2018, 1)]

       MONTH NAME INCOME
0 2018-01-01    A   100$
1 2018-01-01    B    20$
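
As a possible alternative to grouping by year and month separately, here is a minimal sketch that groups on a monthly Period, so each key already carries both year and month. The 2019 row and the choice of string keys such as "2018-01" are assumptions made for illustration, not part of the original data.

```python
import pandas as pd

# Sample data modelled on the question; the 201901 row is made up
# to show that months from different years stay separate.
df = pd.DataFrame({
    "MONTH": [201801, 201801, 201802, 201901],
    "NAME": ["A", "B", "A", "A"],
    "INCOME": ["100$", "20$", "30$", "50$"],
})

# Convert to strings first so the YYYYMM format is applied to text.
df["MONTH"] = pd.to_datetime(df["MONTH"].astype(str), format="%Y%m")

# Group on a monthly Period; str(period) gives keys like "2018-01".
dfs = {
    str(period): group
    for period, group in df.groupby(df["MONTH"].dt.to_period("M"))
}

print(list(dfs))
print(dfs["2018-01"])
```

The dictionary grows with whatever months are present, so the number of months does not need to be known in advance.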

3 Comments

I like your dict(tuple(groupby...). +1 I'm adding that to my toolbox.
Thank you. Despite specifying the format = '%Y%m', why did MONTH change from 201801 to 2018-01-01?
@Victor, pandas stores dates with year, month, and day; if the day component is missing, it fills in 01 as the day.
3

You can use groupby to split a DataFrame into a list of DataFrames or a dictionary of DataFrames:

Dictionary of dataframes:

dict_of_dfs = {}
for n, g in df.groupby(df['MONTH']):
    dict_of_dfs[n] = g

List of dataframes:

list_of_dfs = []
for _, g in df.groupby(df['MONTH']):
    list_of_dfs.append(g)

Or, as @BenMares suggests, use a comprehension:

dict_of_dfs = {
    month: group_df
    for month, group_df in df.groupby('MONTH')
}

list_of_dfs = [
    group_df
    for _, group_df in df.groupby('MONTH')
]
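
For anyone who wants to try this end to end, here is a small self-contained sketch using the sample data from the question. The DataFrame construction is my own reconstruction of that sample; only the comprehension itself comes from the answer.

```python
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({
    "MONTH": [201801, 201801, 201802],
    "NAME": ["A", "B", "A"],
    "INCOME": ["100$", "20$", "30$"],
})

# One entry per distinct MONTH value, however many there are.
dict_of_dfs = {month: group_df for month, group_df in df.groupby("MONTH")}

# Report each month and how many rows landed in its DataFrame.
for month, group_df in dict_of_dfs.items():
    print(month, len(group_df))
```

Each group keeps the row labels of the original DataFrame; calling reset_index(drop=True) on a group is one way to renumber it from 0 if that matters.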

2 Comments

It would be much more elegant to use a comprehension! {index: group_df for index, group_df in df.groupby('MONTH')}
Agreed: dictionary comprehension. Nice, @BenMares.
2

You can also use the local variable dictionary returned by vars() in this way:

for m in df['MONTH'].unique():
    temp = 'df_{}'.format(m)    
    vars()[temp] = df[df['MONTH']==m]

Each DataFrame is created under the name df_<month>. For example:

df_201801

    MONTH NAME INCOME
0  201801    A   100$
1  201801    B    20$
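
One caveat: writing into vars() creates variables at module level, but inside a function changes to the local namespace are not guaranteed to take effect. A minimal sketch of a variant that behaves the same in both places keeps the frames in an ordinary dictionary under the same df_<month> names. The helper name split_by_month is hypothetical, chosen only for this example.

```python
import pandas as pd

def split_by_month(df):
    """Return a dict mapping 'df_<month>' to the rows for that month.

    Hypothetical helper for illustration; it uses the same boolean
    filter as the loop above but stores results in a dict.
    """
    return {f"df_{m}": df[df["MONTH"] == m] for m in df["MONTH"].unique()}

# Sample data reconstructed from the question.
df = pd.DataFrame({
    "MONTH": [201801, 201801, 201802],
    "NAME": ["A", "B", "A"],
    "INCOME": ["100$", "20$", "30$"],
})

frames = split_by_month(df)
print(sorted(frames))
print(frames["df_201801"])
```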
