pandas create multiple dataframes based on duplicate index dataframe

Question

If I have a dataframe with duplicates in the index, how would I create a set of dataframes with no duplicates in the index?

More precisely, given the dataframe:

I would want as output a list of dataframes:

This needs to be scalable to as many dataframes as needed based on the number of duplicates.
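As a hypothetical example, take an index `[1, 1, 2, 2, 2]` with columns `a` and `b`, with values chosen to match the outputs printed in the answers below. The splitting rule can be stated without pandas: the k-th occurrence of each index label goes into the k-th output, so the number of outputs equals the largest repeat count of any label. A minimal plain-Python sketch of that rule, where `split_rows` is a hypothetical helper name:

```python
from collections import defaultdict


def split_rows(rows):
    """Hypothetical helper: split (label, values) pairs so that no output
    list contains the same label twice. The k-th occurrence of a label
    (0-based) is placed in output k."""
    seen = defaultdict(int)  # label -> occurrences seen so far
    out = []
    for label, values in rows:
        k = seen[label]
        seen[label] += 1
        if k == len(out):  # first label to reach this occurrence number
            out.append([])
        out[k].append((label, values))
    return out


# Hypothetical rows: (index label, (a, b))
rows = [(1, (1, 6)), (1, (2, 7)), (2, (3, 8)), (2, (4, 9)), (2, (5, 0))]
for part in split_rows(rows):
    print(part)
```

Because a new output is only opened when some label repeats one more time than any label before it, the number of outputs grows with the duplicates, which is the scalability the question asks for.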

Pyd · 2019-05-23 09:16:43Z

3

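df = df.reset_index()  # unnamed index becomes a column called 'index'
dfs = []
while not df.empty:
    dup = df.duplicated('index', keep='first')
    # first remaining occurrence of each label -> one DataFrame
    dfs.append(df[~dup].set_index('index'))
    # carry only the later occurrences into the next pass
    df = df[dup]

# dfs will have all your dataframes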
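A minimal sketch of the reset_index/duplicated loop above wrapped in a function and run on a hypothetical frame. It assumes the index is unnamed and the frame has no existing column called `index` (otherwise `reset_index` picks a different column name); `split_on_duplicate_index` is a hypothetical name, and `rename_axis(None)` is added only to drop the leftover `'index'` name from each result:

```python
import pandas as pd


def split_on_duplicate_index(df):
    """Hypothetical wrapper around the loop above.
    Assumes an unnamed index and no existing 'index' column."""
    work = df.reset_index()  # index becomes a column named 'index'
    dfs = []
    while not work.empty:
        dup = work.duplicated('index', keep='first')
        # first remaining occurrence of every label -> one frame
        dfs.append(work[~dup].set_index('index').rename_axis(None))
        # only the later occurrences go into the next pass
        work = work[dup]
    return dfs


# Hypothetical input: label 1 appears twice, label 2 three times
df = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [6, 7, 8, 9, 0]},
                  index=[1, 1, 2, 2, 2])
for part in split_on_duplicate_index(df):
    print(part, end='\n\n')
```

Each pass removes one occurrence of every remaining label, so the loop runs as many times as the most repeated label appears; with many repeats, the groupby-based answers may avoid some of this repeated filtering.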

answered May 23, 2019 at 9:16

Pyd



jezrael · 2019-05-23 09:15:21Z

2

Use GroupBy.cumcount to number the repeats of each index label, then group by that counter and convert the groups to a dictionary:

dfs = dict(tuple(df.groupby(df.groupby(level=0).cumcount())))
print(dfs)
{0:    a  b
1  1  6
2  3  8, 1:    a  b
1  2  7
2  4  9, 2:    a  b
2  5  0}

print (dfs[0])
   a  b
1  1  6
2  3  8

Or build a list of DataFrames instead:

dfs = [x for i, x in df.groupby(df.groupby(level=0).cumcount())]
print (dfs)
[   a  b
1  1  6
2  3  8,    a  b
1  2  7
2  4  9,    a  b
2  5  0]
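To see why grouping on `cumcount` works, it helps to print the counter itself: `cumcount` numbers each row within its index-label group starting at 0, so rows that share a number can never share a label. A minimal sketch on a hypothetical frame with index `[1, 1, 2, 2, 2]`; it groups by the counter's raw NumPy values, which gives the same groups as the answer above:

```python
import pandas as pd

# Hypothetical input with a duplicated integer index
df = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [6, 7, 8, 9, 0]},
                  index=[1, 1, 2, 2, 2])

# Position of each row among the rows sharing its index label
occurrence = df.groupby(level=0).cumcount()
print(occurrence.tolist())  # [0, 1, 0, 1, 2]

# One DataFrame per occurrence number (group by the raw counter values)
dfs = [x for _, x in df.groupby(occurrence.to_numpy())]

for part in dfs:
    print(part, end='\n\n')
```

Every row lands in exactly one piece, and each piece has a unique index, so the number of pieces is simply the largest counter value plus one.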

edited May 23, 2019 at 9:15

answered May 23, 2019 at 9:10

jezrael


1 Comment

Jim Eisenberg Over a year ago

Thanks! This is a nice solution

Chris · Accepted Answer · 2019-05-23 09:10:06Z

1

Another approach is to use GroupBy.nth, which takes the i-th row of every index group:

import numpy as np

g = df.groupby(df.index)
# largest repeat count of any index label
# (np.bincount needs non-negative integer labels)
cnt = np.bincount(df.index).max()
dfs = [g.nth(i) for i in range(cnt)]
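`np.bincount` in the answer above only accepts non-negative integers, so it fails for string or negative index labels. A minimal sketch of the same `nth` idea that swaps in `Index.value_counts()` to find the largest repeat count, shown on a hypothetical string index. How `nth` orders and indexes its result changed in pandas 2.0 (it now behaves as a filter that keeps rows in their original order), so the checks compare sorted labels rather than exact row order:

```python
import pandas as pd

# Hypothetical input with string labels, where np.bincount would fail
df = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [6, 7, 8, 9, 0]},
                  index=['x', 'x', 'y', 'y', 'y'])

g = df.groupby(df.index)
# Largest number of times any single label repeats (works for any labels)
cnt = df.index.value_counts().max()
# nth(i) picks the i-th row of every group; groups with fewer rows are skipped
dfs = [g.nth(i) for i in range(cnt)]

for part in dfs:
    print(part, end='\n\n')
```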

Output:

answered May 23, 2019 at 9:10

Chris


1 Comment

Jim Eisenberg Over a year ago

Thanks! Accepted because it automatically orders the indexes and outputs as a list right away :P
