If I have a dataframe with duplicates in the index, how would I create a set of dataframes with no duplicates in the index?

More precisely, given the dataframe:

   a  b
1  1  6
1  2  7
2  3  8
2  4  9
2  5  0

As output, I want a list of dataframes:

   a  b
1  1  6
2  3  8


   a  b
1  2  7
2  4  9


   a  b
2  5  0

This needs to scale to as many dataframes as the number of duplicates requires.

3 Answers

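# Move the index into a regular column (named 'index' when the index is unnamed).
df = df.reset_index()
dfs = []
while not df.empty:
    # Rows that are the first occurrence of their index value form the next dataframe.
    is_repeat = df.duplicated('index', keep='first')
    dfs.append(df[~is_repeat].set_index('index'))
    # Carry the remaining repeats into the next pass.
    df = df[is_repeat]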

#dfs will have all your dataframes
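
The same peeling idea can also be written without moving the index into a column, by asking the index itself which labels are repeats. This is only a sketch: the helper name `peel_unique_layers` is made up for illustration, and the example frame is rebuilt from the question.

```python
import pandas as pd


def peel_unique_layers(df):
    """Hypothetical helper: peel off the first occurrence of each index label per pass."""
    layers = []
    remaining = df
    while not remaining.empty:
        # True for every row whose index label has already appeared in `remaining`.
        is_repeat = remaining.index.duplicated(keep="first")
        layers.append(remaining[~is_repeat])
        remaining = remaining[is_repeat]
    return layers


# Example frame from the question: index labels 1 and 2 are repeated.
df = pd.DataFrame(
    {"a": [1, 2, 3, 4, 5], "b": [6, 7, 8, 9, 0]},
    index=[1, 1, 2, 2, 2],
)
dfs = peel_unique_layers(df)
```

Unlike the loop above, this leaves the original `df` untouched and keeps the index name as it was.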

Use GroupBy.cumcount to number each occurrence within an index value, group by that counter, and then convert the groups to a dictionary:

dfs = dict(tuple(df.groupby(df.groupby(level=0).cumcount())))
print (dfs)
{0:    a  b
1  1  6
2  3  8, 1:    a  b
1  2  7
2  4  9, 2:    a  b
2  5  0}

print (dfs[0])
   a  b
1  1  6
2  3  8

Or convert to list of DataFrames:

dfs = [x for i, x in df.groupby(df.groupby(level=0).cumcount())]
print (dfs)
[   a  b
1  1  6
2  3  8,    a  b
1  2  7
2  4  9,    a  b
2  5  0]
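
To see why this works, it may help to look at the counter on its own. The following is a minimal, self-contained sketch that rebuilds the frame from the question; the variable name `occurrence` is just for illustration.

```python
import pandas as pd

# Example frame from the question: index labels 1 and 2 are repeated.
df = pd.DataFrame(
    {"a": [1, 2, 3, 4, 5], "b": [6, 7, 8, 9, 0]},
    index=[1, 1, 2, 2, 2],
)

# cumcount numbers the rows within each index label, starting at 0.
occurrence = df.groupby(level=0).cumcount()

# Grouping by that counter puts the k-th occurrence of every label together,
# so each resulting dataframe has a unique index.
dfs = [part for _, part in df.groupby(occurrence)]
```

Here `occurrence` holds the values 0, 1, 0, 1, 2, which is why the rows fall into three dataframes.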

1 Comment

Thanks! This is a nice solution

Another approach is to use GroupBy.nth:

import numpy as np

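g = df.groupby(df.index)
# Largest number of times any index value repeats.
# np.bincount only accepts non-negative integers, so this assumes an integer index.
cnt = np.bincount(df.index).max()
# nth(i) takes the i-th row of each group; groups with fewer rows contribute nothing.
dfs = [g.nth(i) for i in range(cnt)]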

Output:

[   a  b
1  1  6
2  3  8,    a  b
1  2  7
2  4  9,    a  b
2  5  0]
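
Since np.bincount only handles non-negative integers, the count would need to be computed differently for string or other labels. One possible variation, assuming a string index purely for illustration, swaps in Index.value_counts:

```python
import pandas as pd

# Same data as the question, but with string labels (an assumption for this sketch).
df = pd.DataFrame(
    {"a": [1, 2, 3, 4, 5], "b": [6, 7, 8, 9, 0]},
    index=["x", "x", "y", "y", "y"],
)

# Largest number of repeats of any label; works for any hashable index values.
cnt = int(df.index.value_counts().max())

g = df.groupby(level=0)
# The i-th row of every group, for each occurrence position that exists.
dfs = [g.nth(i) for i in range(cnt)]
```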

1 Comment

Thanks! Accepted because it automatically orders the indexes and outputs as a list right away :P
