python - drop duplicated index in place in a pandas dataframe

Question

I have a list of dataframes:

all_df = [df1, df2, df3]

I would like to remove rows with duplicated indices in all dataframes in the list, such that the changes are reflected in the original dataframes df1, df2 and df3. I tried to do

for df in all_df:
    df = df[~df.index.duplicated()]

But the changes are only applied in the list, not on the original dataframes.

Essentially, I want to avoid doing the following:

df1 = df1[~df1.index.duplicated()]
df2 = df2[~df2.index.duplicated()]
df3 = df3[~df3.index.duplicated()]
all_df = [df1,df2,df3]

Answered here stackoverflow.com/questions/41812564/… and here stackoverflow.com/questions/44630805/…. As to why: the name df is bound to a new frame on each pass through the loop, and the old one is forgotten.
– user18122470, Commented Feb 9, 2022 at 10:17
Thanks. These other questions are indeed related but do not answer my question completely.
– Camille, Commented Feb 9, 2022 at 10:33
As it stands they exactly answer your question. In case you didn't notice, the answer below repeats those things.
– user18122470, Commented Feb 9, 2022 at 10:53

jezrael · Accepted Answer · 2022-02-09 10:29:33Z

1

You need to recreate the list of DataFrames:

all_df = [df[~df.index.duplicated()] for df in all_df]

Or:

for i, df in enumerate(all_df):
    all_df[i] = df[~df.index.duplicated()]

print (all_df[0])
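To see why the loop in the question has no effect while the versions above do, here is a minimal sketch. The sample frames and the `rows_*` variable names are made up for illustration; only the filtering expression comes from the question and answer.

```python
import pandas as pd

# Hypothetical sample data: each frame has one repeated index label.
df1 = pd.DataFrame({"a": [1, 2, 3]}, index=["x", "x", "y"])
df2 = pd.DataFrame({"b": [4, 5, 6]}, index=["p", "q", "q"])
all_df = [df1, df2]

# Rebinding the loop variable only changes what the name `df` refers to.
# Neither the list elements nor df1/df2 are touched.
for df in all_df:
    df = df[~df.index.duplicated()]
rows_after_rebinding = [len(frame) for frame in all_df]

# Assigning by position replaces the list elements themselves.
for i, df in enumerate(all_df):
    all_df[i] = df[~df.index.duplicated()]
rows_after_assignment = [len(frame) for frame in all_df]

# The name df1 is still bound to the original, unfiltered frame.
rows_in_df1 = len(df1)

print(rows_after_rebinding, rows_after_assignment, rows_in_df1)
```

Boolean indexing returns a new DataFrame, so the filtered result only exists wherever it gets stored: here, in the list slot.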

EDIT: If the names of the DataFrames are important, use a dictionary of DataFrames. This still does not modify df1 and df2 in place; you need to select the results by the dictionary keys:

d = {'price': df1, 'volumes': df2}

d = {k: df[~df.index.duplicated()] for k, df in d.items()}

print (d['price'])
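A runnable sketch of the dictionary approach, assuming two small made-up frames. The sample values and the `subset` name are illustrative only. It shows that the same dictionary can serve both "all frames" and "some frames" operations:

```python
import pandas as pd

# Hypothetical sample data with repeated index labels.
price = pd.DataFrame({"price": [10.0, 10.5, 11.0]}, index=[0, 0, 1])
volumes = pd.DataFrame({"volume": [100, 200, 300]}, index=[0, 1, 1])

d = {"price": price, "volumes": volumes}

# Rebuild the dictionary, keeping the first row for each index label.
d = {k: df[~df.index.duplicated()] for k, df in d.items()}

# Operate on every frame ...
row_counts = {k: len(df) for k, df in d.items()}

# ... or only on some of them, selected by key.
subset = {k: d[k] for k in ["price"]}

print(row_counts)
print(d["price"])
```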

edited Feb 9, 2022 at 10:29

answered Feb 9, 2022 at 10:13


7 Comments

Camille Over a year ago

Thanks. This will not remove rows in the original dataframes df1, df2, df3 though, right? It will only change what is in the list all_df. I would like to apply the changes also to the original dataframes df1, df2 and df3.

jezrael Over a year ago

@Camille - but why? I see no reason.

jezrael Over a year ago

@Camille - you need to forget about df1, df2, df3. Work with the list and use functions that operate on the list.

Camille Over a year ago

Because in the rest of my code, I sometimes do operations on all dataframes, in which case I use the list all_df, and sometimes do operations only on some of them.

jezrael Over a year ago

@Camille - I think there is no reason to use them. Instead of df1, df2 (which are forgotten), use all_df[0], all_df[1].
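If both the list and the individual names are wanted, one possible option (not proposed in the thread, shown here only as a sketch with made-up sample frames) is to rebuild the list once and then rebind the names by unpacking it. This rebinds the names rather than modifying the frames in place, so any other reference to the old frames still sees the duplicates.

```python
import pandas as pd

# Hypothetical sample data with repeated index labels.
df1 = pd.DataFrame({"a": [1, 2, 3]}, index=["x", "x", "y"])
df2 = pd.DataFrame({"b": [4, 5, 6]}, index=["p", "q", "q"])
df3 = pd.DataFrame({"c": [7, 8, 9]}, index=["m", "m", "m"])
all_df = [df1, df2, df3]

# Build the deduplicated list once ...
all_df = [df[~df.index.duplicated()] for df in all_df]

# ... then rebind the individual names to the new list elements.
df1, df2, df3 = all_df

# Each name and the matching list slot now refer to the same object.
same_object = df1 is all_df[0]

print(same_object, len(df1), len(df2), len(df3))
```

The unpacking line has to list the names in the same order as the frames appear in the list, so it has to be updated whenever the list changes.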
