I have a list of dataframes:

all_df = [df1, df2, df3]

I would like to remove rows with duplicated indices in all dataframes in the list, such that the changes are reflected in the original dataframes df1, df2 and df3. I tried to do

for df in all_df:
    df = df[~df.index.duplicated()]

But the changes are only applied in the list, not on the original dataframes.

Essentially, I want to avoid doing the following:

df1 = df1[~df1.index.duplicated()]
df2 = df2[~df2.index.duplicated()]
df3 = df3[~df3.index.duplicated()]
all_df = [df1,df2,df3]
  • answered here stackoverflow.com/questions/41812564/… and here stackoverflow.com/questions/44630805/…. As to why: df looks at a new frame each time in the loop, forgetting the old one. Commented Feb 9, 2022 at 10:17
  • Thanks. These other questions are indeed related but do not answer my question completely. Commented Feb 9, 2022 at 10:33
  • As it stands they exactly answer your question. In case you didn't notice, the answer below repeats those things. Commented Feb 9, 2022 at 10:53
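The point made in the first comment can be checked directly. The following is a minimal sketch with a hypothetical one-column sample frame (the data is an assumption, not from the question): the assignment inside the loop only rebinds the loop variable, so neither df1 nor the list changes.

```python
import pandas as pd

# Hypothetical sample frame with a duplicated index label "a".
df1 = pd.DataFrame({"x": [1, 2, 3]}, index=["a", "a", "b"])
all_df = [df1]

for df in all_df:
    # Boolean indexing builds a new DataFrame; the assignment only rebinds
    # the loop variable `df` and touches neither `df1` nor the list.
    df = df[~df.index.duplicated()]

rebound_len = len(df)                   # length of the new, deduplicated frame
original_len = len(df1)                 # length of the untouched original
list_still_original = all_df[0] is df1  # the list still holds the same object

print(rebound_len, original_len, list_still_original)
```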

1 Answer

You need to recreate the list of DataFrames:

all_df = [df[~df.index.duplicated()] for df in all_df]

Or:

for i, df in enumerate(all_df):
    all_df[i] = df[~df.index.duplicated()]

print (all_df[0])
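As a rough end-to-end illustration, here is a sketch using hypothetical sample frames (the data is assumed, not from the question). It rebuilds the list and then, as one possible way to keep the names df1, df2 and df3 pointing at the deduplicated frames, unpacks the list back into them. This still rebinds names rather than modifying the frames in place.

```python
import pandas as pd

# Hypothetical sample data; "a" is duplicated in df1 and "y" in df2.
df1 = pd.DataFrame({"x": [1, 2, 3]}, index=["a", "a", "b"])
df2 = pd.DataFrame({"x": [4, 5, 6]}, index=["y", "z", "y"])
df3 = pd.DataFrame({"x": [7, 8]}, index=["p", "q"])
all_df = [df1, df2, df3]

# Rebuild the list, keeping the first row for each index label.
all_df = [df[~df.index.duplicated()] for df in all_df]

# Rebind the original names to the new frames by unpacking the list.
df1, df2, df3 = all_df

print(df1.index.tolist())
print(df2.index.tolist())
```

The unpacking line has to list the names in the same order as the list, so it needs updating whenever a frame is added or removed.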

EDIT: If the names of the DataFrames are important, use a dictionary of DataFrames. This still does not modify df1 and df2 in place; you need to select the frames by their dictionary keys:

d = {'price': df1, 'volumes': df2}

d = {k: df[~df.index.duplicated()] for k, df in d.items()}

print(d['price'])
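A small runnable sketch of the dictionary approach, with hypothetical sample data (the column names and values are assumptions). It also shows how a subset of frames might be selected by key when only some of them need processing:

```python
import pandas as pd

# Hypothetical sample data; index label 0 is duplicated in df1.
df1 = pd.DataFrame({"price": [10.0, 11.0, 12.0]}, index=[0, 0, 1])
df2 = pd.DataFrame({"volume": [100, 200]}, index=[0, 1])

d = {"price": df1, "volumes": df2}

# Rebuild the dictionary with deduplicated frames under the same keys.
d = {k: df[~df.index.duplicated()] for k, df in d.items()}

# Operate on all frames via d.values(), or on some of them by key.
row_counts = {k: len(df) for k, df in d.items()}
only_price = d["price"]

print(row_counts)
```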
7 Comments

Thanks. This will not remove rows in the original dataframes df1, df2, df3 though, right? It will only change what is in the list all_df. I would like to apply the changes also to the original dataframes df1, df2 and df3.
@Camille - but why? I see no reason for it.
@Camille - you need to forget about df1, df2, df3 and work with the list, using a function that operates on the list.
Because in the rest of my code, I sometimes do operations on all dataframes, in which case I use the list all_df, and sometimes do operations only on some of them.
@Camille - I see no reason to use them. df1 and df2 are now stale, so use all_df[0] and all_df[1] instead.