
I'm just running a simple for-loop over a list of DataFrames, but when I try to add an if clause it keeps erroring out.

df_list = [df1, df2, df3]
for df in df_list:
    if df in [df1, df2]:
        x = 1
    else:
        x = 2
...
ValueError: Can only compare identically-labeled DataFrame objects

Above is a simplified version of what I'm attempting. Can anyone tell me why this isn't working and how to fix it?

  • What is the error you got? Commented Mar 31, 2022 at 15:40
  • just updated it! Commented Mar 31, 2022 at 15:40
  • the error message is self-explanatory. Maybe try it with equality? Commented Mar 31, 2022 at 15:42
  • I smell an XY problem here. if statements in for loops are one thing, but your problem seems to come in when trying to evaluate if one dataframe is one of the dataframes in a list. Do you want to know if it is the same object? Or just has the same index, column, and values? Your question is ambiguous. Commented Mar 31, 2022 at 16:17
  • I wanted to check whether they were the same object Commented Mar 31, 2022 at 17:07

5 Answers

3

You could use DataFrame.equals with any instead:

df_list = [df1, df2, df3]
for df in df_list:
    if any(df.equals(y) for y in [df1, df2]):
        x = 1
    else:
        x = 2
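A minimal self-contained sketch of the any(... .equals ...) approach, assuming three small made-up DataFrames (the values are hypothetical). It also shows that because equals() compares contents rather than identity, a value-identical copy of df1 counts as a match too.

```python
import pandas as pd

# Hypothetical example frames; df3 differs in contents from df1 and df2.
df1 = pd.DataFrame({"a": [1, 2]})
df2 = pd.DataFrame({"b": [3, 4]})
df3 = pd.DataFrame({"c": [5, 6]})

df_list = [df1, df2, df3]
results = []
for df in df_list:
    # equals() returns a single bool, so it is safe to use inside any()
    if any(df.equals(y) for y in [df1, df2]):
        x = 1
    else:
        x = 2
    results.append(x)

print(results)  # [1, 1, 2]

# equals() compares contents, so a copy of df1 is also treated as a match
print(any(df1.copy().equals(y) for y in [df1, df2]))  # True
```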

2

Do NOT use .equals() here!

It's unnecessary and slows down your program. Use id() instead:

df_list = [df1, df2, df3]
for df in df_list:
    if id(df) in [id(df1), id(df2)]:
        x = 1
    else:
        x = 2

This works because here you only need to compare identities, not values.
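A sketch of the same identity check written with the is operator, which for objects that are alive at the same time is equivalent to comparing their id() values. Using is instead of id() is my substitution, not this answer's code, and the frames below are made up. Note that df2 and df3 have the same contents as df1, yet only the listed objects themselves match.

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2]})
df2 = pd.DataFrame({"a": [1, 2]})  # same contents as df1, but a separate object
df3 = df1.copy()                   # also a separate object

targets = [df1, df2]
results = []
for df in [df1, df2, df3]:
    # `is` compares object identity, like comparing id() values; it never
    # calls DataFrame.__eq__, so the labeled-comparison error cannot occur
    x = 1 if any(df is t for t in targets) else 2
    results.append(x)

print(results)  # [1, 1, 2]
# By contrast, equals() would consider df3 a match for both targets
print([df3.equals(t) for t in targets])  # [True, True]
```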

2 Comments

Depending on the use case, id can be nice, but it could also be unwanted. For instance, if df3 was created with df3 = df1, then they share the same id, yet perhaps they should be handled differently. I guess that could be avoided with df3 = df1.copy(), so it's truly a different object, not just a reference.
If df3 = df1 is the case, then @ALollz's answer is the one to use: neither id() nor .equals() can distinguish them. But id() can tell df3 from df1 if df3 is a copy of df1, while .equals() cannot.
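The difference discussed in these comments can be checked directly. A minimal sketch with a made-up df1: plain assignment only creates a second name for the same object, while .copy() creates a new object whose contents still compare equal.

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2]})

df3_alias = df1          # df3 = df1: a second name for the same object
df3_copy = df1.copy()    # df3 = df1.copy(): a new object with equal contents

# Alias: same identity and same contents, so neither check can tell them apart
print(df3_alias is df1, id(df3_alias) == id(df1), df3_alias.equals(df1))  # True True True
# Copy: different identity, same contents, so only id()/is can tell them apart
print(df3_copy is df1, id(df3_copy) == id(df1), df3_copy.equals(df1))     # False False True
```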
2

You could use a better container and reference them by labels.

Equality checks on large DataFrames with object dtypes can become slow (seconds or longer), whereas checking whether a label is in a list takes on the order of nanoseconds.

dfs = {'df1': df1, 'df2': df2, 'df3': df3}
for label, df in dfs.items():
    if label in ['df1', 'df2']:
        x = 1
    else:
        x = 2
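A runnable version of the labeled approach, assuming small hypothetical frames. Two details are my own variations rather than part of this answer: the results are collected into a dict, and the wanted labels are held in a set (average O(1) membership) instead of a list. df3 deliberately has the same contents as df1 to show that only the label decides the branch.

```python
import pandas as pd

# Hypothetical frames; with labels, contents and index/columns don't matter
df1 = pd.DataFrame({"a": [1, 2]})
df2 = pd.DataFrame({"b": [3, 4]})
df3 = pd.DataFrame({"a": [1, 2]})  # same contents as df1, but a different label

dfs = {"df1": df1, "df2": df2, "df3": df3}
special = {"df1", "df2"}  # a set instead of a list; membership is O(1) on average

x_by_label = {}
for label, df in dfs.items():
    # Only strings are compared here, so no DataFrame comparison happens at all
    x_by_label[label] = 1 if label in special else 2

print(x_by_label)  # {'df1': 1, 'df2': 1, 'df3': 2}
```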

0

You need to use df.equals()

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.equals.html

df_list = [df1, df2, df3]
for df in df_list:
    if df.equals(df1) or df.equals(df2):
        # blah blah (your logic goes here)
        x = 1
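DataFrame.equals is stricter and more forgiving than == in two specific ways, according to its documentation: NaNs in the same position count as equal, and the element dtypes must match. A small sketch with made-up frames illustrating both:

```python
import numpy as np
import pandas as pd

left = pd.DataFrame({"a": [1.0, np.nan]})
right = left.copy()

# equals() treats NaNs in the same position as equal...
print(left.equals(right))  # True
# ...whereas elementwise == follows IEEE rules, where NaN != NaN
print(bool((left == right).all().all()))  # False

# equals() also requires matching dtypes: int64 vs float64 compares unequal
ints = pd.DataFrame({"a": [1, 2]})
floats = pd.DataFrame({"a": [1.0, 2.0]})
print(ints.equals(floats))  # False
```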

0

The following link might help: Pandas "Can only compare identically-labeled DataFrame objects" error

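According to that thread, DataFrames compared with == must have the same columns and index; otherwise pandas raises this error. In the question the comparison is implicit: df in [df1, df2] first checks each list element for identity and, when that fails, compares the two DataFrames with ==.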
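A minimal reproduction of why the question's loop fails, assuming made-up frames. The first lookup succeeds only because Python's list membership checks identity before calling ==. The other two fall back to DataFrame ==, which either raises the labeled-comparison error (different labels) or returns a boolean DataFrame that Python cannot reduce to a single True/False (same labels). The helper membership_error is hypothetical, written only for this demo.

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2]})
df2 = pd.DataFrame({"b": [3, 4]})  # different columns from df1
df3 = pd.DataFrame({"a": [5, 6]})  # same labels as df1, different values

# Identity check succeeds on the first element, so == is never called
print(df1 in [df1, df2])  # True


def membership_error(df, candidates):
    """Hypothetical helper: return the ValueError message raised by
    `df in candidates`, or None if the lookup succeeds."""
    try:
        _ = df in candidates
    except ValueError as exc:
        return str(exc)
    return None


# df2 is not df1, so Python evaluates df1 == df2; the labels differ,
# and pandas raises "Can only compare identically-labeled ... DataFrame objects"
print(membership_error(df2, [df1, df2]))
# df3 has df1's labels, so == succeeds but returns a boolean DataFrame,
# and converting it to a single bool raises "truth value ... is ambiguous"
print(membership_error(df3, [df1, df2]))
```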

Alternatively, you can compare the DataFrames with the DataFrame.equals method. See the documentation: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.equals.html
