There are quite a few similar questions out there, but I am not sure any of them tackles both index and row values. (This is for a binary classification DataFrame.)

What I am trying to do is check that columns with the same name have the same values and the same index. If they do not, simply return an error.

Let's say DataFrame df has columns a, b and c, and df_orginal has columns a to z.

How can we first find the columns that have the same name in both DataFrames, and then check that the contents of those columns (a, b and c) match row by row, in both value and index, between df and df_orginal?

The contents of all the columns are numerical, which is why I want to compare the combination of index and value.

Demo:

In [1]: df
Out[1]:
   a  b  c  
0  0  1  2  
1  1  2  0  
2  0  1  0  
3  1  1  0  
4  3  1  0  

In [3]: df_orginal
Out[3]:
   a  b  c  d  e  f  g ......
0  4  3  1  1  0  0  0
1  3  1  2  1  1  2  1
2  1  2  1  1  1  2  1
3  3  4  1  1  1  2  1
4  0  3  0  0  1  1  1

In the above example, for the columns that share a name, compare each (index, value) combination and flag an error if any combination does not match.

  • Please provide some data for better understanding. Commented Feb 15, 2018 at 10:33
  • Please create a Minimal, Complete, and Verifiable example - stackoverflow.com/help/mcve Commented Feb 15, 2018 at 10:38
  • @ShivamGaur please check Commented Feb 15, 2018 at 10:42
  • Example output? Just df with 'Error' where there is no match? Are your indexes always consecutive, starting at 0? Commented Feb 15, 2018 at 10:58

2 Answers

# Columns present in both frames
common_cols = df.columns.intersection(df_original.columns)

for col in common_cols:
    # Join each index label with its value so both are compared together
    df1_ind_val_pair = df[col].index.astype(str) + ' ' + df[col].astype(str)
    df2_ind_val_pair = df_original[col].index.astype(str) + ' ' + df_original[col].astype(str)

    # != compares element-wise; any() is True if at least one pair differs
    if any(df1_ind_val_pair != df2_ind_val_pair):
        print('Found one or more unequal (index, value) pairs in col {}'.format(col))
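The same check can also be sketched without building strings, by comparing the index labels and the values separately and raising instead of printing, as the question asks. The helper names `mismatched_columns` and `assert_common_columns_match` are hypothetical, the sample frames are copied from the question's demo (only columns a to g of the wider frame), and the sketch assumes the columns contain no NaN, since NaN never compares equal to itself.

```python
import pandas as pd

# Sample frames from the question's demo (df_original truncated to columns a-g).
df = pd.DataFrame({"a": [0, 1, 0, 1, 3], "b": [1, 2, 1, 1, 1], "c": [2, 0, 0, 0, 0]})
df_original = pd.DataFrame({
    "a": [4, 3, 1, 3, 0], "b": [3, 1, 2, 4, 3], "c": [1, 2, 1, 1, 0],
    "d": [1, 1, 1, 1, 0], "e": [0, 1, 1, 1, 1], "f": [0, 2, 2, 2, 1],
    "g": [0, 1, 1, 1, 1],
})


def mismatched_columns(left, right):
    """Return the shared column names whose index labels or values differ."""
    bad = []
    for col in left.columns.intersection(right.columns):
        same_index = left[col].index.equals(right[col].index)
        # Values are only compared when the labels already line up,
        # so the element-wise comparison always sees equal lengths.
        if not same_index or not (left[col].to_numpy() == right[col].to_numpy()).all():
            bad.append(col)
    return bad


def assert_common_columns_match(left, right):
    """Raise ValueError if any shared column differs in index or values."""
    bad = mismatched_columns(left, right)
    if bad:
        raise ValueError("columns differ in (index, value) pairs: {}".format(bad))
```

Calling `assert_common_columns_match(df, df_original)` on the sample data raises, because every shared column has at least one differing row.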

5 Comments

This looks great, but I think instead of comparing the indexes on their own and the values on their own, it should compare both indexes and values at the same time.
As in, it should compare the combination of both.
So, for each column, you would like to compare the (index, value) pairs from df to the (index, value) pairs from df_original, and print the column name if there is even one pair that is not equal, right?
Incredible! Thank you.
Indexes behave like ordered sets, so they have most of the methods defined on sets as well. You can just use df.columns.intersection.
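To illustrate the set-like behaviour mentioned in the last comment, here is a small sketch using two stand-alone Index objects. The column labels are made up to mirror the question (a to c versus a to g).

```python
import pandas as pd

left_cols = pd.Index(["a", "b", "c"])
right_cols = pd.Index(list("abcdefg"))

# Labels present in both indexes
common = left_cols.intersection(right_cols)

# Labels present only in the wider index
only_right = right_cols.difference(left_cols)
```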

IIUC:

Use pd.DataFrame.align with an inner join, so that only the rows and columns present in both frames are kept. Then pass the resulting tuple, unpacked, to pd.DataFrame.eq:

pd.DataFrame.eq(*df.align(dfo, 'inner'))

       a      b      c
0  False  False  False
1  False  False  False
2  False  False  False
3  False  False  False
4  False  False   True

To find the rows in which every column matches, reduce the result across columns with all:

pd.DataFrame.eq(*df.align(dfo, 'inner')).all(1)

0    False
1    False
2    False
3    False
4    False
dtype: bool

With the sample data, however, filtering df by this mask gives an empty result:

df[pd.DataFrame.eq(*df.align(dfo, 'inner')).all(1)]

Empty DataFrame
Columns: [a, b, c]
Index: []

The same answer, but with clearer code:

def eq(d1, d2):
    d1, d2 = d1.align(d2, 'inner')
    return d1 == d2

eq(df, dfo)

       a      b      c
0  False  False  False
1  False  False  False
2  False  False  False
3  False  False  False
4  False  False   True
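One point worth keeping in mind with this approach: an inner align drops any row or column label that is missing from either frame, so an index mismatch does not show up as False in the result. The sketch below uses small made-up frames (not the question's data) to show this, and adds a separate Index.equals check as one possible way to catch it.

```python
import pandas as pd

# Hypothetical frames: the last row label differs (2 vs 5), and dfo has an extra column.
df = pd.DataFrame({"a": [0, 1, 0], "b": [1, 2, 1]}, index=[0, 1, 2])
dfo = pd.DataFrame({"a": [0, 1, 9], "b": [1, 2, 1], "z": [7, 7, 7]}, index=[0, 1, 5])

# Keep only the labels shared by both frames: rows 0 and 1, columns a and b.
left, right = df.align(dfo, join="inner")
equal = left.eq(right)

# Row 2 of df and row 5 of dfo were dropped, so they cannot appear as False.
# Comparing the indexes directly reveals the mismatch.
index_matches = df.index.equals(dfo.index)
```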

3 Comments

Yes, this also works, but for a huge table (5mil x 5mil) it is almost impossible to check everything. Can it print the positions of those that are not equal?
Of course, I can't imagine you actually want to print every failure. But still, we could. What do you actually want to do? Are you logging these? Do you want the ordinal positions (index and columns) or the labels? You can clear this up by showing the output you expect.
It would just print whether there is a False or not. Not every single one, just whether any exists.
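A minimal sketch of what the comments above ask for: a single yes/no answer on whether any shared cell differs, plus the labels of the first few mismatches rather than all of them. The frames are the question's sample data (with the wider frame truncated to columns a to g), and the limit of three reported positions is an arbitrary choice for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [0, 1, 0, 1, 3], "b": [1, 2, 1, 1, 1], "c": [2, 0, 0, 0, 0]})
dfo = pd.DataFrame({
    "a": [4, 3, 1, 3, 0], "b": [3, 1, 2, 4, 3], "c": [1, 2, 1, 1, 0],
    "d": [1, 1, 1, 1, 0], "e": [0, 1, 1, 1, 1], "f": [0, 2, 2, 2, 1],
    "g": [0, 1, 1, 1, 1],
})

# Restrict both frames to their shared labels, then mark the unequal cells.
left, right = df.align(dfo, join="inner")
mismatch = left.ne(right)

# One boolean: does any shared cell differ?
has_mismatch = bool(mismatch.to_numpy().any())

# (row label, column label) of the first few mismatches, in row-major order.
rows, cols = np.nonzero(mismatch.to_numpy())
first_few = [(mismatch.index[r], mismatch.columns[c]) for r, c in zip(rows[:3], cols[:3])]

if has_mismatch:
    print("Mismatch found; first positions:", first_few)
```

For a very large table, `mismatch.sum()` gives a per-column count of differing cells, which may be easier to log than individual positions.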
