Comparing two Excel files using Python

Question

I have data in two Excel files, like below.

Sample DataFrames:

import pandas as pd

df1 = {'Transaction_Name': ['SC-001_Homepage', 'SC-002_Homepage', 'SC-001_Signinlink'], 'Count': [1, 1, 2]}
df1 = pd.DataFrame(df1, columns=df1.keys())

df2 = {'Transaction_Name': ['SC-001_Homepage', 'SC-002_Homepage', 'SC-001_Signinlink'], 'Count': [2, 1, 2]}
df2 = pd.DataFrame(df2, columns=df2.keys())

Please help me get the difference between the two files, as below:

Transaction_Name     Count_df1   Count_df2
SC-001_Homepage              1           2
SC-002_Homepage              1           1
SC-001_Signinlink            2           2

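In the first row of the output, the counts do not match. Is it possible to highlight that row in a different color? My code so far is below:

import pandas as pd

# Compare both files (exported as CSV)
df1 = pd.read_csv(r"WLMOUTPUT.csv", dtype=object)
df2 = pd.read_csv(r"results.csv", dtype=object)

print('\n', df1)
print('\n', df2)

# Build a key from name and count; the '|' separator keeps different pairs from concatenating to the same string
df1['Compare'] = df1['Transaction_Name'] + '|' + df1['Count'].astype(str)
df2['Compare'] = df2['Transaction_Name'] + '|' + df2['Count'].astype(str)

# Rows of df1 whose (name, count) pair does not appear in df2
print('\n', df1.loc[~df1['Compare'].isin(df2['Compare'])])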

Thanks in advance
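For reference, the same "which (name, count) pairs of df1 are not in df2" check can be done without building a concatenated string key, by comparing the pairs directly as a pandas MultiIndex. This is a minimal sketch on the sample DataFrames above, not the approach used in the accepted answer:

```python
import pandas as pd

df1 = pd.DataFrame({'Transaction_Name': ['SC-001_Homepage', 'SC-002_Homepage', 'SC-001_Signinlink'],
                    'Count': [1, 1, 2]})
df2 = pd.DataFrame({'Transaction_Name': ['SC-001_Homepage', 'SC-002_Homepage', 'SC-001_Signinlink'],
                    'Count': [2, 1, 2]})

# Turn each frame's (Transaction_Name, Count) rows into a MultiIndex of tuples
keys1 = pd.MultiIndex.from_frame(df1[['Transaction_Name', 'Count']])
keys2 = pd.MultiIndex.from_frame(df2[['Transaction_Name', 'Count']])

# Keep the rows of df1 whose exact (name, count) pair does not occur in df2
not_in_df2 = df1[~keys1.isin(keys2)]
print(not_in_df2)
```

Only SC-001_Homepage is returned, because its count is 1 in df1 but 2 in df2.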

Here is the work I have done so far to get this result:

#Formatting WLM data
data = pd.read_excel(r"Script wise coordinates comparison_edited123.xlsx", sheet_name='WLM', dtype=object)
data = pd.DataFrame(data, columns=data.keys())
df = pd.melt(data, id_vars=['Script_name'], value_name='Count')
df['Transaction_Name'] = df['Script_name'] + '_' + df['variable']
Final_df = df[['Transaction_Name', 'Count']]
Final_df.to_csv(r'WLMOUTPUT.csv', index=False)

The code that compares both CSV files continues in the next comment.
– SG131712, Commented Feb 4, 2019 at 5:49
Please put this inside your question instead (you can edit the question to add it), and please format your text as well.
– Shailyn Ortiz, Commented Feb 4, 2019 at 6:16
@SwethaGorantla the reason no one has answered yet is that there is too much information here. I would suggest posting just 5-6 lines of sample data in DataFrame format that show what you are trying to achieve, so we can copy the data and reproduce the issue, along with a 2-line explanation, your code (only the useful part, not all of it) and the expected output. Just those. :) Check this
– anky, Commented Feb 4, 2019 at 7:31
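The reshaping step in the asker's first comment relies on pd.melt to turn a wide sheet (one column per transaction step) into Transaction_Name / Count rows. The actual layout of the 'WLM' sheet is not shown, so the sketch below assumes a hypothetical wide table with a Script_name column plus one column per step (Homepage, Signinlink are made-up column names):

```python
import pandas as pd

# Hypothetical wide layout: one row per script, one column per transaction step
data = pd.DataFrame({
    'Script_name': ['SC-001', 'SC-002'],
    'Homepage': [1, 1],
    'Signinlink': [2, 0],
})

# melt turns every non-id column into rows: Script_name, variable (old column name), Count
long_df = pd.melt(data, id_vars=['Script_name'], value_name='Count')

# Join script and step names into one key, as the comment's code does
long_df['Transaction_Name'] = long_df['Script_name'] + '_' + long_df['variable']
final_df = long_df[['Transaction_Name', 'Count']]
print(final_df)
```

melt walks the value columns one at a time, so all Homepage rows come first, followed by all Signinlink rows.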

Luc Blassel · Accepted Answer · 2019-02-04 09:42:53Z


You can use pandas' merge function to line up both counts by transaction name:

import pandas as pd

df1 = pd.DataFrame({'Transaction_Name':['SC-001_Homepage', 'SC-002_Homepage', 'SC-001_Signinlink'], 'Count': [1, 1, 2]}) 
df2 = pd.DataFrame({'Transaction_Name':['SC-001_Homepage', 'SC-002_Homepage', 'SC-001_Signinlink'], 'Count': [2, 1, 2]})

merged_df = pd.merge(df1, df2, on = 'Transaction_Name', suffixes=('_df1', '_df2'))

This will give you this DataFrame:

print(merged_df)

   Count_df1   Transaction_Name  Count_df2
0          1    SC-001_Homepage          2
1          1    SC-002_Homepage          1
2          2  SC-001_Signinlink          2

Then you can use boolean indexing to keep only the rows whose counts differ:

diff = merged_df[merged_df['Count_df1'] != merged_df['Count_df2']]

And you will get this:

print(diff)

   Count_df1 Transaction_Name  Count_df2
0          1  SC-001_Homepage          2
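The question also asked about highlighting the mismatched rows in a different color, which the answer does not cover. One option is a row-wise styling function of the kind pandas' Styler.apply(..., axis=1) expects: it receives one row and returns one CSS string per cell. This is a minimal sketch; the function name highlight_mismatch and the yellow color are illustrative choices, and it is exercised here with iterrows so the CSS can be inspected without the Styler machinery:

```python
import pandas as pd

df1 = pd.DataFrame({'Transaction_Name': ['SC-001_Homepage', 'SC-002_Homepage', 'SC-001_Signinlink'],
                    'Count': [1, 1, 2]})
df2 = pd.DataFrame({'Transaction_Name': ['SC-001_Homepage', 'SC-002_Homepage', 'SC-001_Signinlink'],
                    'Count': [2, 1, 2]})
merged_df = pd.merge(df1, df2, on='Transaction_Name', suffixes=('_df1', '_df2'))


def highlight_mismatch(row, color='yellow'):
    """Hypothetical helper: color every cell of a row whose two counts differ."""
    style = f'background-color: {color}' if row['Count_df1'] != row['Count_df2'] else ''
    return [style] * len(row)


# Inspect the per-row CSS that a Styler would apply
css = [highlight_mismatch(row) for _, row in merged_df.iterrows()]
for styles in css:
    print(styles)
```

Assuming jinja2 (needed by Styler) and openpyxl are installed, merged_df.style.apply(highlight_mismatch, axis=1).to_excel('comparison.xlsx', index=False) should write the table with the mismatched row filled in yellow; 'comparison.xlsx' is a placeholder file name.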


4 Comments

SG131712 Over a year ago

Thanks Luc Blassel, I was able to implement it and got the required output. Could you also help me capture transactions that are missing? For example, df1 has 3 transactions with their counts, but df2 has only 2. How do I find the one transaction that is missing from df2?

Luc Blassel Over a year ago

You can specify the outer option when merging the 2 DataFrames: merged_df = pd.merge(df1, df2, on = 'Transaction_Name', suffixes=('_df1', '_df2'), how='outer'). Then, when a transaction is missing from one of the DataFrames, its count will show up as NaN in merged_df. If this answered your question, please consider accepting it as the answer.
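A minimal sketch of the outer merge from the comment above, using hypothetical data in which df2 lacks SC-001_Signinlink. Since NaN compares unequal to everything, the earlier != filter also picks up the missing transaction:

```python
import pandas as pd

df1 = pd.DataFrame({'Transaction_Name': ['SC-001_Homepage', 'SC-002_Homepage', 'SC-001_Signinlink'],
                    'Count': [1, 1, 2]})
# Hypothetical df2 that is missing SC-001_Signinlink
df2 = pd.DataFrame({'Transaction_Name': ['SC-001_Homepage', 'SC-002_Homepage'],
                    'Count': [2, 1]})

# Outer join keeps keys from both sides; absent counts become NaN
merged_df = pd.merge(df1, df2, on='Transaction_Name', suffixes=('_df1', '_df2'), how='outer')
print(merged_df)

# NaN != 2 evaluates to True, so the missing transaction is reported as a difference too
diff = merged_df[merged_df['Count_df1'] != merged_df['Count_df2']]
print(diff)
```

The NaN forces Count_df2 to a float column, which is why the counts from df2 print as 2.0 and 1.0.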

SG131712 Over a year ago

As you said, it shows NaN for the missing transaction, but it also returns the complete output. Is it possible to get only the transaction that is missing from df2, as below?

   Transaction_Name          Count_df1  Count_df2
0  SC-001_AppLaunch_Signed           0        NaN

Luc Blassel Over a year ago

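Using the approach from this answer, you can keep only the rows that contain a NaN with merged_df[merged_df.isnull().any(axis=1)]. Note that this returns transactions missing from either DataFrame, not only those missing from df2.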
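To keep only the transactions missing from df2 (and not those missing from df1), one option is merge's indicator=True parameter, which adds a _merge column set to 'left_only', 'right_only' or 'both'. A minimal sketch with the same hypothetical data as the outer-merge example:

```python
import pandas as pd

df1 = pd.DataFrame({'Transaction_Name': ['SC-001_Homepage', 'SC-002_Homepage', 'SC-001_Signinlink'],
                    'Count': [1, 1, 2]})
# Hypothetical df2 that is missing SC-001_Signinlink
df2 = pd.DataFrame({'Transaction_Name': ['SC-001_Homepage', 'SC-002_Homepage'],
                    'Count': [2, 1]})

merged_df = pd.merge(df1, df2, on='Transaction_Name', suffixes=('_df1', '_df2'),
                     how='outer', indicator=True)

# 'left_only' marks keys present in df1 but absent from df2
missing_in_df2 = merged_df.loc[merged_df['_merge'] == 'left_only',
                               ['Transaction_Name', 'Count_df1', 'Count_df2']]
print(missing_in_df2)
```

Filtering on 'right_only' instead gives the transactions present only in df2.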
