0

I have data in two excel files like below

Sample DS Created:

df1 =  {'Transaction_Name':['SC-001_Homepage', 'SC-002_Homepage', 'SC-001_Signinlink'], 'Count': [1, 1, 2]}
df1 = pd.DataFrame(df1, columns=df1.keys())

df2 =  {'Transaction_Name':['SC-001_Homepage', 'SC-002_Homepage', 'SC-001_Signinlink'], 'Count': [2, 1, 2]}
df2 = pd.DataFrame(df2, columns=df2.keys())

please help me to get difference of both excels as below..

Transaction_name    Count_df1        Count_df2
SC-001_Homepage          1              2
SC-001_Homepage          1              1
SC-001_Homepage          2              2

First line of the output count is not matching. Will i be able to highlight in different color? Sample code is as below

#COmparing both excels
df1 = pd.read_csv(r"WLMOUTPUT.csv", dtype=object)
df2 = pd.read_csv(r"results.csv", dtype=object)

print('\n', df1)
print('\n',df2)

df1['Compare'] = df1['Transaction_Name'] + df1['Count'].astype(str)
df2['Compare'] = df2['Transaction_Name'] + df2['Count'].astype(str)

print('\n', df1.loc[~df1['Compare'].isin(df2['Compare'])])

Thanks in advance

7
  • you have to show examples of the work you've done so far. Commented Feb 4, 2019 at 5:32
  • Please find the work i had done so far to achieve the result #Formatting WLM data data = pd.read_excel(r"Script wise coordinates comparison_edited123.xlsx", sheet_name='WLM', dtype=object) data = pd.DataFrame(data, columns=data.keys()) df = pd.melt(data, id_vars=['Script_name'], value_name='Count') df['Transaction_Name'] = df['Script_name'] + '_' + df['variable'] Final_df = df[['Transaction_Name', 'Count']] Final_df.to_csv(r'WLMOUTPUT.csv', index=False) Code continues in the next comment to compare both CSV files Commented Feb 4, 2019 at 5:49
  • 1
    You have to provide it inside your question, you can edit it to adapt accordingly, also you need to format your text Commented Feb 4, 2019 at 6:16
  • 1
    @SwethaGorantla the reason no one has answered yet is there is too much info here. I would suggest you to just post 5-6 lines of the sample data which depicts what you are trying to acheive in a dataframe format so we can copy the data and replicate the issue along with 2 lines of explaination , your sample code (not the full, the useful bit only) and an expected output, just those. :) Check this Commented Feb 4, 2019 at 7:31
  • 1
    Edited the question as explained, please check once. Commented Feb 4, 2019 at 8:59

1 Answer 1

1

You can use the merge function.

import pandas as pd

df1 = pd.DataFrame({'Transaction_Name':['SC-001_Homepage', 'SC-002_Homepage', 'SC-001_Signinlink'], 'Count': [1, 1, 2]}) 
df2 = pd.DataFrame({'Transaction_Name':['SC-001_Homepage', 'SC-002_Homepage', 'SC-001_Signinlink'], 'Count': [2, 1, 2]})

merged_df = pd.merge(df1, df2, on = 'Transaction_Name', suffixes=('_df1', '_df2'))

This will give you this DataFrame:

print(merged_df)

   Count_df1   Transaction_Name  Count_df2
0          1    SC-001_Homepage          2
1          1    SC-002_Homepage          1
2          2  SC-001_Signinlink          2

And then you can just use subsetting to see which rows have different counts:

diff = merged_df[merged_df['Count_df1'] != merged_df['Count_df2']]

And you will get this:

print(diff)

   Count_df1 Transaction_Name  Count_df2
0          1  SC-001_Homepage          2
Sign up to request clarification or add additional context in comments.

4 Comments

Thanks Luc Blassel, i could implement it and got required output. Can you also please help me if any of the transactions are missing how to capture it. For example in df1 there are 3 transactions and count of it, in df2 it has only 2 transactions and count of it. How to find that one transaction which is missing in df2?
You can specify the outer option when merging the 2 DataFrames: merged_df = pd.merge(df1, df2, on = 'Transaction_Name', suffixes=('_df1', '_df2'), how='outer') an then when one of the transactions is missing in one of the DFs it will show us as NaN in merged_df. If this answered your question, please consider accepting it as the answer.
As you said it gives NaN at the missing transaction also it gives complete output. Is it possible to get only the missing Transaction name in df2 as below... Transaction_Name Count_df1 Count_df2 0 SC-001_AppLaunch_Signed 0 NaN
From this answer you can get rows with only NaN using merged_df[merged_df.isnull().any(axis=1)]

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.