I have two dataframes, and I need to compare two of their columns against a set of conditions and output the result. For example:

df1:

| ID    | Date      | value  |
| 248   | 2021-10-30| 4.5    |
| 249   | 2021-09-21| 5.0    |
| 100   | 2021-02-01| 3,2    |

df2:

| ID    | Date      | value  |
| 245   | 2021-12-14| 4.5    |
| 246   | 2021-09-21| 5.0    |
| 247   | 2021-10-30| 3,2    |
| 248   | 2021-10-30| 3,1    |
| 249   | 2021-10-30| 2,2    |
| 250   | 2021-10-30| 6,3    |
| 251   | 2021-10-30| 9,1    |
| 252   | 2021-10-30| 2,0    |

I want to write code that compares the ID and Date columns between the two dataframes under the following conditions:

  • if "ID and date is matching from df1 to df2": print(df1['compare'] = 'Both matching')

  • if "ID is matching and date is not matching from df1 to df2" : print(df1['compare'] = 'Date not matching')

  • if "ID is Not matching from df1 to df2" : print(df1['compare'] = 'ID not available')

My result df1 should look like below:

df1 (expected result):

| ID    | Date      | value  | compare                         |
| 248   | 2021-10-30| 4.5    | Both matching                   |
| 249   | 2021-09-21| 5.0    | Id matching - Date not matching |
| 100   | 2021-02-01| 3,2    | Id not available                |

How can I do this with a pandas dataframe in Python?

  • Do both data frames have the same number of rows? Commented Mar 13, 2022 at 8:35
  • What does print(df1['compare'] = 'Both matching') mean? Commented Mar 13, 2022 at 8:36
  • @Ashutosh sharma, No, they don't. df1 will have fewer rows, and df2 will have around 1000 rows. Commented Mar 13, 2022 at 8:36
  • @Amirhossein Kiani, I want to add another column to df1 whose values say whether the rows match or not. Commented Mar 13, 2022 at 8:38

2 Answers

I suggest using iterrows. It might not be the best approach, but it still solves your problem:

compareColumn = []
for index, row in df1.iterrows():
    # All rows of df2 that share this ID (an ID may appear more than once)
    df2Row = df2[df2["ID"] == row["ID"]]
    if df2Row.empty:
        compareColumn.append("ID not available")
    elif (df2Row["Date"] == row["Date"]).any():
        # At least one df2 row with this ID also has the same date
        compareColumn.append("Both matching")
    else:
        compareColumn.append("Date not matching")
df1["compare"] = compareColumn
df1

Output

ID Date value compare
0 248 2021-10-30 4.5 Both matching
1 249 2021-09-21 5 Date not matching
2 100 2021-02-01 3.2 ID not available
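
For larger frames, the same labels can be produced without a Python-level loop. The following is only a sketch under some assumptions: the frames are rebuilt here from the tables in the question (with "value" kept as strings), the Date columns have the same dtype in both frames, and the names id_found and pair_found are made up for illustration.

```python
import pandas as pd

# Sample data rebuilt from the question's tables (values kept as strings)
df1 = pd.DataFrame({
    "ID": [248, 249, 100],
    "Date": ["2021-10-30", "2021-09-21", "2021-02-01"],
    "value": ["4.5", "5.0", "3,2"],
})
df2 = pd.DataFrame({
    "ID": [245, 246, 247, 248, 249, 250, 251, 252],
    "Date": ["2021-12-14", "2021-09-21"] + ["2021-10-30"] * 6,
    "value": ["4.5", "5.0", "3,2", "3,1", "2,2", "6,3", "9,1", "2,0"],
})

# True where the ID occurs anywhere in df2
id_found = df1["ID"].isin(df2["ID"])

# True where the (ID, Date) pair occurs in df2; comparing whole pairs
# keeps this correct even when an ID is repeated in df2
pairs_df1 = pd.MultiIndex.from_frame(df1[["ID", "Date"]])
pairs_df2 = pd.MultiIndex.from_frame(df2[["ID", "Date"]])
pair_found = pairs_df1.isin(pairs_df2)

# Start from the weakest label and overwrite with stronger ones
compare = pd.Series("ID not available", index=df1.index)
compare.loc[id_found.to_numpy()] = "Date not matching"
compare.loc[pair_found] = "Both matching"
df1["compare"] = compare

print(df1)
```

The order of the two assignments matters: every row with a matching pair also has a matching ID, so "Both matching" has to be written last.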

4 Comments

One question: how can we get the index values from df2 for the "Both matching" rows?
@BenHardy you can simply use: df1[df1["compare"] == "Both matching"]
@Amirhossein Kiani - I want to return the index from df2. Let's say the index of the "Both matching" row in df2 is 5; I want to return the value 5.
@BenHardy Thanks for your comment, Ben. If you are sure your dataframe has an index, you can use: df.index = df.index.set_names(['my_index']) and then df.reset_index(inplace=True), then you can simply use: df1[df1["compare"] == "Both matching"]["my_index"]
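
One possible way to recover the df2 index asked about in these comments is a left merge on both key columns. This is a sketch, not necessarily what the commenters had in mind: the frames are rebuilt from the question's tables, and the column name df2_index is an assumption introduced for illustration.

```python
import pandas as pd

# Sample data rebuilt from the question's tables (key columns only)
df1 = pd.DataFrame({
    "ID": [248, 249, 100],
    "Date": ["2021-10-30", "2021-09-21", "2021-02-01"],
})
df2 = pd.DataFrame({
    "ID": [245, 246, 247, 248, 249, 250, 251, 252],
    "Date": ["2021-12-14", "2021-09-21"] + ["2021-10-30"] * 6,
})

# Copy df2's index into a regular column so it survives the merge
# ("df2_index" is a hypothetical column name)
df2_keyed = df2[["ID", "Date"]].assign(df2_index=df2.index)

# Left merge on both keys: df1 rows without a full match get NaN
merged = df1.merge(df2_keyed, on=["ID", "Date"], how="left")

print(merged)
```

Rows with no full match get NaN in df2_index, which also turns the column into floats. If the same (ID, Date) pair appears several times in df2, the merge returns one row per occurrence, so the corresponding df1 row is repeated.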

Suppose the 'ID' column is the index; then we can do this:

def f(x):
    # x.name is the row's index label, i.e. its ID
    if x.name in df2.index:
        return 'Both matching' if x['Date'] == df2.loc[x.name, 'Date'] else 'Date not matching'
    return 'ID not available'

df1 = df1.assign(compare=df1.apply(f, axis=1))

print(df1)

           Date value            compare
ID                                      
248  2021-10-30   4.5      Both matching
249  2021-09-21   5.0  Date not matching
100  2021-02-01   3,2   ID not available

1 Comment

Thanks for the solution. The ID column is not the index, since the same ID will be repeated in some rows.
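
When IDs repeat, as this comment describes, the same row-wise idea can be kept by looking dates up in a dictionary instead of the index. This is a sketch under assumptions: the second row for ID 248 in df2 is invented purely to illustrate a repeated ID, and the names dates_by_id and label are hypothetical.

```python
import pandas as pd

df1 = pd.DataFrame({
    "ID": [248, 249, 100],
    "Date": ["2021-10-30", "2021-09-21", "2021-02-01"],
})
# df2 with a repeated ID; the 2021-09-01 row for 248 is made up for illustration
df2 = pd.DataFrame({
    "ID": [248, 248, 249],
    "Date": ["2021-09-01", "2021-10-30", "2021-10-30"],
})

# Map each ID in df2 to the set of dates recorded for it
dates_by_id = df2.groupby("ID")["Date"].apply(set).to_dict()

def label(row):
    dates = dates_by_id.get(row["ID"])
    if dates is None:
        return "ID not available"
    return "Both matching" if row["Date"] in dates else "Date not matching"

df1["compare"] = df1.apply(label, axis=1)

print(df1)
```

With this lookup, ID stays an ordinary column in both frames, and a row counts as "Both matching" if any df2 row with the same ID carries the same date.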
