Compare two dataframes and create comparison matrix in Python?

Question

Emp_rating_df

  Emp_Id       A1   A2   A3   A4
0 1001         4    3    6    7
1 1002         7    2    4    5
2 1003         3    8    2    6
3 1004         7    5    4    7

Comp_df

  Emp_Id       A1   A2   A3   A4
0 1001         4    3    6    7

I need to compare two DataFrames that contain employee ratings.

Emp_rating_df contains employee ratings out of 10, and Comp_df specifies which employee to compare against all the employees in Emp_rating_df.

If employee A has a higher rating than employee B in a given advantage column (A1, A2, A3, A4), the value is 2; if the ratings are the same, 1; otherwise 0.

Output_df

  Emp_Id       A1   A2   A3   A4
0 1001         1    1    1    1
1 1002         0    2    2    2
2 1003         2    0    2    2
3 1004         0    0    2    1

The first row is all 1s because it is a self-comparison.
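
One reading of the rule that reproduces the expected output is that employee A is the one in Comp_df and employee B is each row of Emp_rating_df. A minimal sketch of that per-cell rule follows; the function name `score` is hypothetical and not part of the original post.

```python
def score(comp_rating, emp_rating):
    """Return 2 if the comparison employee's rating is higher, 1 if equal, else 0."""
    if comp_rating > emp_rating:
        return 2
    if comp_rating == emp_rating:
        return 1
    return 0


# Ratings for Emp 1001 (the row in Comp_df) and Emp 1002, taken from the tables above.
comp = [4, 3, 6, 7]
emp_1002 = [7, 2, 4, 5]
row_1002 = [score(c, e) for c, e in zip(comp, emp_1002)]
print(row_1002)  # [0, 2, 2, 2]
```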

For Emp 1002, column A1 is 7, which is greater than 4. Why isn't 2 assigned in your expected output?
– anky, Commented May 2, 2020 at 16:47
Again for Emp 1002, column A4 is 5 compared to 7, so 7 is greater. Shouldn't the A4 column be 2 as well? Can you recheck all the values and update?
– anky, Commented May 2, 2020 at 17:07

anky · Accepted Answer · 2020-05-02 17:55:37Z

1

You can try the approach below:

First merge and filter:

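import numpy as np
# Left-merge, then fill the comparison ratings (suffix _y) into every row.
m = Emp_rating_df.merge(Comp_df, how='left', on='Emp_Id').ffill().bfill()
a = m.filter(like='_x')  # each employee's own ratings
b = m.filter(like='_y')  # the comparison employee's ratings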

Then assign by condition:

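cond1 = b.to_numpy() > a.to_numpy()   # comparison employee rated higher -> 2
cond2 = b.to_numpy() == a.to_numpy()  # equal ratings -> 1
Output = Emp_rating_df.copy()
# np.select uses the first matching condition; everything else gets the default 0.
Output[a.columns.str.split('_').str[0]] = np.select([cond1, cond2], [2, 1], 0)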

print(Output)

   Emp_Id  A1  A2  A3  A4
0    1001   1   1   1   1
1    1002   0   2   2   2
2    1003   2   0   2   2
3    1004   0   0   2   1
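
For reference, here is a self-contained sketch that rebuilds the sample frames from the question and runs the steps above end to end. The frame construction is an assumption based on the printed tables, and the column names are converted to a plain list before assignment.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the tables in the question (an assumption).
Emp_rating_df = pd.DataFrame({
    "Emp_Id": [1001, 1002, 1003, 1004],
    "A1": [4, 7, 3, 7],
    "A2": [3, 2, 8, 5],
    "A3": [6, 4, 2, 4],
    "A4": [7, 5, 6, 7],
})
Comp_df = Emp_rating_df.iloc[[0]].copy()  # the single comparison row (Emp 1001)

# Left-merge, then spread the comparison ratings (_y columns) to every row.
m = Emp_rating_df.merge(Comp_df, how="left", on="Emp_Id").ffill().bfill()
a = m.filter(like="_x")  # each employee's own ratings
b = m.filter(like="_y")  # the comparison employee's ratings

rating_cols = list(a.columns.str.split("_").str[0])  # ['A1', 'A2', 'A3', 'A4']
Output = Emp_rating_df.copy()
Output[rating_cols] = np.select(
    [b.to_numpy() > a.to_numpy(), b.to_numpy() == a.to_numpy()], [2, 1], 0
)
print(Output)
```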

5 Comments

harsh Over a year ago

Why did you use bfill()? Even ffill() alone will give the same result.
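
With the sample data the comparison employee is the first row, so ffill() alone is enough. Here is a sketch of a case where it is not, assuming a hypothetical Comp_df that points at Emp 1003 instead: the rows above the match have nothing to forward-fill from, and bfill() covers them.

```python
import pandas as pd

Emp_rating_df = pd.DataFrame({
    "Emp_Id": [1001, 1002, 1003, 1004],
    "A1": [4, 7, 3, 7],
    "A2": [3, 2, 8, 5],
})
# Hypothetical comparison row: Emp 1003 instead of Emp 1001.
Comp_df = Emp_rating_df[Emp_rating_df["Emp_Id"] == 1003]

merged = Emp_rating_df.merge(Comp_df, how="left", on="Emp_Id")
only_ffill = merged.ffill()      # rows before the match stay NaN
both = merged.ffill().bfill()    # bfill fills those leading rows

print(only_ffill["A1_y"].isna().tolist())  # [True, True, False, False]
print(both["A1_y"].tolist())               # [3.0, 3.0, 3.0, 3.0]
```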

harsh Over a year ago

How do we handle more than three conditions? For example, I want b.to_numpy() - a.to_numpy() > 5 to give 2, plus two or three other conditions like that.

anky Over a year ago

@harsh Then just define cond3 as the condition you mentioned and pass it in the condition list along with cond1 and cond2. In the value list, where 2, 1 is currently given, add another 2, i.e. 2, 1, 2.
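
A sketch of that suggestion with a third condition. np.select uses the first condition that matches, so the stricter test is listed first here. The value 3 and the small arrays are hypothetical, chosen only so the extra branch is visible (the comment above reuses 2).

```python
import numpy as np

# Hypothetical ratings: a = each employee, b = the comparison employee repeated per row.
a = np.array([[1, 3, 6], [7, 2, 9]])
b = np.array([[8, 3, 6], [8, 3, 6]])

cond3 = (b - a) > 5   # comparison rating leads by more than 5
cond1 = b > a         # comparison rating is higher
cond2 = b == a        # ratings are equal

# Conditions are checked in order; the first True wins, otherwise the default 0 is used.
result = np.select([cond3, cond1, cond2], [3, 2, 1], 0)
print(result)
# [[3 1 1]
#  [2 2 0]]
```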

harsh Over a year ago

I created five different conditions at the column level:
cond1 = (b.to_numpy()[:, :1] - a.to_numpy()[:, :1] > 5)
cond2 = b.to_numpy()[:, 1:2] == a.to_numpy()[:, 1:2]
cond3 = b.to_numpy()[:, 2:3] < a.to_numpy()[:, 2:3]
cond4 = b.to_numpy()[:, 3:4] < a.to_numpy()[:, 3:4]
cond5 = b.to_numpy()[:, 4:5] < a.to_numpy()[:, 4:5]
Then I tried this, but it gives me an error:
Output[a.columns.str.split('_').str[0]] = np.select([cond1,cond2,cond3,cond4,cond5],[25,1,-1,22,77])
"ValueError: Must have equal len keys and value when setting with an ndarray" @anky

anky Over a year ago

Hard to say without sample data. You can create a new question since this question has been answered and someone online can help you with that. @harsh
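
One possible cause, offered as a guess since no data was posted: each condition above is built from a single-column slice such as [:, :1], so np.select returns at most one column while the assignment targets several. If there are only four rating columns, as in the question, the slice [:, 4:5] is also empty. Below is a sketch of one way to apply a different rule per column; the frames, rules, and values are hypothetical.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame({"A1": [4, 7, 3], "A2": [3, 2, 8]})  # each employee's ratings
b = pd.DataFrame({"A1": [4, 4, 4], "A2": [3, 3, 3]})  # comparison ratings, repeated per row

# A single-column slice gives a single-column result, which cannot fill several columns.
one_col = np.select([b.to_numpy()[:, :1] > a.to_numpy()[:, :1]], [2], 0)
print(one_col.shape)  # (3, 1)

# Hypothetical per-column rules: column name -> (conditions, values).
rules = {
    "A1": ([(b["A1"] > a["A1"]).to_numpy(), (b["A1"] == a["A1"]).to_numpy()], [2, 1]),
    "A2": ([(b["A2"] < a["A2"]).to_numpy()], [-1]),
}
# Evaluate each column with its own np.select call; unmatched cells get the default 0.
out = pd.DataFrame({col: np.select(conds, vals, 0) for col, (conds, vals) in rules.items()})
print(out)
```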
