Emp_rating_df

  Emp_Id       A1   A2   A3   A4
0 1001         4    3    6    7
1 1002         7    2    4    5
2 1003         3    8    2    6
3 1004         7    5    4    7

Comp_df

  Emp_Id       A1   A2   A3   A4
0 1001         4    3    6    7

I need to compare two DataFrames that contain employee ratings.

Emp_rating_df contains employee ratings out of 10, and Comp_df specifies which employee to compare against every employee in Emp_rating_df.

For each advantage column (A1, A2, A3, A4): if emp A (the employee in Comp_df) has a higher rating than emp B (the employee in Emp_rating_df), the result is 2; if the ratings are the same, 1; otherwise 0.

Output_df

 Emp_Id       A1   A2   A3   A4
0 1001         1    1    1    1 
1 1002         0    2    2    2
2 1003         2    0    2    2
3 1004         0    0    2    1

The first row is all 1s because employee 1001 is compared with itself.
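
A minimal sketch to reproduce the setup, with the values copied from the tables above. The `score` helper is a hypothetical name used only to spell out the rule for a single cell; it assumes, as the expected output suggests, that 2 means the Comp_df rating is the higher one.

```python
import pandas as pd

# Sample data copied from the tables in the question
Emp_rating_df = pd.DataFrame({
    "Emp_Id": [1001, 1002, 1003, 1004],
    "A1": [4, 7, 3, 7],
    "A2": [3, 2, 8, 5],
    "A3": [6, 4, 2, 4],
    "A4": [7, 5, 6, 7],
})
Comp_df = pd.DataFrame(
    {"Emp_Id": [1001], "A1": [4], "A2": [3], "A3": [6], "A4": [7]}
)


def score(comp_rating, emp_rating):
    """Hypothetical helper: 2 if the Comp_df rating is higher, 1 if equal, else 0."""
    if comp_rating > emp_rating:
        return 2
    if comp_rating == emp_rating:
        return 1
    return 0


# Emp 1002, column A2: Comp_df has 3, Emp_rating_df has 2
print(score(3, 2))  # prints 2
```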

  • For Emp 1002, column A1 is 7, which is greater than 4. Why isn't 2 assigned in your expected output for that cell? Commented May 2, 2020 at 16:47
  • Hi, I have updated the question, please look into it. Commented May 2, 2020 at 16:59
  • Again for Emp 1002, column A4 is 5 compared to 7, so 7 is greater. Shouldn't the A4 column be 2 as well? Can you recheck all the values and update? Commented May 2, 2020 at 17:07
  • Sorry, my bad... Updated again. Commented May 2, 2020 at 17:52

1 Answer

You can try the below approach:

First merge and filter:

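# Left-merge on Emp_Id, then ffill/bfill so the single Comp_df row is copied onto every employee
m = Emp_rating_df.merge(Comp_df, how='left', on='Emp_Id').ffill().bfill()
a = m.filter(like='_x')  # ratings from Emp_rating_df
b = m.filter(like='_y')  # ratings from Comp_df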
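
To see what the merge step produces, here is a small self-contained sketch using the sample data from the question. The `other` frame (comparing against Emp 1003 instead of 1001) is a hypothetical variation, added only to show when `bfill()` matters.

```python
import pandas as pd

Emp_rating_df = pd.DataFrame({
    "Emp_Id": [1001, 1002, 1003, 1004],
    "A1": [4, 7, 3, 7],
    "A2": [3, 2, 8, 5],
    "A3": [6, 4, 2, 4],
    "A4": [7, 5, 6, 7],
})
Comp_df = Emp_rating_df.iloc[[0]]  # the row for Emp 1001

# Overlapping column names get the suffixes _x (left frame) and _y (right frame).
# Before filling, the _y columns are NaN on every row except Emp 1001.
raw = Emp_rating_df.merge(Comp_df, how="left", on="Emp_Id")
m = raw.ffill().bfill()
print(m)

# Hypothetical variation: the comparison employee is not in the first row.
# ffill() alone then leaves NaN in the rows above the match.
other = Emp_rating_df.iloc[[2]]  # the row for Emp 1003
only_ffill = Emp_rating_df.merge(other, how="left", on="Emp_Id").ffill()
print(only_ffill["A1_y"].isna().sum())
```

With the question's data the match is in the first row, so `ffill()` alone would be enough; `bfill()` covers the case where it is not. Note that the filled `_y` columns become floats because of the intermediate NaN values.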

Then assign by condition:

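import numpy as np

cond1 = b.to_numpy() > a.to_numpy()   # Comp_df rating is higher -> 2
cond2 = b.to_numpy() == a.to_numpy()  # ratings are equal -> 1
Output = Emp_rating_df.copy()
# Strip the _x suffix to recover the original column names (A1..A4)
Output[a.columns.str.split('_').str[0]] = np.select([cond1, cond2], [2, 1], 0)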

print(Output)

   Emp_Id  A1  A2  A3  A4
0    1001   1   1   1   1
1    1002   0   2   2   2
2    1003   2   0   2   2
3    1004   0   0   2   1
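
As a cross-check, the same table can also be obtained without the merge by letting NumPy broadcast the single Comp_df row against all employees. This is a different technique from the one above, sketched under the assumption that Comp_df always holds exactly one row; `cols`, `emp`, and `comp` are names chosen here for illustration.

```python
import numpy as np
import pandas as pd

Emp_rating_df = pd.DataFrame({
    "Emp_Id": [1001, 1002, 1003, 1004],
    "A1": [4, 7, 3, 7],
    "A2": [3, 2, 8, 5],
    "A3": [6, 4, 2, 4],
    "A4": [7, 5, 6, 7],
})
Comp_df = pd.DataFrame(
    {"Emp_Id": [1001], "A1": [4], "A2": [3], "A3": [6], "A4": [7]}
)

cols = ["A1", "A2", "A3", "A4"]
emp = Emp_rating_df[cols].to_numpy()  # shape (4, 4)
comp = Comp_df[cols].to_numpy()       # shape (1, 4), broadcast over the rows of emp

Output = Emp_rating_df.copy()
# 2 where the Comp_df rating is higher, 1 where equal, 0 otherwise
Output[cols] = np.select([comp > emp, comp == emp], [2, 1], default=0)
print(Output)
```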

5 Comments

Why did you use bfill() when ffill() alone gives the same result?
How do we handle more than 3 conditions? For example, I want b.to_numpy() - a.to_numpy() > 5 to give 2, plus two or three other conditions.
@harsh then just define cond3 as the condition you mentioned and pass it in the condition list along with cond1 and cond2, and in the value list, where 2,1 is currently given, add another 2, i.e. 2,1,2.
I created 5 different conditions at the column level: cond1 = (b.to_numpy()[:, :1] - a.to_numpy()[:, :1] > 5), cond2 = b.to_numpy()[:, 1:2] == a.to_numpy()[:, 1:2], cond3 = b.to_numpy()[:, 2:3] < a.to_numpy()[:, 2:3], cond4 = b.to_numpy()[:, 3:4] < a.to_numpy()[:, 3:4], cond5 = b.to_numpy()[:, 4:5] < a.to_numpy()[:, 4:5]. Then I tried Output[a.columns.str.split('_').str[0]] = np.select([cond1,cond2,cond3,cond4,cond5],[25,1,-1,22,77]), but it gives me this error: "ValueError: Must have equal len keys and value when setting with an ndarray" @anky
Hard to say without sample data. Since this question has been answered, you can create a new question and someone online can help you with that. @harsh
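
On the extra-conditions follow-up, here is a hedged sketch using the sample data from the question, since the commenter's own data is not shown. The threshold `diff > 3` and the code `3` are made up for illustration. Two points it tries to show: `np.select` takes the first condition that is True, so a narrower test has to come before a broader one, and conditions sliced to a single column give a single-column result, which is one plausible cause of a length mismatch when assigning to four columns.

```python
import numpy as np
import pandas as pd

Emp_rating_df = pd.DataFrame({
    "Emp_Id": [1001, 1002, 1003, 1004],
    "A1": [4, 7, 3, 7],
    "A2": [3, 2, 8, 5],
    "A3": [6, 4, 2, 4],
    "A4": [7, 5, 6, 7],
})
Comp_df = pd.DataFrame(
    {"Emp_Id": [1001], "A1": [4], "A2": [3], "A3": [6], "A4": [7]}
)

cols = ["A1", "A2", "A3", "A4"]
diff = Comp_df[cols].to_numpy() - Emp_rating_df[cols].to_numpy()  # shape (4, 4)

# The first True condition wins, so the narrower test (diff > 3) goes before diff > 0.
# The value 3 is a hypothetical extra code, not part of the original question.
result = np.select([diff > 3, diff > 0, diff == 0], [3, 2, 1], default=0)
print(result)

# Conditions sliced to one column yield a (4, 1) result, which cannot fill 4 columns
single = np.select([diff[:, :1] > 0], [2], default=0)
print(single.shape)

# For per-column rules, 1-D conditions can be assigned one column at a time
Output = Emp_rating_df.copy()
Output["A1"] = np.select([diff[:, 0] > 0, diff[:, 0] == 0], [2, 1], default=0)
print(Output)
```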
