I have a dataframe like below:

ID  Emp1   Emp2   Emp3
1   John   NaN    Alex
2   John   Steve  Alex
3   John   Steve  Alex
4   Clint  Jorge  NaN

I would like to convert the above dataframe into something like this:

John Emp1 [1,2,3]
Clint Emp1 [4]
Steve Emp2 [2,3]
Jorge Emp2 [4]
Alex Emp3 [1,2,3]

So, basically, for each column (Emp1, Emp2, Emp3), find the unique values (dropping NaN), and for each unique value get the list of IDs it appears in along with the column name.

1 Answer

You'll need to melt your data to get it into long format. Then you'll need to perform a groupby aggregation on "name" and "emp" to collect the matching IDs into a list:

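new_df = (df
 # wide -> long: one row per (ID, emp, name)
 .melt(id_vars="ID", var_name="emp", value_name="name")
 # discard the rows whose name is NaN
 .dropna()
 # collect the IDs of each (name, emp) pair into a list
 .groupby(["name", "emp"], as_index=False)
 .agg(list)
 # emp ascending, name descending within each emp
 .sort_values(["emp", "name"], ascending=[True, False])
)

print(new_df)
    name   emp         ID
2   John  Emp1  [1, 2, 3]
1  Clint  Emp1        [4]
4  Steve  Emp2     [2, 3]
3  Jorge  Emp2        [4]
0   Alex  Emp3  [1, 2, 3]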
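
To run this end to end, here is a minimal self-contained sketch. The DataFrame construction is an assumption reconstructed from the table in the question, `long_df` is a name introduced here only to expose the intermediate long-format result, and the final `reset_index(drop=True)` is an optional extra that is not part of the answer above.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question (np.nan marks the missing names).
df = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Emp1": ["John", "John", "John", "Clint"],
    "Emp2": [np.nan, "Steve", "Steve", "Jorge"],
    "Emp3": ["Alex", "Alex", "Alex", np.nan],
})

# Step 1: wide -> long, one row per (ID, emp, name), then drop missing names.
long_df = df.melt(id_vars="ID", var_name="emp", value_name="name").dropna()

# Step 2: collect the IDs of each (name, emp) pair into a list and sort.
new_df = (
    long_df
    .groupby(["name", "emp"], as_index=False)
    .agg(list)
    .sort_values(["emp", "name"], ascending=[True, False])
    .reset_index(drop=True)  # optional: renumber the rows after sorting
)

print(long_df)
print(new_df)
```

Splitting the chain in two makes it easier to inspect the melted frame before aggregating, which helps when the real data has more ID-like columns than this toy example.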