I need to identify duplicate rows in a DataFrame based on multiple columns. The values in the remaining column (PKID, which holds integers) should be merged into a list of integers.
Example input data (rows 0 and 1 are duplicates except for the PKID column):
  Col1   PKID  SUBJECT  ID
0    A  58305      ABC  X1
1    A  57011      ABC  X1
2    B  12345      XYZ  X1
Expected result:
  Col1            PKID  SUBJECT  ID
0    A  [58305, 57011]      ABC  X1
1    B           12345      XYZ  X1
So if two or more rows are identical in every column except PKID, they should be merged into a single row whose PKID value is a list of the integers from those rows.
How can this be achieved?
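For reference, here is a minimal sketch of one possible approach. It assumes the DataFrame is a pandas DataFrame and rebuilds the example input by hand. The names `key_cols` and `result` are only illustrative. The idea is to group by every column except PKID and collect the PKID values of each group into a list.

```python
import pandas as pd

# Rebuild the example input from the question (assumed to be a pandas DataFrame).
df = pd.DataFrame({
    "Col1": ["A", "A", "B"],
    "PKID": [58305, 57011, 12345],
    "SUBJECT": ["ABC", "ABC", "XYZ"],
    "ID": ["X1", "X1", "X1"],
})

# Every column except PKID defines what counts as a duplicate row.
key_cols = [c for c in df.columns if c != "PKID"]

# Group on those columns and collect each group's PKID values into a list.
# sort=False keeps groups in order of first appearance.
result = (
    df.groupby(key_cols, sort=False, as_index=False)
      .agg(PKID=("PKID", list))
)

# Restore the original column order (groupby puts the key columns first).
result = result[df.columns.tolist()]

print(result)
```

In this sketch every PKID value becomes a list, so the row for B holds a one-element list rather than the bare integer shown in the expected result. Rows with missing values in the key columns are dropped by `groupby` by default.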