I have a child table. Here is some sample data:

+----+------+----------+----------------+--------+---------+
| ID | Name |   City   |     Email      | Phone  | Country |
+----+------+----------+----------------+--------+---------+
|  1 | Ted  | Chicago  | [email protected]  | 132321 | USA     |
|  1 | Josh | Richmond | [email protected]  | 435324 | USA     |
|  2 | John | Seattle  | [email protected]  | 322421 | USA     |
|  2 | John | Berkley  | [email protected] | 322421 | USA     |
|  2 | Mike | Seattle  | [email protected] | 322421 | USA     |
+----+------+----------+----------------+--------+---------+

The rows above need to be combined into one row per ID, keeping only the unique values in each column. The expected output is:

+----+---------------+----------------------+----------------------------------+-------------------+---------+
| ID |     Name      |         City         |              Email               |       Phone       | Country |
+----+---------------+----------------------+----------------------------------+-------------------+---------+
|  1 | 'Ted','Josh'  | 'Chicago','Richmond' | '[email protected]'                  | '132321','435324' | 'USA'   |
|  2 | 'John','Mike' | 'Seattle','Berkley'  | '[email protected]','[email protected]' | '322421'          | 'USA'   |
+----+---------------+----------------------+----------------------------------+-------------------+---------+

1 Answer

If ordering is important, use GroupBy.agg with a lambda function and remove duplicates with a dictionary (dict.fromkeys keeps the first occurrence of each value, in its original order):

df1 = df.groupby('ID').agg(lambda x: ','.join(dict.fromkeys(x.astype(str)))).reset_index()

# Alternative using Series.unique, which also preserves order but is slow on large data:
# df1 = df.groupby('ID').agg(lambda x: ','.join(x.astype(str).unique())).reset_index()
print(df1)
   ID       Name              City                         Email  \
0   1   Ted,Josh  Chicago,Richmond                 [email protected]   
1   2  John,Mike   Seattle,Berkley  [email protected],[email protected]   

           Phone Country  
0  132321,435324     USA  
1         322421     USA  
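
For anyone who wants to reproduce the result above, here is a minimal self-contained sketch. The e-mail addresses are hypothetical placeholders (the originals are redacted in the question), and `join_unique` is just an illustrative name for the lambda, not part of pandas.

```python
import pandas as pd

# Sample data from the question. The e-mail values are hypothetical
# placeholders, chosen so that ID 1 has one unique address and ID 2 has two.
df = pd.DataFrame({
    "ID": [1, 1, 2, 2, 2],
    "Name": ["Ted", "Josh", "John", "John", "Mike"],
    "City": ["Chicago", "Richmond", "Seattle", "Berkley", "Seattle"],
    "Email": ["a@example.com", "a@example.com",
              "b@example.com", "c@example.com", "c@example.com"],
    "Phone": [132321, 435324, 322421, 322421, 322421],
    "Country": ["USA"] * 5,
})

def join_unique(col):
    # dict.fromkeys drops repeats while keeping first-seen order.
    return ",".join(dict.fromkeys(col.astype(str)))

df1 = df.groupby("ID").agg(join_unique).reset_index()
print(df1)
```

Using a named function instead of a lambda changes nothing in the result; it only makes the aggregation easier to reuse and test.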

If ordering is not important, use a similar solution that removes duplicates with sets (the order of values within each cell is then arbitrary):

df2 = df.groupby('ID').agg(lambda x: ','.join(set(x.astype(str)))).reset_index()
print(df2)
   ID       Name              City                         Email  \
0   1   Josh,Ted  Richmond,Chicago                 [email protected]   
1   2  John,Mike   Berkley,Seattle  [email protected],[email protected]   

           Phone Country  
0  435324,132321     USA  
1         322421     USA  
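
The expected output in the question wraps each value in single quotes, which the snippets above do not do. One possible way to get that format, sketched here on a reduced copy of the sample data, is to quote each value before joining. The helper name `quote_unique` is made up for illustration.

```python
import pandas as pd

# Reduced copy of the question's sample data (only three columns, for brevity).
df = pd.DataFrame({
    "ID": [1, 1, 2, 2, 2],
    "Name": ["Ted", "Josh", "John", "John", "Mike"],
    "Phone": [132321, 435324, 322421, 322421, 322421],
})

def quote_unique(col):
    # Deduplicate in first-seen order, then wrap every value in single quotes.
    return ",".join(f"'{v}'" for v in dict.fromkeys(col.astype(str)))

out = df.groupby("ID").agg(quote_unique).reset_index()
print(out.loc[0, "Name"])   # 'Ted','Josh'
print(out.loc[1, "Phone"])  # '322421'
```

The same generator expression works with `set(...)` in place of `dict.fromkeys(...)` if order does not matter.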
1 Comment

I keep coming across Jezrael's answers at random; they're always clear and well coded, and every one of them helps me.
