Combine multiple rows into single row in Pandas Dataframe

Question

I have a child table. Here is the sample data:

+----+------+----------+----------------+--------+---------+
| ID | Name |   City   |     Email      | Phone  | Country |
+----+------+----------+----------------+--------+---------+
|  1 | Ted  | Chicago  | [email protected]  | 132321 | USA     |
|  1 | Josh | Richmond | [email protected]  | 435324 | USA     |
|  2 | John | Seattle  | [email protected]  | 322421 | USA     |
|  2 | John | Berkley  | [email protected] | 322421 | USA     |
|  2 | Mike | Seattle  | [email protected] | 322421 | USA     |
+----+------+----------+----------------+--------+---------+

Rows that share the same ID need to be combined into a single row. Each column should keep only its unique values:

+----+---------------+----------------------+----------------------------------+-------------------+---------+
| ID |     Name      |         City         |              Email               |       Phone       | Country |
+----+---------------+----------------------+----------------------------------+-------------------+---------+
|  1 | 'Ted','Josh'  | 'Chicago','Richmond' | '[email protected]'                  | '132321','435324' | 'USA'   |
|  2 | 'John','Mike' | 'Seattle','Berkley'  | '[email protected]','[email protected]' | '322421'          | 'USA'   |
+----+---------------+----------------------+----------------------------------+-------------------+---------+

jezrael · Accepted Answer · 2020-02-26 07:19:48Z

If the order of the values matters, use GroupBy.agg with a lambda function and remove duplicates with a dictionary, which preserves insertion order:

df1 = df.groupby('ID').agg(lambda x: ','.join(dict.fromkeys(x.astype(str)).keys())).reset_index()

# Alternative using Series.unique, but slow on large data
# df1 = df.groupby('ID').agg(lambda x: ','.join(x.astype(str).unique())).reset_index()
print(df1)
   ID       Name              City                         Email  \
0   1   Ted,Josh  Chicago,Richmond                 [email protected]   
1   2  John,Mike   Seattle,Berkley  [email protected],[email protected]   

           Phone Country  
0  132321,435324     USA  
1         322421     USA
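
For anyone who wants to run the ordered variant end to end, here is a minimal sketch. It rebuilds the sample table from the question; the e-mail addresses are hypothetical placeholders (the originals are obscured in the post), and the helper name `join_unique` is my own, not part of pandas.

```python
import pandas as pd

# Sample data from the question. The e-mail addresses are hypothetical
# placeholders, because the originals are obscured in the post.
df = pd.DataFrame({
    "ID": [1, 1, 2, 2, 2],
    "Name": ["Ted", "Josh", "John", "John", "Mike"],
    "City": ["Chicago", "Richmond", "Seattle", "Berkley", "Seattle"],
    "Email": [
        "ted@example.com",
        "ted@example.com",
        "john@example.com",
        "mike@example.com",
        "mike@example.com",
    ],
    "Phone": [132321, 435324, 322421, 322421, 322421],
    "Country": ["USA"] * 5,
})


def join_unique(values):
    """Join the distinct values of one group, keeping first-seen order."""
    # dict.fromkeys drops duplicates and keeps insertion order (Python 3.7+).
    return ",".join(dict.fromkeys(values.astype(str)))


# One output row per ID; every other column is aggregated by join_unique.
combined = df.groupby("ID").agg(join_unique).reset_index()
print(combined)
```

Using a named function instead of a lambda changes nothing about the result; it only makes the aggregation easier to reuse and test.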

If the order does not matter, a similar solution removes duplicates with a set:

df2 = df.groupby('ID').agg(lambda x: ','.join(set(x.astype(str)))).reset_index()
print(df2)
   ID       Name              City                         Email  \
0   1   Josh,Ted  Richmond,Chicago                 [email protected]   
1   2  John,Mike   Berkley,Seattle  [email protected],[email protected]   

           Phone Country  
0  435324,132321     USA  
1         322421     USA
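
Because a set of strings has no guaranteed iteration order, the order within each cell can change from one run to the next. A possible variation, sketched below, sorts the unique values so the result is reproducible and wraps each value in single quotes to match the format requested in the question. The helper name `join_sorted_quoted` and the reduced two-column frame are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

# Reduced, hypothetical version of the question's table (two value columns).
df = pd.DataFrame({
    "ID": [1, 1, 2, 2, 2],
    "Name": ["Ted", "Josh", "John", "John", "Mike"],
    "City": ["Chicago", "Richmond", "Seattle", "Berkley", "Seattle"],
})


def join_sorted_quoted(values):
    """Join the distinct values of one group, sorted and single-quoted."""
    # set() removes duplicates; sorted() makes the order deterministic.
    unique_values = sorted(set(values.astype(str)))
    return ",".join(f"'{value}'" for value in unique_values)


result = df.groupby("ID").agg(join_sorted_quoted).reset_index()
print(result)
```

Sorting gives alphabetical order rather than first-seen order, so this is a substitute for the set-based version only, not for the dictionary-based one.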

edited Feb 26, 2020 at 7:19

answered Feb 26, 2020 at 7:12


1 Comment

abdoulsn Over a year ago

I keep coming across jezrael's answers at random. They are always clear and well coded, and every one of them has helped me.
