I have this pandas data frame:

from pandas import DataFrame

df = DataFrame({'id':['a','b','b','b','c','c'], 'category':['z','z','x','y','y','y'], 'category2':['1','2','2','2','1','2']})

which looks like:

  category category2 id
0        z         1  a
1        z         2  b
2        x         2  b
3        y         2  b
4        y         1  c
5        y         2  c

What I'd like to do is group by id and return the other two columns as a concatenation of their unique strings.

The outcome would look like:

  category category2 id
0        z         1  a
1      zxy         2  b
2        y        12  c

1 Answer

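Use groupby/agg to aggregate the groups. For each group, apply set to find the unique strings, and ''.join to concatenate them. Because a set is unordered, the strings within each result may come out in a different order from the one in which they appear in the data (yxz below rather than zxy):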

In [34]: df.groupby('id').agg(lambda x: ''.join(set(x)))
Out[34]: 
   category category2
id                   
a         z         1
b       yxz         2
c         y        12

To move id from the index to a column of the resultant DataFrame, call reset_index:

In [59]: df.groupby('id').agg(lambda x: ''.join(set(x))).reset_index()
Out[59]: 
  id category category2
0  a        z         1
1  b      yxz         2
2  c        y        12
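
If the order of the concatenated strings matters (the question asks for zxy, not yxz), one possible variation is to drop duplicates with dict.fromkeys, which keeps first-seen order, instead of set. This is only a sketch under that assumption; join_unique is a hypothetical helper name, not part of pandas:

```python
import pandas as pd

df = pd.DataFrame({
    'id': ['a', 'b', 'b', 'b', 'c', 'c'],
    'category': ['z', 'z', 'x', 'y', 'y', 'y'],
    'category2': ['1', '2', '2', '2', '1', '2'],
})

def join_unique(values):
    # Hypothetical helper: dict.fromkeys drops duplicates while keeping
    # the order in which each string first appears in the group.
    return ''.join(dict.fromkeys(values))

# Same groupby/agg pattern as above, with the order-preserving helper.
result = df.groupby('id').agg(join_unique).reset_index()
print(result)
```

With the sample frame, group b should then read zxy, matching the outcome requested in the question.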

1 Comment

groupby with agg and a lambda is quite slow on larger DataFrames. Is there a way to speed this up?
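
One thing that might help, though it is untested on large data and worth timing on your own frame, is to move the deduplication out of the lambda: call drop_duplicates on each (id, column) pair first, which is vectorized, and then pass ''.join directly to agg. The join itself still runs in Python once per group. concat_unique is a hypothetical helper name used only for this sketch:

```python
import pandas as pd

df = pd.DataFrame({
    'id': ['a', 'b', 'b', 'b', 'c', 'c'],
    'category': ['z', 'z', 'x', 'y', 'y', 'y'],
    'category2': ['1', '2', '2', '2', '1', '2'],
})

def concat_unique(frame, key, columns):
    # Hypothetical helper: dedupe each (key, column) pair up front,
    # then join the remaining strings per group.
    parts = []
    for col in columns:
        # drop_duplicates keeps the first occurrence of each pair,
        # so the original row order is preserved.
        deduped = frame[[key, col]].drop_duplicates()
        parts.append(deduped.groupby(key)[col].agg(''.join))
    # Each part is a Series indexed by key; line them up side by side.
    return pd.concat(parts, axis=1).reset_index()

fast = concat_unique(df, 'id', ['category', 'category2'])
print(fast)
```

Whether this is actually faster depends on how many groups there are and how many duplicates each one contains, so treat it as something to benchmark rather than a guaranteed improvement.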
