Suppose I have this DataFrame:

import pandas as pd

df = pd.DataFrame({'col1': ['AC1', 'AC2', 'AC3', 'AC4', 'AC5'],
                   'col2': ['A', 'B', 'B', 'A', 'C'],
                   'col3': ['ABC', 'DEF', 'FGH', 'IJK', 'LMN']})

I want to combine the text in 'col3' for rows whose 'col2' values are duplicated. The result should look like this:

    col1  col2       col3
0   AC1    A      ABC, IJK
1   AC2    B      DEF, FGH
2   AC3    B      DEF, FGH
3   AC4    A      ABC, IJK
4   AC5    C      LMN

I started by finding the duplicated values in this DataFrame:

col2 = df['col2']
df1 = df[col2.isin(col2[col2.duplicated()])]

Any suggestions on what I should do next?

2 Answers

You can use groupby to build one joined string per 'col2' value, then map it back onto each row:

a = df.groupby('col2').apply(lambda group: ','.join(group['col3']))
df['col3'] = df['col2'].map(a)

Output

print(df)
  col1 col2     col3
0  AC1    A  ABC,IJK
1  AC2    B  DEF,FGH
2  AC3    B  DEF,FGH
3  AC4    A  ABC,IJK
4  AC5    C      LMN
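
If the separator should match the question's expected output (', ' rather than ','), one possible variant is groupby with transform, which returns a result already aligned to the original rows, so the separate map step is not needed. This is a minimal sketch, assuming the same df as in the question; it is an alternative to the approach above, not a correction of it.

```python
import pandas as pd

df = pd.DataFrame({'col1': ['AC1', 'AC2', 'AC3', 'AC4', 'AC5'],
                   'col2': ['A', 'B', 'B', 'A', 'C'],
                   'col3': ['ABC', 'DEF', 'FGH', 'IJK', 'LMN']})

# For each col2 group, join its col3 strings with ', ' and
# broadcast that single string back to every row of the group.
df['col3'] = df.groupby('col2')['col3'].transform(lambda s: ', '.join(s))

print(df)
```

Selecting ['col3'] before transform keeps the operation on that one column, so 'col1' and 'col2' are left untouched.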

You might want to leverage the groupby and apply functions in pandas:

df.groupby('col2').apply(lambda group: ','.join(group['col3']))
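
On its own, the grouped result has one entry per distinct 'col2' value (three rows here, indexed by 'col2') rather than one per original row. A rough sketch of one way to attach it back to the original rows, assuming the df from the question; the names joined and result are only illustrative, and agg with join is used here in place of apply:

```python
import pandas as pd

df = pd.DataFrame({'col1': ['AC1', 'AC2', 'AC3', 'AC4', 'AC5'],
                   'col2': ['A', 'B', 'B', 'A', 'C'],
                   'col3': ['ABC', 'DEF', 'FGH', 'IJK', 'LMN']})

# One joined string per distinct col2 value: a Series indexed by col2.
joined = df.groupby('col2')['col3'].agg(', '.join)

# Drop the original col3 and look up the joined string for each row's col2.
result = df.drop(columns='col3').join(joined, on='col2')

print(result)
```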

5 Comments

don't forget to close your parenthesis at the end. You may want to add a comma in group: ', '.join... to answer the OP in full according to col3
great - check your answer, it's a bit off from the OP's request (see the number of rows in your result vs. his/her result)
k it's comma join
I think map was missing. I have added it in my answer. Up-voted your answer as well.
@moys Voted yours, mine requires more steps to clean up the index : p
