Merging rows that share the same column value using pandas in Python

Question

I have a DataFrame and want to merge rows that share the same value in column A. The merged value of column B is determined by the order of the strings in a list L = ['xx','yy','zz']: the value that appears earliest in L wins.

    A   B
0   a   xx
1   a   yy
2   b   zz
3   b   yy

For rows 0 and 1, the result will be 'a' for column A and 'xx' for column B ('xx' comes before 'yy' in L).
For rows 2 and 3, the result will be 'b' for column A and 'yy' for column B ('yy' comes before 'zz' in L).

Desired outcome:

    A   B
0   a   xx
1   b   yy

Pablo C · Accepted Answer · 2021-01-14 06:20:59Z


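    import pandas as pd
    L = ['xx', 'yy', 'zz']
    df = pd.DataFrame({'A': ['a', 'a', 'b', 'b'], 'B': ['xx', 'yy', 'zz', 'yy']})
    # rank each B value by its position in L, e.g. 'xx' -> 0
    df['C'] = df['B'].map(dict(zip(L, range(len(L)))))
    # per A group, keep the B whose rank in C is smallest
    df.groupby('A')[['B', 'C']].apply(lambda x: x.iloc[x['C'].argmin()]['B'])
    #A
    #a    xx
    #b    yy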
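A minimal variant of the same ranking idea, offered as a sketch rather than the answerer's method: `idxmin` on the rank returns the index label of the preferred row in each A group, so selecting those rows gives back a DataFrame shaped like the desired outcome. It assumes every value in B appears in L; values missing from L map to NaN and are skipped by `idxmin`.

```python
import pandas as pd

L = ['xx', 'yy', 'zz']
df = pd.DataFrame({'A': ['a', 'a', 'b', 'b'], 'B': ['xx', 'yy', 'zz', 'yy']})

# Rank each B value by its position in L: {'xx': 0, 'yy': 1, 'zz': 2}
rank = df['B'].map({value: i for i, value in enumerate(L)})

# For each A group, idxmin returns the index label of the lowest-ranked row
best_rows = rank.groupby(df['A']).idxmin()

# Select those rows and renumber the index to match the desired outcome
result = df.loc[best_rows, ['A', 'B']].reset_index(drop=True)
print(result)
```

Unlike the `apply` version, this avoids a Python-level lambda per group and keeps column A in the result.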

You can get the same result with an ordered pandas.Categorical, whose min() follows the category order given in L:

    import pandas as pd
    # start from the original df (without the helper column C)
    df['B'] = pd.Categorical(df['B'], categories=L, ordered=True)
    df.groupby('A').min()
    #      B
    #A
    #a    xx
    #b    yy
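To check that `min()` follows the order of `categories` rather than alphabetical order, the sketch below reverses the list (the name `L_rev` is hypothetical, used only for illustration) and adds `reset_index()` so that A comes back as a regular column, as in the desired outcome.

```python
import pandas as pd

# Hypothetical reversed order: 'zz' now ranks first and 'xx' last
L_rev = ['zz', 'yy', 'xx']
df = pd.DataFrame({'A': ['a', 'a', 'b', 'b'], 'B': ['xx', 'yy', 'zz', 'yy']})

# An ordered Categorical compares values by their position in `categories`
df['B'] = pd.Categorical(df['B'], categories=L_rev, ordered=True)

# min() per group follows L_rev, not string order;
# reset_index() turns the group key A back into a column
result = df.groupby('A').min().reset_index()
print(result)
```

With `L_rev`, group a yields 'yy' and group b yields 'zz', whereas plain string comparison would give 'xx' and 'yy'. Note that `min()` on an unordered Categorical raises a TypeError, so `ordered=True` is required.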

answered Jan 14, 2021 at 5:00
