Pandas merge multiple rows based on two columns

Question

I have a dataframe shown below

     pid          date      label   age
0   0001    2001-09-24      False   34
1   0001    2006-03-02      True    39
2   0001    2006-03-02      True    39
3   0002    2003-02-07      True    23
4   0002    2004-08-02      True    24
5   0002    2004-08-02      False   24
6   0003    2001-05-25      False   25
7   0003    2001-05-25      False   25

Which I want to aggregate based on both pid and date, where if multiple rows have the same pid and data then age stays the same and label = True if any or all of these rows have True label:

     pid          date      label   age
0   0001    2001-09-24      False   34
1   0001    2006-03-02      True    39
3   0002    2003-02-07      True    23
4   0002    2004-08-02      True    24
5   0003    2001-05-25      False   25

I have tried groupby but all I could do is with one column - pid and no success with two columns.

BENY · Accepted Answer · 2022-05-24 00:40:10Z

1

In your case sort_values + drop_duplicates

out = df.sort_values('label').drop_duplicates(['pid','date','age'],keep='last').sort_index()
Out[240]: 
   pid        date  label  age
0    1  2001-09-24  False   34
2    1  2006-03-02   True   39
3    2  2003-02-07   True   23
4    2  2004-08-02   True   24
7    3  2001-05-25  False   25

answered May 24, 2022 at 0:40

BENY

324k22 gold badges176 silver badges250 bronze badges

Sign up to request clarification or add additional context in comments.

Collectives™ on Stack Overflow

Pandas merge multiple rows based on two columns

1 Answer 1

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Your Answer

Sign up or log in

Post as a guest

Related