Adding duplicate rows together, with different conditions for different columns? [duplicate]

Question

My df looks something like this (very simplified):

Name	Age	A	B	C
John	27	12	17	13
David	23	14	50	10
John	27	4	19	7
David	23	10	8	12

Essentially, I want to merge the rows with duplicate names (i.e. the same person). The age stays the same, columns A and B need to be summed, and column C must be averaged.

I have tried:

df.agg({'A' : ['sum'], 'B' : ['sum'], 'C': ['mean']}), but this just creates a new df with those column values.

I'm quite inexperienced with pandas so I have only tried a limited amount of things.

I would like the result to be like so:

Name	Age	A	B	C
John	27	16	36	10
David	23	24	58	11

In reality I have many more columns (over 100). I have created lists of the column names that need to be added, averaged, or kept the same.

My main idea was to do something such as:

do_nothing = []  # lists already contain the column names
add_cols = []
avg_cols = []

for col in df.columns:
    if col in do_nothing:
        pass  # keep the value as is
    elif col in add_cols:
        pass  # sum the values
    elif col in avg_cols:
        pass  # take the mean

If I only needed one operation e.g. 'sum' I know I could just do: print(df.groupby(["Name", "Age"], as_index=False).sum()), but I am unsure how to do this with multiple operations using the column lists described above.

Any suggestions would be very appreciated!

Arkadiusz · Accepted Answer · 2022-06-25 09:29:09Z


Group the data by Name, then pass agg a dictionary that maps each column to its own aggregation:

(df.groupby('Name', as_index=False, sort=False)
   .agg({'Age': 'first', 'A': 'sum', 'B': 'sum', 'C': 'mean'})
)

Output:

     Name  Age   A   B     C
0    John   27  16  36  10.0
1   David   23  24  58  11.0
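For anyone who wants to reproduce this, here is a minimal self-contained sketch that rebuilds the sample frame from the question and applies the same grouping. The DataFrame construction is an assumption based on the table in the question, and the aggregations are given as strings ('sum', 'mean') rather than as the builtin sum.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "Name": ["John", "David", "John", "David"],
    "Age": [27, 23, 27, 23],
    "A": [12, 14, 4, 10],
    "B": [17, 50, 19, 8],
    "C": [13, 10, 7, 12],
})

# One row per Name: keep the first Age, sum A and B, average C.
# sort=False keeps the groups in order of first appearance.
result = (
    df.groupby("Name", as_index=False, sort=False)
      .agg({"Age": "first", "A": "sum", "B": "sum", "C": "mean"})
)
print(result)
```

Note that 'first' simply takes the Age from the first row of each group, so it assumes the age really is identical across a person's rows.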


1 Comment

Tasha Over a year ago

Thank you! Managed to work out the column list issue too based on your solution. In case it helps anybody else I did: res = (df.groupby('Name', as_index=False, sort=False) .agg({**{dn : 'first' for dn in do_nothing}, **{addcol : 'sum' for addcol in add_cols}, **{avcol: 'mean' for avcol in avg_cols}}) )
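The list-based approach from the comment above can be sketched as a runnable example. The sample frame and the contents of the three lists are assumptions chosen to match the simplified table in the question; with 100+ columns only the lists would change. The final reordering step is an optional addition, not part of the original comment.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "Name": ["John", "David", "John", "David"],
    "Age": [27, 23, 27, 23],
    "A": [12, 14, 4, 10],
    "B": [17, 50, 19, 8],
    "C": [13, 10, 7, 12],
})

# Assumed list contents for this small example.
do_nothing = ["Age"]
add_cols = ["A", "B"]
avg_cols = ["C"]

# Build one {column: aggregation} mapping from the three lists.
agg_map = {
    **dict.fromkeys(do_nothing, "first"),
    **dict.fromkeys(add_cols, "sum"),
    **dict.fromkeys(avg_cols, "mean"),
}

res = df.groupby("Name", as_index=False, sort=False).agg(agg_map)

# Optional: restore the original column order, since the output
# otherwise follows the order of the keys in agg_map.
res = res[["Name"] + [c for c in df.columns if c in agg_map]]
print(res)
```

Columns that appear in none of the lists are dropped from the result, so it is worth checking that the three lists together cover every column you want to keep.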
