My df looks something like this (very simplified):
| Name | Age | A | B | C |
|---|---|---|---|---|
| John | 27 | 12 | 17 | 13 |
| David | 23 | 14 | 50 | 10 |
| John | 27 | 4 | 19 | 7 |
| David | 23 | 10 | 8 | 12 |
Essentially the problem I have is that I want to merge the rows with duplicate names (i.e. same person). The age would stay the same, columns A and B need to be added together but for column C I must average the two values.
I have tried:
df.agg({'A' : ['sum'], 'B' : ['sum'], 'C': ['mean']}), but this just creates a new df with those column values.
I'm quite inexperienced with pandas so I have only tried a limited amount of things.
I would like the result to be like so:
| Name | Age | A | B | C |
|---|---|---|---|---|
| John | 27 | 16 | 36 | 10 |
| David | 23 | 24 | 58 | 11 |
In reality I have many more columns, (over 100). I have created lists of the column names which need to be added, averaged and then kept the same.
My main idea was to do something such as:
do_nothing = [] #lists contain column names already
add_cols = []
avg_cols = []
for i in df.columns:
if i in do_nothing:
#dont do anything
if i in add_cols:
#add cols
if i in avg_cols:
#get mean
If I only needed one operation e.g. 'sum' I know I could just do:
print(df.groupby(["Name", "Age"], as_index=False).sum()), but I am unsure how to do this with multiple operations using the column lists described above.
Any suggestions would be very appreciated!