Pandas: How to combine rows based on multiple columns

Question

Say I have a dataframe like this:

import pandas as pd 

test = [
    {1: 434, 2: 343, 3: [592]},
    {1: 434, 2: 343, 3: [192]},
    {1: 534, 2: 743, 3: [392]},
]

df = pd.DataFrame(test)
df

     1    2      3
0  434  343  [592]
1  434  343  [192]
2  534  743  [392]

I want to combine rows where columns 1 and 2 are the same, and concatenate the lists in column 3.

Desired result

     1    2           3
0  434  343  [592, 192]
2  534  743       [392]

Attempt so far

I believe groupby could be used, followed by some aggregation function that combines the lists. Something like:

df.groupby([1, 2]).aggregate(aggregation_functions)

However, I'm not sure what to pass as aggregation_functions.

ouroboros1 · Accepted Answer · 2022-10-15 22:35:33Z

You are nearly there. Try:

res = df.groupby([1, 2], as_index=False)[3].sum()

print(res)

     1    2           3
0  434  343  [592, 192]
1  534  743       [392]
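
The sum above presumably works because adding two Python lists concatenates them. If you would rather make the flattening explicit, here is a minimal sketch using itertools.chain. The helper name flatten is my own and not part of pandas; the data is the frame from the question.

```python
from itertools import chain

import pandas as pd

df = pd.DataFrame([
    {1: 434, 2: 343, 3: [592]},
    {1: 434, 2: 343, 3: [192]},
    {1: 534, 2: 743, 3: [392]},
])


def flatten(lists):
    # Hypothetical helper: join all lists of one group into a single new list.
    return list(chain.from_iterable(lists))


# One row per (1, 2) pair; column 3 holds the flattened lists.
res = df.groupby([1, 2], as_index=False)[3].agg(flatten)
print(res)
```

This should give the same rows as the sum version, and it never relies on how sum treats object columns.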

If you want to keep the first index label of each group (here, 0 and 2), you can use the following, which modifies df in place:

df[3] = df.groupby([1, 2])[3].transform("sum")
df.drop_duplicates(subset=[1, 2], inplace=True)

print(df)

     1    2           3
0  434  343  [592, 192]
2  534  743       [392]
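
If you need the first index labels but would rather leave df untouched, one possible sketch is to carry the index along as an ordinary column and aggregate it with "first". The helper flatten and the intermediate column name "index" (which reset_index creates by default) are assumptions of this example, not part of the answer above.

```python
from itertools import chain

import pandas as pd

df = pd.DataFrame([
    {1: 434, 2: 343, 3: [592]},
    {1: 434, 2: 343, 3: [192]},
    {1: 534, 2: 743, 3: [392]},
])


def flatten(lists):
    # Hypothetical helper: join all lists of one group into a single new list.
    return list(chain.from_iterable(lists))


res = (
    df.reset_index()                          # old index becomes the column "index"
      .groupby([1, 2], as_index=False)
      .agg({"index": "first", 3: flatten})    # keep first label, merge the lists
      .set_index("index")
      .rename_axis(None)                      # drop the leftover index name
)
print(res)
```

Because the result is built from a copy, df still holds the original three rows afterwards.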

edited Oct 15, 2022 at 22:35

answered Oct 15, 2022 at 22:25


1 Comment

Henry Ecker Over a year ago

res = df.groupby([1, 2], as_index=False)[3].sum() is one less copy.
