Pandas merge column duplicate and sum value [closed]

Question

Closed. This question is off-topic. It is not currently accepting answers.

This question does not appear to be about data science, within the scope defined in the help center.

Closed 5 years ago.

How do I merge duplicate values in a column and sum their corresponding values?

What I have

A   30
A   40
B   50

What I need

A   70
B   50

DataFrame for this example:

import pandas as pd

d = {'address': ["A", "A", "B"], 'balances': [30, 40, 50]}
df = pd.DataFrame(data=d)
df

Denisa · Accepted Answer · 2020-02-27 12:53:44Z

7

In a different case, where your dataset has several duplicated columns and you don't want to select them separately, use:

df.groupby(by=df.columns, axis=1).sum()
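A minimal sketch of the same idea for pandas 2.1 and later, where `groupby(..., axis=1)` is deprecated. It transposes the frame so the duplicated column labels become duplicated row labels, groups on that index level, sums, and transposes back. The frame and its column names are made up for illustration.

```python
import pandas as pd

# Hypothetical frame with two columns that are both named "A"
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["A", "A", "B"])

# Transpose so duplicated column labels become duplicated index labels,
# group rows by that index level, sum them, then transpose back.
merged = df.T.groupby(level=0).sum().T
print(merged)
```

The result has one "A" column holding the row-wise sums of the two original "A" columns, and "B" is unchanged.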

answered Feb 27, 2020 at 12:53


1

For newer pandas versions, use df.T.groupby(by=df.columns).sum().transpose(). df.groupby(axis=1) has been deprecated since pandas v2.1.0; for axis=1, do frame.T.groupby(...) instead.

– JV conseil · Commented Sep 15, 2024 at 1:28


Esmailian · Accepted Answer · 2019-03-10 07:42:11Z

16

You may use

df2 = df.groupby(['address']).sum()

or

df2 = df.groupby(['address']).agg('sum')
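A quick check with the DataFrame from the question, assuming the default groupby settings (so address becomes the index of the result). Both spellings above should produce the same frame.

```python
import pandas as pd

# DataFrame from the question
d = {'address': ["A", "A", "B"], 'balances': [30, 40, 50]}
df = pd.DataFrame(data=d)

# Rows sharing an address collapse into one row whose balances are added up
df2 = df.groupby(['address']).sum()
df3 = df.groupby(['address']).agg('sum')

print(df2)
print(df2.equals(df3))
```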

If there are columns other than balances for which you want to pick only the first or max value, or compute the mean instead of the sum, you can proceed as follows:

d = {'address': ["A", "A", "B"], 'balances': [30, 40, 50], 'sessions': [2, 3, 4]} 
df = pd.DataFrame(data=d) 
df2 = df.groupby(['address']).agg({'balances': 'sum', 'sessions': 'mean'})

That outputs

         balances   sessions
address       
A              70       2.5  
B              50       4.0

You can add as_index=False to the groupby arguments to get:

  address  balances  sessions
0       A        70       2.5
1       B        50       4.0
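A sketch of the call that produces the table above, assuming the same three-column example data; as_index=False keeps address as an ordinary column and gives the result a fresh 0..n-1 index.

```python
import pandas as pd

d = {'address': ["A", "A", "B"], 'balances': [30, 40, 50], 'sessions': [2, 3, 4]}
df = pd.DataFrame(data=d)

# Sum balances and average sessions per address, keeping 'address' as a column
df2 = df.groupby(['address'], as_index=False).agg({'balances': 'sum', 'sessions': 'mean'})
print(df2)
```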

edited Mar 10, 2019 at 7:42

answered Mar 10, 2019 at 7:08


What if, for every value in address, the sessions value is the same and you just want to keep it? For example, {'address': ["A", "A", "B"], 'balances': [30, 40, 50], 'sessions': ["V","V","K"]} to {'address': ["A", "B"], 'balances': [70, 50], 'sessions': ["V","K"]}

– Tobias Kolb · Commented Jan 30, 2020 at 17:24
2

@TobiasKolb In that case you can use 'sessions': 'first', which keeps one of the two (or more) "V"s.

– Esmailian · Commented Jul 17, 2020 at 17:24
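A minimal sketch of the 'first' suggestion on the data from Tobias Kolb's comment, assuming sessions really is constant within each address (otherwise 'first' silently keeps only the first value it encounters). as_index=False is added so the result matches the requested layout.

```python
import pandas as pd

# Data from the comment: 'sessions' repeats the same value within each address
d = {'address': ["A", "A", "B"], 'balances': [30, 40, 50], 'sessions': ["V", "V", "K"]}
df = pd.DataFrame(data=d)

# Sum the balances; 'first' keeps the first 'sessions' value seen in each group
df2 = df.groupby('address', as_index=False).agg({'balances': 'sum', 'sessions': 'first'})
print(df2.to_dict(orient='list'))
```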
This answer is more useful than the accepted one!

– spectre · Commented Dec 30, 2021 at 6:32

