How to combine duplicate rows in python pandas

Question

I have a data frame similar to the one below. For some reason, each team is listed twice, once for each points column.

import pandas as pd
import numpy as np
d = {'Team': ['1', '2', '3', '1', '2', '3'], 'Points for': [5, 10, 15, np.nan,np.nan,np.nan], 'Points against' : [np.nan,np.nan,np.nan, 3, 6, 9]}
df = pd.DataFrame(data=d)




  Team  Points for  Points against
0    1         5.0             NaN
1    2        10.0             NaN
2    3        15.0             NaN
3    1         NaN             3.0
4    2         NaN             6.0
5    3         NaN             9.0

How can I just combine rows of duplicate team names so that there are no missing values? This is what I would like:

  Team  Points for  Points against
0    1           5               3
1    2          10               6
2    3          15               9

I have been trying to figure it out with pandas, but can't seem to get it. Thanks!

Does this answer your question? How to combine duplicate rows in pandas? – M-Wi, Commented Apr 12, 2020 at 2:41
Just remove all the NaNs from your input and remove the duplicate index values: d = {'Team': ['1', '2', '3'], 'Points for': [5, 10, 15], 'Points against' : [3, 6, 9]}. Or are you saying the data comes to you in this dirty format and you want help cleaning it? Ideally you'd fix whatever code produces this dirty data. – John Zwinck, Commented Apr 12, 2020 at 2:42
Unfortunately, this is the way the data is, for some odd reason. – bismo, Commented Apr 12, 2020 at 2:44

sammywemmy · Accepted Answer · 2020-04-12 02:58:45Z

1

I made changes to your code, replacing string 'Nan' with numpy's nan.

One solution is to melt the data, drop the null entries, and pivot back to wide from long:

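df = (df
      .melt('Team')                        # wide -> long: columns Team, variable, value
      .dropna()                            # drop the NaN placeholder rows
      # keyword arguments: pandas 2.0 no longer accepts them positionally
      .pivot(index='Team', columns='variable', values='value')
      .reset_index()                       # move Team back into a regular column
      .rename_axis(None, axis='columns')   # drop the leftover 'variable' axis name
      .astype(int)                         # no NaNs remain; also turns the string Team labels into ints
     )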

df


  Team  Points against  Points for
0   1      3              5
1   2      6              10
2   3      9              15
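To see what each step of the melt, dropna, pivot chain does, here is a self-contained sketch that rebuilds the question's data and splits the chain into named steps so the intermediate long-format frame can be inspected. The names `long_df` and `wide` are only illustrative, and `pivot` is called with keyword arguments.

```python
import numpy as np
import pandas as pd

d = {'Team': ['1', '2', '3', '1', '2', '3'],
     'Points for': [5, 10, 15, np.nan, np.nan, np.nan],
     'Points against': [np.nan, np.nan, np.nan, 3, 6, 9]}
df = pd.DataFrame(data=d)

# Step 1: wide -> long. Every non-'Team' column becomes (variable, value) rows.
long_df = df.melt('Team')
# Step 2: drop the NaN placeholders, leaving one row per (Team, variable) pair.
long_df = long_df.dropna()
# Step 3: long -> wide again; each (Team, variable) pair is now unique.
wide = (long_df
        .pivot(index='Team', columns='variable', values='value')
        .reset_index()
        .rename_axis(None, axis='columns'))

print(long_df)
print(wide)
```

One side effect of using `pivot` rather than `pivot_table`: if a (Team, variable) pair still appears more than once after `dropna`, `pivot` raises a `ValueError` instead of silently aggregating, so conflicting duplicates surface as an error.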

answered Apr 12, 2020 at 2:58

sammywemmy


Chris · Accepted Answer · 2020-04-12 02:48:54Z

0

One way is to use groupby:

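df = df.replace("Nan", np.nan)        # only needed if missing values arrive as the string "Nan"
new_df = df.groupby("Team").first()   # first non-null value of each column within each team
print(new_df)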

Output:

      Points for  Points against
Team                            
1            5.0             3.0
2           10.0             6.0
3           15.0             9.0
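If you want the result to match the desired layout exactly (Team as a regular column, integer points), here is a minimal sketch building on the same groupby approach. It assumes every team has exactly one non-null value per column, so the cast back to int cannot hit a NaN.

```python
import numpy as np
import pandas as pd

d = {'Team': ['1', '2', '3', '1', '2', '3'],
     'Points for': [5, 10, 15, np.nan, np.nan, np.nan],
     'Points against': [np.nan, np.nan, np.nan, 3, 6, 9]}
df = pd.DataFrame(data=d)

# as_index=False keeps 'Team' as a regular column instead of the index.
# GroupBy.first() takes the first non-null value of each column per group,
# so the NaN placeholders are skipped.
combined = df.groupby('Team', as_index=False).first()

# The NaNs forced both points columns to float; cast back now that none remain.
combined = combined.astype({'Points for': int, 'Points against': int})
print(combined)
```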

answered Apr 12, 2020 at 2:48

Chris


Eric Truett · Accepted Answer · 2020-04-12 02:57:00Z

0

You need to group by the unique identifiers. If there is also a game ID, a date, or something similar, you might need to group on that as well.

df.groupby('Team').agg({'Points for': 'max', 'Points against': 'max'})
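To illustrate grouping on more than one identifier, here is a sketch with a hypothetical `Game` column (it is not in the question's data) in which each team appears in two games. Grouping on `['Game', 'Team']` keeps the games apart, whereas grouping on `'Team'` alone takes the max across both games and mixes them.

```python
import numpy as np
import pandas as pd

# Hypothetical data: teams '1' and '2' each appear in two games, so 'Team'
# alone does not identify a logical row. The 'Game' column is an assumption.
df = pd.DataFrame({
    'Game': [1, 1, 1, 1, 2, 2, 2, 2],
    'Team': ['1', '2', '1', '2', '1', '2', '1', '2'],
    'Points for': [5, 10, np.nan, np.nan, 7, 4, np.nan, np.nan],
    'Points against': [np.nan, np.nan, 3, 6, np.nan, np.nan, 2, 8],
})

agg_spec = {'Points for': 'max', 'Points against': 'max'}

# Group on every identifying column; 'max' skips NaN, so each
# (Game, Team) group keeps its single real value per column.
result = df.groupby(['Game', 'Team'], as_index=False).agg(agg_spec)

# Grouping on 'Team' alone takes the max across both games instead.
wrong = df.groupby('Team').agg(agg_spec)

print(result)
print(wrong)
```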

answered Apr 12, 2020 at 2:57

Eric Truett


Community · Accepted Answer · 2020-06-20 09:12:55Z

0

pd.pivot_table(df, values=['Points for', 'Points against'], index=['Team'], aggfunc='sum')[['Points for', 'Points against']]

Output

      Points for  Points against
Team                            
1            5.0             3.0
2           10.0             6.0
3           15.0             9.0
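The aggregations used in these answers (first, max, sum) agree only when each team has at most one non-null value per column. With a conflicting duplicate they diverge: if team '1' had a second 'Points for' value of 8, a sum would give 13, max would give 8, and first would give 5. A minimal sketch of a check to run before aggregating, using `GroupBy.count()` (which counts non-null values); the conflicting row is hypothetical.

```python
import numpy as np
import pandas as pd

d = {'Team': ['1', '2', '3', '1', '2', '3'],
     'Points for': [5, 10, 15, np.nan, np.nan, np.nan],
     'Points against': [np.nan, np.nan, np.nan, 3, 6, 9]}
df = pd.DataFrame(data=d)

# Non-null values per column for each team (the key column is excluded).
non_null = df.groupby('Team').count()
# True only if no team has two real values in the same column.
safe_to_combine = bool((non_null <= 1).all().all())

# Hypothetical conflicting row: a second 'Points for' value for team '1'.
conflict = pd.DataFrame({'Team': ['1'], 'Points for': [8.0], 'Points against': [np.nan]})
dirty = pd.concat([df, conflict], ignore_index=True)
dirty_ok = bool((dirty.groupby('Team').count() <= 1).all().all())

print(safe_to_combine, dirty_ok)
```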

edited Jun 20, 2020 at 9:12

CommunityBot

answered Apr 12, 2020 at 3:21

dal233
