Merging multiple pandas dataframes without duplicating columns

Question

I have a large class of students separated into sections, with each student having a unique ID. I have the entire roster stored in a dataframe. I also have multiple dataframes, each representing the grades from a particular section of students on a particular assignment. I would like to merge all of this information into a single dataframe that represents my gradebook. For example:

import pandas as pd

# Initialize roster
data = [['ab10', 'Ann Big'], ['ca9', 'Carl Ahn'], ['jb19', 'John Brown'], ['cf25', 'Carol Fox']]
roster = pd.DataFrame(data, columns = ['ID', 'Name'])

# Initialize the section grades
data = [['ab10', 95], ['ca9', 72]]
grades0 = pd.DataFrame(data, columns = ['ID', 'Exp1'])

data = [['ab10', 83], ['ca9', 97]]
grades1 = pd.DataFrame(data, columns = ['ID', 'Exp2'])

data = [['jb19', 61], ['cf25', 95]]
grades2 = pd.DataFrame(data, columns = ['ID', 'Exp1'])

# Now merge the section grades with the roster to generate final gradebook
roster = roster.merge(grades0, on = 'ID', how = 'outer')
roster = roster.merge(grades1, on = 'ID', how = 'outer')
roster = roster.merge(grades2, on = 'ID', how = 'outer')

print(roster)

This code generates the following:

     ID        Name  Exp1_x  Exp2  Exp1_y
0  ab10     Ann Big    95.0  83.0     NaN
1   ca9    Carl Ahn    72.0  97.0     NaN
2  jb19  John Brown     NaN   NaN    61.0
3  cf25   Carol Fox     NaN   NaN    95.0

I don't want the duplicated Exp1 columns with the suffixes _x and _y. Instead I want:

     ID        Name    Exp1  Exp2
0  ab10     Ann Big    95.0  83.0 
1   ca9    Carl Ahn    72.0  97.0
2  jb19  John Brown    61.0   NaN
3  cf25   Carol Fox    95.0   NaN

There should be no duplicated data between the grade dataframes (but it would be good practice to raise an error were an overlap to exist).

Shubham Sharma · Accepted Answer · 2021-05-08 16:23:08Z

3

`reduce` with `combine_first`

As there is no duplication between the grades dataframes, we can reduce with combine_first to combine all the dataframes together:

from functools import reduce

reduce(pd.DataFrame.combine_first, 
      [g.set_index('ID') for g in (roster, grades0, grades1, grades2)])

      Exp1  Exp2        Name
ID                          
ab10  95.0  83.0     Ann Big
ca9   72.0  97.0    Carl Ahn
cf25  95.0   NaN   Carol Fox
jb19  61.0   NaN  John Brown
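The result above is indexed by ID, with columns and rows sorted alphabetically, and combine_first silently keeps the first value if two frames overlap. A minimal sketch of one way to restore the roster's layout and raise an error on overlap, as the question suggests, might look like this. The helper name `combine_strict` is hypothetical, and the sketch assumes IDs are unique within each dataframe.

```python
from functools import reduce

import pandas as pd


def combine_strict(left, right):
    """combine_first, but raise if both frames hold a value in the same cell."""
    rows = left.index.intersection(right.index)
    cols = left.columns.intersection(right.columns)
    clash = left.loc[rows, cols].notna() & right.loc[rows, cols].notna()
    if clash.to_numpy().any():
        ids = clash.index[clash.any(axis=1)].tolist()
        raise ValueError(f"Overlapping grades for IDs: {ids}")
    return left.combine_first(right)


roster = pd.DataFrame(
    [['ab10', 'Ann Big'], ['ca9', 'Carl Ahn'], ['jb19', 'John Brown'], ['cf25', 'Carol Fox']],
    columns=['ID', 'Name'])
grades0 = pd.DataFrame([['ab10', 95], ['ca9', 72]], columns=['ID', 'Exp1'])
grades1 = pd.DataFrame([['ab10', 83], ['ca9', 97]], columns=['ID', 'Exp2'])
grades2 = pd.DataFrame([['jb19', 61], ['cf25', 95]], columns=['ID', 'Exp1'])

combined = reduce(combine_strict,
                  [df.set_index('ID') for df in (roster, grades0, grades1, grades2)])

# Roster columns first, then the assignment columns in alphabetical order
names = [c for c in roster.columns if c != 'ID']
assignments = [c for c in combined.columns if c not in names]
gradebook = (combined[names + assignments]
             .reindex(roster['ID'].tolist())   # restore the roster's row order
             .rename_axis('ID')
             .reset_index())
print(gradebook)
```

Because the check runs inside the reduction, an overlap is reported as soon as the offending section is folded in, before any value is silently discarded.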


2 Comments

Melissa Over a year ago

combine_first is exactly what I needed! Many thanks.

Shubham Sharma Over a year ago

@Melissa Pleased to help!

Celius Stingher · Answer · 2021-05-08 15:24:07Z

0

I like using pd.concat() with .groupby() for these cases. I believe the result is cleaner, it saves a couple of lines of code, and it is probably more efficient too (as you won't be making multiple merges). Replace your merge lines with:

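# Select the grade columns with a list (double brackets); newer pandas versions reject the tuple form ['Exp1','Exp2']
roster = pd.concat([roster, grades0, grades1, grades2]).groupby('ID')[['Exp1', 'Exp2']].sum().merge(roster, on='ID')
print(roster)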

Which outputs:

     ID  Exp1  Exp2        Name
0  ab10  95.0  83.0     Ann Big
1   ca9  72.0  97.0    Carl Ahn
2  cf25  95.0   0.0   Carol Fox
3  jb19  61.0   0.0  John Brown

You can then re-order the columns to your preferred order. If you prefer NaNs to 0s, add .replace(0, np.nan) after the merge() (with numpy imported as np). Note that this also turns any genuine score of 0 into NaN.

     ID  Exp1  Exp2        Name
0  ab10  95.0  83.0     Ann Big
1   ca9  72.0  97.0    Carl Ahn
2  cf25  95.0   NaN   Carol Fox
3  jb19  61.0   NaN  John Brown
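The assignment names do not have to be hardcoded. As a hedged sketch of a variation on this approach (not the answer's exact code), the grade dataframes can be concatenated on their own so that every column other than ID is treated as an assignment. Passing min_count=1 to sum() leaves a missing grade as NaN instead of 0, so genuine zero scores survive.

```python
import pandas as pd

roster = pd.DataFrame(
    [['ab10', 'Ann Big'], ['ca9', 'Carl Ahn'], ['jb19', 'John Brown'], ['cf25', 'Carol Fox']],
    columns=['ID', 'Name'])
grades0 = pd.DataFrame([['ab10', 95], ['ca9', 72]], columns=['ID', 'Exp1'])
grades1 = pd.DataFrame([['ab10', 83], ['ca9', 97]], columns=['ID', 'Exp2'])
grades2 = pd.DataFrame([['jb19', 61], ['cf25', 95]], columns=['ID', 'Exp1'])

# Stack only the grade frames; every column other than ID is treated as an assignment
stacked = pd.concat([grades0, grades1, grades2], ignore_index=True)

# min_count=1 makes the sum NaN when a student has no value for an assignment
totals = stacked.groupby('ID').sum(min_count=1).reset_index()

# A left merge keeps the roster's row order and any student without grades
gradebook = roster.merge(totals, on='ID', how='left')
print(gradebook)
```

One caveat: if two dataframes did contain a grade for the same student and assignment, sum() would add them together without warning, so this variant relies on the no-overlap assumption stated in the question.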


1 Comment

Melissa Over a year ago

Thanks, but what if I do not know the names of the assignments a priori? I need to avoid hardcoding ['Exp1','Exp2'].
