I have a large class of students separated into sections, with each student having a unique ID. The entire roster is stored in one dataframe. I also have multiple dataframes, each holding the grades of one section of students on one assignment. I would like to merge all of this information into a single dataframe that represents my gradebook. For example:
import pandas as pd
# Initialize roster
data = [['ab10', 'Ann Big'], ['ca9', 'Carl Ahn'], ['jb19', 'John Brown'], ['cf25', 'Carol Fox']]
roster = pd.DataFrame(data, columns = ['ID', 'Name'])
# Initialize the section grades
data = [['ab10', 95], ['ca9', 72]]
grades0 = pd.DataFrame(data, columns = ['ID', 'Exp1'])
data = [['ab10', 83], ['ca9', 97]]
grades1 = pd.DataFrame(data, columns = ['ID', 'Exp2'])
data = [['jb19', 61], ['cf25', 95]]
grades2 = pd.DataFrame(data, columns = ['ID', 'Exp1'])
# Now merge the section grades with the roster to generate final gradebook
roster = roster.merge(grades0, on = 'ID', how = 'outer')
roster = roster.merge(grades1, on = 'ID', how = 'outer')
roster = roster.merge(grades2, on = 'ID', how = 'outer')
print(roster)
This code generates the following:
     ID        Name  Exp1_x  Exp2  Exp1_y
0  ab10     Ann Big    95.0  83.0     NaN
1   ca9    Carl Ahn    72.0  97.0     NaN
2  jb19  John Brown     NaN   NaN    61.0
3  cf25   Carol Fox     NaN   NaN    95.0
I don't want the duplicated Exp1 columns with the suffixes _x and _y. Instead I want:
     ID        Name  Exp1  Exp2
0  ab10     Ann Big  95.0  83.0
1   ca9    Carl Ahn  72.0  97.0
2  jb19  John Brown  61.0   NaN
3  cf25   Carol Fox  95.0   NaN
No student should have more than one grade for the same assignment across the grade dataframes, but it would be good practice to raise an error if such an overlap exists.
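To make the requirement concrete, here is a rough sketch of one way it might be done; it is not necessarily the best or most idiomatic approach. It reshapes every grade dataframe to long form, checks for overlapping (ID, assignment) pairs, pivots back to one column per assignment, and merges the result onto the roster. The helper name build_gradebook and the intermediate column names "assignment" and "score" are made up for illustration.

```python
import pandas as pd

def build_gradebook(roster, grade_frames, key="ID"):
    """Hypothetical helper: combine per-section grade frames into one gradebook."""
    # Reshape each grade frame to long form: one row per (ID, assignment, score).
    long = pd.concat(
        [g.melt(id_vars=key, var_name="assignment", value_name="score")
         for g in grade_frames],
        ignore_index=True,
    )
    # Raise if the same student has two scores for the same assignment.
    dupes = long[long.duplicated([key, "assignment"], keep=False)]
    if not dupes.empty:
        raise ValueError(f"Overlapping grades found:\n{dupes}")
    # Pivot back to wide form: one column per assignment.
    wide = long.pivot(index=key, columns="assignment", values="score").reset_index()
    wide.columns.name = None
    # A left merge keeps the roster order and every student on the roster.
    return roster.merge(wide, on=key, how="left", validate="one_to_one")

# Same sample data as in the question.
roster = pd.DataFrame(
    [["ab10", "Ann Big"], ["ca9", "Carl Ahn"], ["jb19", "John Brown"], ["cf25", "Carol Fox"]],
    columns=["ID", "Name"],
)
grades0 = pd.DataFrame([["ab10", 95], ["ca9", 72]], columns=["ID", "Exp1"])
grades1 = pd.DataFrame([["ab10", 83], ["ca9", 97]], columns=["ID", "Exp2"])
grades2 = pd.DataFrame([["jb19", 61], ["cf25", 95]], columns=["ID", "Exp1"])

gradebook = build_gradebook(roster, [grades0, grades1, grades2])
print(gradebook)
```

Note that the left merge in this sketch silently drops grades for any ID that is not on the roster, whereas the outer merges above would keep them; switching to how="outer" would restore that behavior.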