I have two dataframes with identical columns, but different values and different numbers of rows.
import pandas as pd
data1 = {'Region': ['Africa','Africa','Africa','Africa','Africa','Africa','Africa','Africa','Asia','Asia','Asia','Asia'],
'Country': ['South Africa','South Africa','South Africa','South Africa','South Africa','South Africa','South Africa','South Africa','Japan','Japan','Japan','Japan'],
'Product': ['ABC','ABC','ABC','ABC','XYZ','XYZ','XYZ','XYZ','DEF','DEF','DEF','DEF'],
'Year': [2016, 2017, 2018, 2019,2016, 2017, 2018, 2019,2016, 2017, 2018, 2019],
'Price': [500, 400, 0,450,750,0,0,890,500,470,0,415]}
data2 = {'Region': ['Africa','Africa','Africa','Africa','Africa','Africa','Asia','Asia'],
'Country': ['South Africa','South Africa','South Africa','South Africa','South Africa','South Africa','Japan','Japan'],
'Product': ['ABC','ABC','ABC','ABC','XYZ','XYZ','DEF','DEF'],
'Year': [2016, 2017, 2018, 2019,2016, 2017,2016, 2017],
'Price': [200, 100, 30,750,350,120,400,370]}
df = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)
df is the complete dataset but with some old values, whereas df2 only has the updated values. I want to replace all the values that are in df with the values in df2, all while keeping the values from df that aren't in df2.
So for example, in df, for Country = Japan, Product = DEF and Year = 2016, the Price should be updated from 500 to 400. The same goes for 2017 (470 to 370), while 2018 and 2019 stay the same.
So far I have the following code that doesn't seem to work:
common_index = ['Region','Country','Product','Year']
df = df.set_index(common_index)
df2 = df2.set_index(common_index)
df.update(df2, overwrite=True)
But this only updates df with the values from df2 and deletes everything else.
Expected output should look like this:
data3 = {'Region': ['Africa','Africa','Africa','Africa','Africa','Africa','Africa','Africa','Asia','Asia','Asia','Asia'],
'Country': ['South Africa','South Africa','South Africa','South Africa','South Africa','South Africa','South Africa','South Africa','Japan','Japan','Japan','Japan'],
'Product': ['ABC','ABC','ABC','ABC','XYZ','XYZ','XYZ','XYZ','DEF','DEF','DEF','DEF'],
'Year': [2016, 2017, 2018, 2019,2016, 2017, 2018, 2019,2016, 2017, 2018, 2019],
'Price': [200, 100, 30,750,350,120,0,890,400,370,0,415]}
df3 = pd.DataFrame(data3)
Any suggestions on how I can do this?
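For reference, here is a minimal sketch of the behaviour I am after, on a trimmed-down copy of the Japan rows only. The names `old`, `new`, `keys` and `result` are made up for this illustration. It relies on `DataFrame.update` working in place and returning `None`, so its return value must not be assigned back; whether that is what goes wrong in my real code is only a guess. The final cast back to `int` is there because, depending on the pandas version, the updated column may come back as float.

```python
import pandas as pd

# Hypothetical names; the data is a small subset of the frames above.
keys = ['Region', 'Country', 'Product', 'Year']

old = pd.DataFrame({
    'Region': ['Asia'] * 4,
    'Country': ['Japan'] * 4,
    'Product': ['DEF'] * 4,
    'Year': [2016, 2017, 2018, 2019],
    'Price': [500, 470, 0, 415],
})
new = pd.DataFrame({
    'Region': ['Asia'] * 2,
    'Country': ['Japan'] * 2,
    'Product': ['DEF'] * 2,
    'Year': [2016, 2017],
    'Price': [400, 370],
})

# Align both frames on the key columns.
result = old.set_index(keys)

# update() modifies `result` in place and returns None,
# so the call is not assigned to anything.
result.update(new.set_index(keys))

# Bring the key columns back and restore the integer dtype.
result = result.reset_index()
result['Price'] = result['Price'].astype(int)

print(result)
```

On this subset, the 2016 and 2017 prices are taken from `new`, while the 2018 and 2019 rows keep their values from `old`.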