I want to merge two dataframes on year and city, filling the missing values in df1's gdp_value and growth_rate columns with the values from df2's gdp and rate columns, respectively.
df1
year city gdp_value growth_rate
0 2015 sh NaN NaN
1 2016 sh NaN NaN
2 2017 sh NaN NaN
3 2018 sh NaN NaN
4 2019 sh NaN NaN
5 2015 bj 7.0 0.01
6 2016 bj 3.0 0.03
7 2017 bj 2.0 -0.03
8 2018 bj 5.0 0.05
9 2019 bj 4.0 0.02
df2
year city gdp rate
0 2015 sh 6 0.04
1 2016 sh 5 0.07
2 2017 sh 3 -0.03
3 2018 sh 6 0.05
4 2019 sh 4 0.02
I tried pd.merge(df1, df2, on=['year', 'city'], how='left'), but that keeps gdp and rate as separate columns instead of filling the NaNs:
year city gdp_value growth_rate gdp rate
0 2015 sh NaN NaN 6.0 0.04
1 2016 sh NaN NaN 5.0 0.07
2 2017 sh NaN NaN 3.0 -0.03
3 2018 sh NaN NaN 6.0 0.05
4 2019 sh NaN NaN 4.0 0.02
5 2015 bj 7.0 0.01 NaN NaN
6 2016 bj 3.0 0.03 NaN NaN
7 2017 bj 2.0 -0.03 NaN NaN
8 2018 bj 5.0 0.05 NaN NaN
9 2019 bj 4.0 0.02 NaN NaN
My desired output df is like this:
year city gdp_value growth_rate
0 2015 sh 6 0.04
1 2016 sh 5 0.07
2 2017 sh 3 -0.03
3 2018 sh 6 0.05
4 2019 sh 4 0.02
5 2015 bj 7 0.01
6 2016 bj 3 0.03
7 2017 bj 2 -0.03
8 2018 bj 5 0.05
9 2019 bj 4 0.02
Thanks for your help.
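Edit: this solution seems to work, thanks:
# Index both frames by the merge keys so update() aligns rows on (year, city).
df1 = df1.set_index(['year', 'city'])
# Rename df2's columns to match df1, then overwrite df1 in place
# with df2's non-NaN values for the matching keys.
df1.update(
    df2
    .set_index(['year', 'city'])
    .rename(columns={'gdp': 'gdp_value', 'rate': 'growth_rate'})
)
df1 = df1.reset_index()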
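
For anyone who would rather stay with the merge attempt from above, here is a minimal sketch of how it could be finished with fillna. The frames are rebuilt from the sample tables in this question, and the names merged and result are just placeholders I picked. One difference worth noting: update() overwrites any df1 cell for which df2 has a non-NaN value, while fillna only touches cells that are NaN in df1. With this sample data both approaches should give the same values.

```python
import numpy as np
import pandas as pd

# Rebuild the sample frames from the question.
years = [2015, 2016, 2017, 2018, 2019]
df1 = pd.DataFrame({
    'year': years * 2,
    'city': ['sh'] * 5 + ['bj'] * 5,
    'gdp_value': [np.nan] * 5 + [7.0, 3.0, 2.0, 5.0, 4.0],
    'growth_rate': [np.nan] * 5 + [0.01, 0.03, -0.03, 0.05, 0.02],
})
df2 = pd.DataFrame({
    'year': years,
    'city': ['sh'] * 5,
    'gdp': [6, 5, 3, 6, 4],
    'rate': [0.04, 0.07, -0.03, 0.05, 0.02],
})

# Left-merge on the keys; df2's columns come along as 'gdp' and 'rate'.
merged = df1.merge(df2, on=['year', 'city'], how='left')

# Fill only the NaN cells of df1's columns from the matching df2 columns.
merged['gdp_value'] = merged['gdp_value'].fillna(merged['gdp'])
merged['growth_rate'] = merged['growth_rate'].fillna(merged['rate'])

# Drop the helper columns that came from df2.
result = merged.drop(columns=['gdp', 'rate'])
print(result)
```

Note that gdp_value stays a float column here (6.0 rather than 6), because it held NaN before the fill. If integers are needed, an astype(int) after the fill should do it once no NaNs remain.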