I have two dataframes, df1 and df2, and I would like to subtract df2 from df1, matching rows on a specific column, 'Code'.
import pandas as pd
import numpy as np
rng = pd.date_range('2021-01-01', periods=10, freq='D')
df1 = pd.DataFrame(index=rng, data={'Val1': range(10), 'Val2': np.array(range(10))*5, 'Code': [1, 1, 1, 2, 2, 2, 3, 3, 3, 3]})
df2 = pd.DataFrame(data={'Code': [1, 2, 3, 4], 'Val1': [10, 5, 15, 20], 'Val2': [4, 8, 10, 7]})
df1:
Val1 Val2 Code
2021-01-01 0 0 1
2021-01-02 1 5 1
2021-01-03 2 10 1
2021-01-04 3 15 2
2021-01-05 4 20 2
2021-01-06 5 25 2
2021-01-07 6 30 3
2021-01-08 7 35 3
2021-01-09 8 40 3
2021-01-10 9 45 3
df2:
Code Val1 Val2
0 1 10 4
1 2 5 8
2 3 15 10
3 4 20 7
I am using the following code:
df = (df1.set_index(['Code']) - df2.set_index(['Code']))
and the result is:
Val1 Val2
Code
1 -10.0 -4.0
1 -9.0 1.0
1 -8.0 6.0
2 -2.0 7.0
2 -1.0 12.0
2 0.0 17.0
3 -9.0 20.0
3 -8.0 25.0
3 -7.0 30.0
3 -6.0 35.0
4 NaN NaN
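The NaN row comes from how pandas arithmetic works: both operands are aligned on their index labels, like an outer join, so every 'Code' from either side ends up in the result. Code 4 exists only in df2, so there is nothing in df1 to subtract from and the row is all NaN. A minimal sketch that makes the alignment visible, reusing the question's df1 and df2:

```python
import numpy as np
import pandas as pd

rng = pd.date_range('2021-01-01', periods=10, freq='D')
df1 = pd.DataFrame(index=rng, data={'Val1': range(10), 'Val2': np.array(range(10)) * 5,
                                    'Code': [1, 1, 1, 2, 2, 2, 3, 3, 3, 3]})
df2 = pd.DataFrame(data={'Code': [1, 2, 3, 4], 'Val1': [10, 5, 15, 20], 'Val2': [4, 8, 10, 7]})

# Subtraction aligns both operands on their 'Code' index and keeps the union of labels.
diff = df1.set_index('Code') - df2.set_index('Code')

# 10 rows come from df1's codes; the extra row is Code 4, which only df2 has.
print(len(diff))

# Rows where every value is NaN are labels that had no partner in df1.
nan_codes = diff.index[diff.isna().all(axis=1).to_numpy()].tolist()
print(nan_codes)
```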
However, I only want results for the rows that exist in df1, not for keys that appear only in df2 (in this example, Code 4).
How can I do this and then restore the original index from df1?
I tried something like this, but it doesn't work:
df = (df1.set_index(['Code']) - df2.set_index(['Code'])).set_index(df1['Code'])
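A likely reason this attempt fails: the difference has 11 rows (it still contains Code 4), while df1['Code'] has only 10 values, and set_index() needs one label per row, so pandas raises a ValueError about the length mismatch. Even with matching lengths, df1['Code'] would produce a Code index rather than the dates, which live in df1.index. A small sketch reproducing the error on the question's data:

```python
import numpy as np
import pandas as pd

rng = pd.date_range('2021-01-01', periods=10, freq='D')
df1 = pd.DataFrame(index=rng, data={'Val1': range(10), 'Val2': np.array(range(10)) * 5,
                                    'Code': [1, 1, 1, 2, 2, 2, 3, 3, 3, 3]})
df2 = pd.DataFrame(data={'Code': [1, 2, 3, 4], 'Val1': [10, 5, 15, 20], 'Val2': [4, 8, 10, 7]})

diff = df1.set_index('Code') - df2.set_index('Code')  # 11 rows, including Code 4

# set_index() with a Series needs one value per row: 10 values cannot label 11 rows.
try:
    diff.set_index(df1['Code'])
    failed = False
except ValueError as exc:
    failed = True
    print(exc)
```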
I would also like to keep the column headers.
Desired output:
Val1 Val2 Code
Date
2021-01-01 -10.0 -4.0 1
2021-01-02 -9.0 1.0 1
2021-01-03 -8.0 6.0 1
2021-01-04 -2.0 7.0 2
2021-01-05 -1.0 12.0 2
2021-01-06 0.0 17.0 2
2021-01-07 -9.0 20.0 3
2021-01-08 -8.0 25.0 3
2021-01-09 -7.0 30.0 3
2021-01-10 -6.0 35.0 3
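One possible approach, sketched under the assumption that 'Code' is unique in df2 (as in the example): instead of aligning the two frames, look up the matching df2 row for every row of df1 with reindex(), then subtract positionally so df1's date index and its Code column stay untouched. The names value_cols, lookup and result are illustrative only.

```python
import numpy as np
import pandas as pd

rng = pd.date_range('2021-01-01', periods=10, freq='D')
df1 = pd.DataFrame(index=rng, data={'Val1': range(10), 'Val2': np.array(range(10)) * 5,
                                    'Code': [1, 1, 1, 2, 2, 2, 3, 3, 3, 3]})
df2 = pd.DataFrame(data={'Code': [1, 2, 3, 4], 'Val1': [10, 5, 15, 20], 'Val2': [4, 8, 10, 7]})

value_cols = ['Val1', 'Val2']

# One df2 row per df1 row, in df1's order. Codes that exist only in df2 (here 4)
# are never requested, so they cannot appear in the result.
lookup = df2.set_index('Code')[value_cols].reindex(df1['Code'])

# .to_numpy() drops the Code index, so the subtraction is positional and
# df1's date index is kept as-is.
result = df1.copy()
result[value_cols] = df1[value_cols] - lookup.to_numpy()
result = result.rename_axis('Date')
print(result)
```

If a Code in df1 had no match in df2, reindex() would fill that lookup row with NaN and the corresponding result row would be NaN as well. Because every Code here is found, the values stay integers; add .astype(float) to the value columns if you want the float formatting shown in the desired output.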