0

I have two dataframes, df1 and df2, and I would like to subtract df2 from df1, matching rows on a specific column, 'Code'.

import pandas as pd
import numpy as np
rng = pd.date_range('2021-01-01', periods=10, freq='D')
df1 = pd.DataFrame(index=rng, data={'Val1': range(10), 'Val2': np.array(range(10))*5, 'Code': [1, 1, 1, 2, 2, 2, 3, 3, 3, 3]})

df2 = pd.DataFrame(data={'Code': [1, 2, 3, 4], 'Val1': [10, 5, 15, 20], 'Val2': [4, 8, 10, 7]})

df1:

            Val1  Val2  Code
2021-01-01     0     0     1
2021-01-02     1     5     1
2021-01-03     2    10     1
2021-01-04     3    15     2
2021-01-05     4    20     2
2021-01-06     5    25     2
2021-01-07     6    30     3
2021-01-08     7    35     3
2021-01-09     8    40     3
2021-01-10     9    45     3

df2:

   Code  Val1  Val2
0     1    10     4
1     2     5     8
2     3    15    10
3     4    20     7

I am using the following code:

df = (df1.set_index(['Code']) - df2.set_index(['Code']))

and the result is

Code            
1    -10.0  -4.0
1     -9.0   1.0
1     -8.0   6.0
2     -2.0   7.0
2     -1.0  12.0
2      0.0  17.0
3     -9.0  20.0
3     -8.0  25.0
3     -7.0  30.0
3     -6.0  35.0
4      NaN   NaN

However, I only want the results for the rows that are in df1, not for keys that exist only in df2 (Code 4 in this example).

How do I do that, and then set the index back to the original one from df1?

Something like this, but it doesn't work:

df = (df1.set_index(['Code']) - df2.set_index(['Code'])).set_index(df1['Code'])

Also I would like to keep the headers of the columns.

Desired output:

            Val1  Val2  Code
Date                        
2021-01-01 -10.0  -4.0     1
2021-01-02  -9.0   1.0     1
2021-01-03  -8.0   6.0     1
2021-01-04  -2.0   7.0     2
2021-01-05  -1.0  12.0     2
2021-01-06   0.0  17.0     2
2021-01-07  -9.0  20.0     3
2021-01-08  -8.0  25.0     3
2021-01-09  -7.0  30.0     3
2021-01-10  -6.0  35.0     3
1
  • Can you add your desired outcome please? It will make it simpler for us to get you what you need. Commented Feb 24, 2021 at 9:04

2 Answers

1

If you only want the results for the rows that are in df1, and not the missing keys (Code 4 in this example), just use the dropna() method:

df = (df1.set_index(['Code']) - df2.set_index(['Code'])).dropna()

Then insert the original dates from df1 as a column:

df.insert(0,'Date',df1.index)

And finally, turn Code back into a column and make Date the index:

df.reset_index(inplace=True)
df.set_index('Date',inplace=True)

Now if you print df, you will get your desired output:

            Code  Val1  Val2
Date                        
2021-01-01     1 -10.0  -4.0
2021-01-02     1  -9.0   1.0
2021-01-03     1  -8.0   6.0
2021-01-04     2  -2.0   7.0
2021-01-05     2  -1.0  12.0
2021-01-06     2   0.0  17.0
2021-01-07     3  -9.0  20.0
2021-01-08     3  -8.0  25.0
2021-01-09     3  -7.0  30.0
2021-01-10     3  -6.0  35.0

Note: if this is not your desired output, let me know.
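One caveat: inserting df1.index by position assumes the subtraction returns rows in the same order as df1, which holds for this sample because df1 is already sorted by Code. A possible order-independent variant is sketched below using a left merge. This is an alternative technique, not the approach above; the index name 'Date' is taken from the desired output, and the '_df2' suffix and the names shuffled, merged and out are made up for illustration.

```python
import numpy as np
import pandas as pd

rng = pd.date_range('2021-01-01', periods=10, freq='D')
df1 = pd.DataFrame(index=rng, data={'Val1': range(10),
                                    'Val2': np.array(range(10)) * 5,
                                    'Code': [1, 1, 1, 2, 2, 2, 3, 3, 3, 3]})
df2 = pd.DataFrame(data={'Code': [1, 2, 3, 4],
                         'Val1': [10, 5, 15, 20],
                         'Val2': [4, 8, 10, 7]})

# Shuffle df1 only to show that the row order no longer matters.
shuffled = df1.sample(frac=1, random_state=0)

# A left merge keeps every row of the left frame and ignores Code 4,
# which exists only in df2. The dates travel along as a regular column.
merged = (shuffled.rename_axis('Date')
                  .reset_index()
                  .merge(df2, on='Code', how='left', suffixes=('', '_df2'))
                  .set_index('Date'))

# Subtract each df2 column from its df1 counterpart.
out = merged[['Code']].copy()
for col in ('Val1', 'Val2'):
    out[col] = merged[col] - merged[col + '_df2']

print(out.sort_index())
```

If a Code in df1 had no match in df2, the merged '_df2' columns would hold NaN for those rows, and so would the result.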

1

You can use reindex to align df2 to df1["Code"], then take the underlying NumPy array and subtract it in place from the corresponding columns of df1. This leaves both the index and the "Code" column untouched and performs the subtraction as expected.

subtract_values = df2.set_index("Code").reindex(df1["Code"]).to_numpy()
df1[["Val1", "Val2"]] -= subtract_values

print(df1)
            Val1  Val2  Code
2021-01-01   -10    -4     1
2021-01-02    -9     1     1
2021-01-03    -8     6     1
2021-01-04    -2     7     2
2021-01-05    -1    12     2
2021-01-06     0    17     2
2021-01-07    -9    20     3
2021-01-08    -8    25     3
2021-01-09    -7    30     3
2021-01-10    -6    35     3

If you don't want to change df1, you can copy the data to a new DataFrame via new_df = df1.copy() and proceed with new_df instead of df1.
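The same reindex idea can be written without hard-coding the value column names and without modifying df1. The following is only a sketch: it assumes every column of df1 other than the key also exists in df2, and the names key, value_cols, lookup and result are chosen for illustration.

```python
import numpy as np
import pandas as pd

rng = pd.date_range('2021-01-01', periods=10, freq='D')
df1 = pd.DataFrame(index=rng, data={'Val1': range(10),
                                    'Val2': np.array(range(10)) * 5,
                                    'Code': [1, 1, 1, 2, 2, 2, 3, 3, 3, 3]})
df2 = pd.DataFrame(data={'Code': [1, 2, 3, 4],
                         'Val1': [10, 5, 15, 20],
                         'Val2': [4, 8, 10, 7]})

key = 'Code'
# Every column of df1 except the key is treated as a value column
# (assumption: df2 has a column of the same name for each of them).
value_cols = [c for c in df1.columns if c != key]

# One row of df2 values per row of df1, in df1's row order.
lookup = df2.set_index(key)[value_cols].reindex(df1[key])

# Work on a copy so df1 stays unchanged; subtract position by position.
result = df1.copy()
result[value_cols] = df1[value_cols].to_numpy() - lookup.to_numpy()

print(result)
```

Because the subtraction is positional on plain arrays, the date index and the Code column of df1 carry over as they are. A Code present in df1 but missing from df2 would produce NaN in that row.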

1 Comment

I would prefer the solution to be independent of the column names, i.e. not to have to specify Val1, Val2, Val3, etc.
