
This is a follow-up question to this post: Pandas setting a value depending on date ranges on another dataframe

If some rows in the transactions dataframe have no matching Agentname in the rates dataframe, how can I keep those rows and put a null/NaN value in the Agentname_rates column? The same should apply to rows whose Date falls outside every rate period for that agent and product (row 4 below).

Rates table

   Agentname ProductType  OldRate  NewRate   StartDate     EndDate
0     VSFAAL      SPORTS      0.0     10.0  2020-11-05  2021-01-18
1     VSFAAL     APPAREL      0.0     35.0  2020-11-05  2022-05-03
2     VSFAAL      SPORTS     10.0     15.0  2021-01-18  2022-05-03
3   VSFAALJS      SPORTS      0.0     10.0  2020-11-07  2022-05-03
4   VSFAALJS     APPAREL      0.0     15.0  2020-11-07  2021-11-09
5   VSFAALJS     APPAREL     15.0      5.0  2021-11-09  2022-05-03

Transactions table

                  Date  Sales Agentname ProductType
0  2020-12-01 08:00:02  100.0    VSFAAL      SPORTS
1  2022-03-01 08:00:09   99.0    VSFAAL     APPAREL
2  2022-03-01 08:00:14   75.0    VSFAAL      SPORTS
3  2021-05-01 08:00:39   67.0  VSFAALJS      SPORTS
4  2020-05-01 08:00:56  160.0  VSFAALJS     APPAREL
5  2021-05-01 08:00:56   65.0  VSFAALJS     APPAREL
6  2021-06-03 09:07:33   55.0  VSRANDOM      SPORTS

Desired Output

                  Date  Sales Agentname ProductType  Agentname_rates
0  2020-12-01 08:00:02  100.0    VSFAAL      SPORTS             10.0
1  2022-03-01 08:00:09   99.0    VSFAAL     APPAREL             35.0
2  2022-03-01 08:00:14   75.0    VSFAAL      SPORTS             15.0
3  2021-05-01 08:00:39   67.0  VSFAALJS      SPORTS             10.0
4  2020-05-01 08:00:56  160.0  VSFAALJS     APPAREL             NULL
5  2021-05-01 08:00:56   65.0  VSFAALJS     APPAREL             15.0
6  2021-06-03 09:07:33   55.0  VSRANDOM      SPORTS             NULL
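For a reproducible setup, the rates and transactions tables can be built directly as DataFrames. The names df (rates) and df2 (transactions) follow the question's code; storing the date columns as real datetimes rather than strings is an assumption about the actual data.

```python
import pandas as pd

# Rates table (named df in the question's code).
df = pd.DataFrame({
    'Agentname': ['VSFAAL', 'VSFAAL', 'VSFAAL', 'VSFAALJS', 'VSFAALJS', 'VSFAALJS'],
    'ProductType': ['SPORTS', 'APPAREL', 'SPORTS', 'SPORTS', 'APPAREL', 'APPAREL'],
    'OldRate': [0.0, 0.0, 10.0, 0.0, 0.0, 15.0],
    'NewRate': [10.0, 35.0, 15.0, 10.0, 15.0, 5.0],
    'StartDate': pd.to_datetime(['2020-11-05', '2020-11-05', '2021-01-18',
                                 '2020-11-07', '2020-11-07', '2021-11-09']),
    'EndDate': pd.to_datetime(['2021-01-18', '2022-05-03', '2022-05-03',
                               '2022-05-03', '2021-11-09', '2022-05-03']),
})

# Transactions table (named df2 in the question's code).
df2 = pd.DataFrame({
    'Date': pd.to_datetime(['2020-12-01 08:00:02', '2022-03-01 08:00:09',
                            '2022-03-01 08:00:14', '2021-05-01 08:00:39',
                            '2020-05-01 08:00:56', '2021-05-01 08:00:56',
                            '2021-06-03 09:07:33']),
    'Sales': [100.0, 99.0, 75.0, 67.0, 160.0, 65.0, 55.0],
    'Agentname': ['VSFAAL', 'VSFAAL', 'VSFAAL', 'VSFAALJS',
                  'VSFAALJS', 'VSFAALJS', 'VSRANDOM'],
    'ProductType': ['SPORTS', 'APPAREL', 'SPORTS', 'SPORTS',
                    'APPAREL', 'APPAREL', 'SPORTS'],
})
```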

The following code merges the two tables, but the date filter drops the two rows that should get NULL, and I want to keep them.

import pandas as pd

# df = rates table, df2 = transactions table
df3 = df2.merge(df[['Agentname', 'ProductType', 'StartDate', 'EndDate', 'NewRate']],
                on=['Agentname', 'ProductType'],
                how='left')

# Keep only rows whose date-only Date lies within [StartDate, EndDate]
df3 = df3[pd.to_datetime(df3['Date']).dt.normalize().between(
              pd.to_datetime(df3['StartDate']),
              pd.to_datetime(df3['EndDate']))]
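A minimal illustration of why both rows disappear: after the left merge, VSRANDOM has NaT for StartDate/EndDate, and row 4's Date lies before every APPAREL period for VSFAALJS. In both cases between returns False, so boolean indexing drops the row. The two-row frame below is hypothetical and only mimics those rows.

```python
import pandas as pd

# Hypothetical rows mimicking the merged frame:
# row A: an agent with no rate row (like VSRANDOM), so both bounds are NaT;
# row B: a date before the rate period starts (like row 4).
merged = pd.DataFrame({
    'Date': pd.to_datetime(['2021-06-03 09:07:33', '2020-05-01 08:00:56']),
    'StartDate': pd.to_datetime([pd.NaT, '2020-11-07']),
    'EndDate': pd.to_datetime([pd.NaT, '2021-11-09']),
})

# Comparisons against NaT evaluate to False, and row B is out of range.
mask = merged['Date'].between(merged['StartDate'], merged['EndDate'])
kept = merged[mask]

print(mask.tolist())  # [False, False]
print(len(kept))      # 0
```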

Thanks!

1 Answer


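You can left-join the transactions (df2) onto the filtered df3 after dropping its StartDate/EndDate columns. With no on argument, merge joins on the shared columns (Date, Sales, Agentname, ProductType), so transactions without an in-range rate get NaN in NewRate: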

# df3 must be the filtered merge from the question (only in-range rows)
df3 = df2.merge(df3.drop(['StartDate', 'EndDate'], axis=1), how='left')
print(df3)
                  Date  Sales Agentname ProductType  NewRate
0  2020-12-01 08:00:02  100.0    VSFAAL      SPORTS     10.0
1  2022-03-01 08:00:09   99.0    VSFAAL     APPAREL     35.0
2  2022-03-01 08:00:14   75.0    VSFAAL      SPORTS     15.0
3  2021-05-01 08:00:39   67.0  VSFAALJS      SPORTS     10.0
4  2020-05-01 08:00:56  160.0  VSFAALJS     APPAREL      NaN
5  2021-05-01 08:00:56   65.0  VSFAALJS     APPAREL     15.0
6  2021-06-03 09:07:33   55.0  VSRANDOM      SPORTS      NaN
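A different route, sketched here as an alternative rather than the answer's method: carry each transaction's original index through the merge, filter, and assign the surviving NewRate values back to df2 by index. Unmatched transactions get NaN automatically, nothing is re-joined on Date/Sales, and the column comes out named Agentname_rates as in the desired output. The frame names df/df2 follow the question; keeping the rate listed last when a date sits on a boundary shared by two periods (e.g. 2021-01-18 for VSFAAL SPORTS) is an assumption.

```python
import pandas as pd

# Rates (df) and transactions (df2), mirroring the question's tables.
df = pd.DataFrame({
    'Agentname': ['VSFAAL', 'VSFAAL', 'VSFAAL', 'VSFAALJS', 'VSFAALJS', 'VSFAALJS'],
    'ProductType': ['SPORTS', 'APPAREL', 'SPORTS', 'SPORTS', 'APPAREL', 'APPAREL'],
    'OldRate': [0.0, 0.0, 10.0, 0.0, 0.0, 15.0],
    'NewRate': [10.0, 35.0, 15.0, 10.0, 15.0, 5.0],
    'StartDate': pd.to_datetime(['2020-11-05', '2020-11-05', '2021-01-18',
                                 '2020-11-07', '2020-11-07', '2021-11-09']),
    'EndDate': pd.to_datetime(['2021-01-18', '2022-05-03', '2022-05-03',
                               '2022-05-03', '2021-11-09', '2022-05-03']),
})
df2 = pd.DataFrame({
    'Date': pd.to_datetime(['2020-12-01 08:00:02', '2022-03-01 08:00:09',
                            '2022-03-01 08:00:14', '2021-05-01 08:00:39',
                            '2020-05-01 08:00:56', '2021-05-01 08:00:56',
                            '2021-06-03 09:07:33']),
    'Sales': [100.0, 99.0, 75.0, 67.0, 160.0, 65.0, 55.0],
    'Agentname': ['VSFAAL', 'VSFAAL', 'VSFAAL', 'VSFAALJS',
                  'VSFAALJS', 'VSFAALJS', 'VSRANDOM'],
    'ProductType': ['SPORTS', 'APPAREL', 'SPORTS', 'SPORTS',
                    'APPAREL', 'APPAREL', 'SPORTS'],
})

# Keep the original row labels in an 'index' column so matches can be aligned back.
merged = df2.reset_index().merge(
    df[['Agentname', 'ProductType', 'StartDate', 'EndDate', 'NewRate']],
    on=['Agentname', 'ProductType'],
    how='left',
)

# Compare date-only values against the inclusive [StartDate, EndDate] period.
in_range = merged['Date'].dt.normalize().between(merged['StartDate'], merged['EndDate'])
matched = merged.loc[in_range].set_index('index')['NewRate']

# Assumption: on a shared boundary date, keep the rate listed last in the rates table.
matched = matched[~matched.index.duplicated(keep='last')]

# assign aligns on the index: transactions with no in-range rate get NaN.
result = df2.assign(Agentname_rates=matched)
print(result)
```

Because assign aligns on the index, result keeps all seven transactions in their original order, including the two that have no applicable rate.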