Input
Assume I have a dataframe with the following structure:
  transaction_code    transaction_time  amount reversed_transaction_code
0            TX051 2019-01-01 13:00:00     150
1            TX002 2019-01-01 14:00:00     250                     TX004
2            TX113 2019-01-01 15:00:00     100
3            TX004 2019-01-01 16:00:00      80                     TX002
4            TX805 2019-01-01 17:00:00      30
This can be reproduced using the following code:
import pandas as pd

eg = {'transaction_code': ['TX051','TX002','TX113','TX004','TX805'],
      'transaction_time': pd.to_datetime(['1 Jan 2019 1pm','1 Jan 2019 2pm','1 Jan 2019 3pm','1 Jan 2019 4pm','1 Jan 2019 5pm']),
      'amount': [150,250,100,80,30],
      'reversed_transaction_code': ['','TX004','','TX002','']}
df = pd.DataFrame(eg)
In df, each row corresponds to a transaction made at my shop. When items are returned, a new transaction is added for the return, and the original and return transactions reference each other's codes in the reversed_transaction_code column.
Problem
For example, $80 of items from TX002 were returned in TX004. How do I match these transactions, record the time and amount of each return on the row of the original transaction, and then remove the rows that are the return transactions themselves?
Expected output
The new columns should look like this:
   reversed_amount reversed_transaction_time
0              NaN                       NaT
1               80       2019-01-01 16:00:00
2              NaN                       NaT
4              NaN                       NaT
This can be reproduced using the following code:
# .copy() avoids a SettingWithCopyWarning when adding columns to the filtered frame
da = df[df.index!=3].copy()
da['reversed_amount'] = [None, 80, None, None]
da['reversed_transaction_time'] = pd.to_datetime([None, '1 Jan 2019 4pm', None, None])
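For reference, here is a minimal sketch of one way the matching described above might be done. It is not a definitive solution and rests on two assumptions that are not stated in the question: transaction codes are unique, and within a linked pair the later transaction is always the return. The names lookup, linked_time, linked_amount, is_return and result are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'transaction_code': ['TX051', 'TX002', 'TX113', 'TX004', 'TX805'],
    'transaction_time': pd.to_datetime(['2019-01-01 13:00', '2019-01-01 14:00',
                                        '2019-01-01 15:00', '2019-01-01 16:00',
                                        '2019-01-01 17:00']),
    'amount': [150, 250, 100, 80, 30],
    'reversed_transaction_code': ['', 'TX004', '', 'TX002', ''],
})

# Look up the time and amount of each row's linked transaction by its code.
# Assumption: transaction_code is unique, so it can serve as a lookup index.
# Rows with an empty reversed_transaction_code find no match and get NaT / NaN.
lookup = df.set_index('transaction_code')
linked_time = df['reversed_transaction_code'].map(lookup['transaction_time'])
linked_amount = df['reversed_transaction_code'].map(lookup['amount'])

# Assumption: the later transaction of a linked pair is the return.
# Comparisons against NaT are False, so unlinked rows are never flagged.
is_return = linked_time < df['transaction_time']

# Keep the non-return rows and attach the linked return's amount and time.
result = df.loc[~is_return].copy()
result['reversed_amount'] = linked_amount[~is_return]
result['reversed_transaction_time'] = linked_time[~is_return]
print(result)
```

Mapping through a lookup Series keeps the original index, so the surviving rows retain the labels 0, 1, 2 and 4. If a transaction could be partially returned more than once, this sketch would need a groupby or merge instead.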