Input

Assume I have a dataframe with the following structure:

   transaction_code   transaction_time   amount reversed_transaction_code
0       TX051       2019-01-01 13:00:00   150   
1       TX002       2019-01-01 14:00:00   250           TX004
2       TX113       2019-01-01 15:00:00   100   
3       TX004       2019-01-01 16:00:00    80           TX002
4       TX805       2019-01-01 17:00:00    30

This can be reproduced using the following code:

import pandas as pd

eg = {'transaction_code': ['TX051','TX002','TX113','TX004','TX805'],
      'transaction_time': pd.to_datetime(['1 Jan 2019 1pm','1 Jan 2019 2pm','1 Jan 2019 3pm','1 Jan 2019 4pm','1 Jan 2019 5pm']),
      'amount': [150,250,100,80,30],
      'reversed_transaction_code': ['','TX004','','TX002','']}
df = pd.DataFrame(eg)

In df, each row corresponds to a transaction made at my shop. When items are returned, a new transaction is added, and the return and the original transaction each record the other's code in the reversed_transaction_code column.

Problem

For example, $80 of items from TX002 were returned in TX004. How do I match these transactions, record the time and amount of returns and then remove the ROWS which are reversed transactions?

Expected output

The new columns should look like this:

   reversed_amount  reversed_transaction_time
0        NaN                   NaT
1        80            2019-01-01 16:00:00
2        NaN                   NaT
4        NaN                   NaT

This can be reproduced using the following code:

# .copy() avoids a SettingWithCopyWarning when the new columns are assigned
da = df[df.index != 3].copy()
da['reversed_amount'] = [None, 80, None, None]
da['reversed_transaction_time'] = pd.to_datetime([None, '1 Jan 2019 4pm', None, None])
  • do you not want the $250 in reversed column as well? if not how are you grouping them for future entries. Commented Feb 25, 2019 at 4:01
  • @anky_91 I believe $250 is not needed as it is not a reversed amount. Only $80 is. Commented Feb 25, 2019 at 4:05
  • @meW i am confused with the identifier here. :( Commented Feb 25, 2019 at 4:06
  • @anky_91 Let me try this. Jezrael and other grand masters seems offline right now hehe Commented Feb 25, 2019 at 4:11
  • do you want this to happen whenever a new reversed transaction happens? Commented Feb 25, 2019 at 4:30

1 Answer

I've modified your original data to make it a little more complex.


Solution-

import numpy as np
import pandas as pd

eg = {'transaction_code': ['TX051','TX002','TX113','TX004','TX805'],
      'transaction_time': pd.to_datetime(['1 Jan 2019 1pm','1 Jan 2019 2pm','1 Jan 2019 3pm','1 Jan 2019 4pm','1 Jan 2019 5pm']),
      'amount': [150,250,100,80,30],
      'reversed_transaction_code': ['','TX004','TX805','TX002','TX113']}
df = pd.DataFrame(eg)
df


+---+--------+---------------------------+------------------+---------------------+
|   | amount | reversed_transaction_code | transaction_code |  transaction_time   |
+---+--------+---------------------------+------------------+---------------------+
| 0 |    150 |                           | TX051            | 2019-01-01 13:00:00 |
| 1 |    250 | TX004                     | TX002            | 2019-01-01 14:00:00 |
| 2 |    100 | TX805                     | TX113            | 2019-01-01 15:00:00 |
| 3 |     80 | TX002                     | TX004            | 2019-01-01 16:00:00 |
| 4 |     30 | TX113                     | TX805            | 2019-01-01 17:00:00 |
+---+--------+---------------------------+------------------+---------------------+

# Fetching the index where there's an entry in reversed_transaction_code
idx_ = df[df.reversed_transaction_code.str.startswith('T')].index
idx_
# Int64Index([1, 2, 3, 4], dtype='int64')

# Creating blank columns (np.nan rather than np.NaN, which was removed in NumPy 2.0)
df['reversed_amount'] = np.nan
df['reversed_transaction_time'] = None

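# Reverse transaction index: rows whose reversed code has a smaller suffix than their own code
# (the suffixes are compared as strings, so this relies on equal-length, zero-padded codes)
rev_suffix = df.loc[idx_, 'reversed_transaction_code'].str.split('TX', expand=True).iloc[:, 1]
own_suffix = df.loc[idx_, 'transaction_code'].str.split('TX', expand=True).iloc[:, 1]
idxR_ = df.loc[idx_][rev_suffix < own_suffix].index
idxR_
# Int64Index([3, 4], dtype='int64')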

# Fetching valid reversed transaction code from reversed_transaction_code column
val = df.loc[idxR_, 'reversed_transaction_code'] 
val
# 3    TX002
# 4    TX113
# Name: reversed_transaction_code, dtype: object

# Fetching the index of the original transactions that were reversed
code_idx_ = df[df.transaction_code.isin(val)].index
code_idx_
# Int64Index([1, 2], dtype='int64')

# Finding the row each reversed code refers to and copying the return's time and amount into the new columns
# The code below can be made shorter or more efficient (say, using merge/join)
for i in range(len(val)):
    for j in range(len(code_idx_)):        
        if val.iloc[i] == df.loc[code_idx_[j], 'transaction_code']: 
            df.loc[code_idx_[j], 'reversed_transaction_time'] = df.loc[val.index[i], 'transaction_time']
            df.loc[code_idx_[j], 'reversed_amount'] = df.loc[val.index[i], 'amount']

# Removing the rows with reversed transactions
df.drop(val.index, inplace=True)
df            

+---+--------+---------------------------+------------------+---------------------+-----------------+---------------------------+
|   | amount | reversed_transaction_code | transaction_code |  transaction_time   | reversed_amount | reversed_transaction_time |
+---+--------+---------------------------+------------------+---------------------+-----------------+---------------------------+
| 0 |    150 |                           | TX051            | 2019-01-01 13:00:00 | NaN             | None                      |
| 1 |    250 | TX004                     | TX002            | 2019-01-01 14:00:00 | 80.0            | 2019-01-01 16:00:00       |
| 2 |    100 | TX805                     | TX113            | 2019-01-01 15:00:00 | 30.0            | 2019-01-01 17:00:00       |
+---+--------+---------------------------+------------------+---------------------+-----------------+---------------------------+
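
As the code comment above notes, the nested loop can be shortened with a merge. Below is a minimal sketch on the question's original data, not a definitive implementation. It assumes every transaction_code is unique and that a return always has a later transaction_time than the transaction it reverses, so the codes do not need to be sequential. The names lookup, merged, is_return and result are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'transaction_code': ['TX051', 'TX002', 'TX113', 'TX004', 'TX805'],
    'transaction_time': pd.to_datetime(['2019-01-01 13:00', '2019-01-01 14:00',
                                        '2019-01-01 15:00', '2019-01-01 16:00',
                                        '2019-01-01 17:00']),
    'amount': [150, 250, 100, 80, 30],
    'reversed_transaction_code': ['', 'TX004', '', 'TX002', ''],
})

# Look-up table of every transaction, renamed so that it can be joined
# onto the row that references it (assumes transaction_code is unique).
lookup = df[['transaction_code', 'transaction_time', 'amount']].rename(columns={
    'transaction_code': 'reversed_transaction_code',
    'transaction_time': 'reversed_transaction_time',
    'amount': 'reversed_amount',
})

# A left merge keeps every row of df; rows whose reversed_transaction_code
# is '' find no partner and get NaN / NaT in the new columns.
merged = df.merge(lookup, on='reversed_transaction_code', how='left')

# Assumption: within a linked pair, the return is the later transaction.
# Comparisons involving NaT evaluate to False, so unlinked rows are kept.
is_return = merged['reversed_transaction_time'] < merged['transaction_time']

# Drop the return rows themselves, keeping the annotated originals.
result = merged[~is_return].copy()
print(result[['transaction_code', 'reversed_amount', 'reversed_transaction_time']])
```

Because the return is identified by comparing timestamps, this does not depend on the numeric part of the codes. Note that merge returns a fresh 0..n-1 index, which matches the original row labels only when df has the default index, as it does here.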

6 Comments

hmm interesting..!!
@meW Thank you for your help! This is very close to the result I was hoping for and I have been trying to modify it to get there by using the transaction times instead of codes (which I should have stated are not sequential). However, I haven't been able to do it. Do you have any suggestions? Also, I intended to drop the rows which are the return transactions, I have edited the post accordingly
@brandoldperson In the middle of something. Will check it out later.
@meW No problem, take your time. Thanks
@brandoldperson I've modified the answer as per the question. Have a look and let me know if it suits your needs.