Matching rows in pandas based on values in different columns

Question

Input

Assume I have a dataframe with the following structure:

   transaction_code   transaction_time   amount reversed_transaction_code
0       TX051       2019-01-01 13:00:00   150   
1       TX002       2019-01-01 14:00:00   250           TX004
2       TX113       2019-01-01 15:00:00   100   
3       TX004       2019-01-01 16:00:00    80           TX002
4       TX805       2019-01-01 17:00:00    30

This can be reproduced using the following code:

import pandas as pd

eg = {'transaction_code': ['TX051','TX002','TX113','TX004','TX805'],
      'transaction_time': pd.to_datetime(['1 Jan 2019 1pm','1 Jan 2019 2pm','1 Jan 2019 3pm','1 Jan 2019 4pm','1 Jan 2019 5pm']),
      'amount': [150,250,100,80,30],
      'reversed_transaction_code': ['','TX004','','TX002','']}
df = pd.DataFrame(eg)

In df, each row corresponds to a transaction made at my shop. When items are returned, a new transaction is added, and the original transaction and the return each log the other's code in the reversed_transaction_code column.

Problem

For example, $80 of items from TX002 were returned in TX004. How do I match these transactions, record the time and amount of returns and then remove the ROWS which are reversed transactions?

Expected output

The new columns should look like this:

   reversed_amount  reversed_transaction_time
0        NaN                   NaT
1        80            2019-01-01 16:00:00
2        NaN                   NaT
4        NaN                   NaT

This can be reproduced using the following code:

# .copy() avoids a SettingWithCopyWarning when the new columns are added
da = df[df.index != 3].copy()
da['reversed_amount'] = [None, 80, None, None]
da['reversed_transaction_time'] = pd.to_datetime([None, '1 Jan 2019 4pm', None, None])

Do you not want the $250 in the reversed column as well? If not, how are you grouping them for future entries?
– anky, Commented Feb 25, 2019 at 4:01
@anky_91 I believe $250 is not needed as it is not a reversed amount. Only $80 is.
– meW, Commented Feb 25, 2019 at 4:05
@anky_91 Let me try this. Jezrael and the other grand masters seem to be offline right now hehe
– meW, Commented Feb 25, 2019 at 4:11
Do you want this to happen whenever a new reversed transaction happens?
– Jeril, Commented Feb 25, 2019 at 4:30

meW · Accepted Answer · 2019-02-25 08:04:37Z

1

I've modified your original data to make it a little more complex.

Solution:

import numpy as np
import pandas as pd

eg = {'transaction_code': ['TX051','TX002','TX113','TX004','TX805'],
      'transaction_time': pd.to_datetime(['1 Jan 2019 1pm','1 Jan 2019 2pm','1 Jan 2019 3pm','1 Jan 2019 4pm','1 Jan 2019 5pm']),
      'amount': [150,250,100,80,30],
      'reversed_transaction_code': ['','TX004','TX805','TX002','TX113']}
df = pd.DataFrame(eg)
df


+---+--------+---------------------------+------------------+---------------------+
|   | amount | reversed_transaction_code | transaction_code |  transaction_time   |
+---+--------+---------------------------+------------------+---------------------+
| 0 |    150 |                           | TX051            | 2019-01-01 13:00:00 |
| 1 |    250 | TX004                     | TX002            | 2019-01-01 14:00:00 |
| 2 |    100 | TX805                     | TX113            | 2019-01-01 15:00:00 |
| 3 |     80 | TX002                     | TX004            | 2019-01-01 16:00:00 |
| 4 |     30 | TX113                     | TX805            | 2019-01-01 17:00:00 |
+---+--------+---------------------------+------------------+---------------------+

# Fetching the index of every row that has an entry in reversed_transaction_code
idx_ = df[df.reversed_transaction_code.str.startswith('T')].index
idx_
# Int64Index([1, 2, 3, 4], dtype='int64')

# Creating blank columns (np.nan rather than np.NaN, which was removed in NumPy 2.0)
df['reversed_amount'] = np.nan
df['reversed_transaction_time'] = None

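# Reverse transaction index: rows whose reversed code sorts before their own code
# (string comparison of the part after 'TX'; assumes codes are issued in increasing order)
rev_num = df.loc[idx_, 'reversed_transaction_code'].str.split('TX', expand=True).iloc[:, 1]
own_num = df.loc[idx_, 'transaction_code'].str.split('TX', expand=True).iloc[:, 1]
idxR_ = df.loc[idx_][rev_num < own_num].index
idxR_
# Int64Index([3, 4], dtype='int64')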

# Fetching valid reversed transaction code from reversed_transaction_code column
val = df.loc[idxR_, 'reversed_transaction_code'] 
val
# 3    TX002
# 4    TX113
# Name: reversed_transaction_code, dtype: object

# Fetching the index of the original transactions whose codes appear in val
code_idx_ = df[df.transaction_code.isin(val)].index
code_idx_
# Int64Index([1, 2], dtype='int64')

# Find the row holding each reversed transaction code and copy the return's time and amount into the new columns
# The code below can be made shorter or more efficient (say, using merge/join)
for i in range(len(val)):
    for j in range(len(code_idx_)):
        if val.iloc[i] == df.loc[code_idx_[j], 'transaction_code']:
            df.loc[code_idx_[j], 'reversed_transaction_time'] = df.loc[val.index[i], 'transaction_time']
            df.loc[code_idx_[j], 'reversed_amount'] = df.loc[val.index[i], 'amount']

# Removing the rows with reversed transactions
df.drop(val.index, inplace=True)
df            

+---+--------+---------------------------+------------------+---------------------+-----------------+---------------------------+
|   | amount | reversed_transaction_code | transaction_code |  transaction_time   | reversed_amount | reversed_transaction_time |
+---+--------+---------------------------+------------------+---------------------+-----------------+---------------------------+
| 0 |    150 |                           | TX051            | 2019-01-01 13:00:00 | NaN             | None                      |
| 1 |    250 | TX004                     | TX002            | 2019-01-01 14:00:00 | 80.0            | 2019-01-01 16:00:00       |
| 2 |    100 | TX805                     | TX113            | 2019-01-01 15:00:00 | 30.0            | 2019-01-01 17:00:00       |
+---+--------+---------------------------+------------------+---------------------+-----------------+---------------------------+
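
The code comment above suggests the loop could be replaced by a merge/join, and the asker notes in the comments that the codes are not sequential. The following is a minimal sketch of that idea, not part of the original answer: it joins the frame to itself on reversed_transaction_code and treats the later of two linked transactions as the return. The names partner, merged, is_return and result are illustrative, and the rule "a return is always logged after the sale it reverses" is an assumption about the data.

```python
import pandas as pd

df = pd.DataFrame({
    'transaction_code': ['TX051', 'TX002', 'TX113', 'TX004', 'TX805'],
    'transaction_time': pd.to_datetime(['2019-01-01 13:00', '2019-01-01 14:00',
                                        '2019-01-01 15:00', '2019-01-01 16:00',
                                        '2019-01-01 17:00']),
    'amount': [150, 250, 100, 80, 30],
    'reversed_transaction_code': ['', 'TX004', 'TX805', 'TX002', 'TX113'],
})

# Look-up table of every transaction, renamed so it can be joined back
# as the "partner" named in reversed_transaction_code.
partner = df[['transaction_code', 'transaction_time', 'amount']].rename(columns={
    'transaction_code': 'reversed_transaction_code',
    'transaction_time': 'reversed_transaction_time',
    'amount': 'reversed_amount',
})

# A left join keeps every row of df in order; rows without a partner get NaN/NaT.
# validate='many_to_one' raises if a transaction code appears more than once.
merged = df.merge(partner, on='reversed_transaction_code', how='left',
                  validate='many_to_one')
merged.index = df.index  # merge resets the index, so restore the original labels

# Assumption: a return is always logged after the sale it reverses.
# Comparisons against NaT are False, so unlinked rows are never flagged.
is_return = merged['reversed_transaction_time'] < merged['transaction_time']

# Drop the return rows; the remaining rows carry the return's time and amount.
result = merged[~is_return]
print(result)
```

On the modified data used in this answer, this keeps rows 0, 1 and 2 with reversed amounts NaN, 80.0 and 30.0, the same rows and values as the table above. Because it compares transaction times rather than the numeric part of the codes, it does not rely on the codes being issued in order.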

edited Feb 25, 2019 at 8:04

answered Feb 25, 2019 at 4:23

meW



6 Comments

anky Over a year ago

hmm interesting..!!

brandoldperson Over a year ago

@meW Thank you for your help! This is very close to the result I was hoping for and I have been trying to modify it to get there by using the transaction times instead of codes (which I should have stated are not sequential). However, I haven't been able to do it. Do you have any suggestions? Also, I intended to drop the rows which are the return transactions, I have edited the post accordingly

meW Over a year ago

@brandoldperson In the middle of something. Will check it out later.

brandoldperson Over a year ago

@meW No problem, take your time. Thanks

meW Over a year ago

@brandoldperson I've modified the answer as per the question. Have a look and let me know if it suits your needs.
