I have a dataframe with 2 columns: 'VENDOR_ID' and 'GL_Transaction_Description'. I want to print every row of the 'GL_Transaction_Description' column that contains any value from the 'VENDOR_ID' column.

VENDOR_ID  GL_Transaction_Description
123        HELLO 345
456        BYE 456
987        THANKS 456

The desired output here would be 'BYE 456' and 'THANKS 456'. My code is as follows:

for k in range(len(df)):
    for j in range(len(df)):
        if df['VENDOR_ID'][k] in df['GL_Transaction_Description'][j] and df['VENDOR_ID'][k] != 'nan':
            print(df['GL_Transaction_Description'][j])

But this particular dataframe has more than 100k rows, and the nested for loop takes forever to run. Any ideas on how to make this execute faster? I have read that using numpy usually makes things much faster, but I haven't been able to implement it.

  • I don't understand what you're trying to accomplish. Can you post sample data and the expected result? Commented Mar 22, 2021 at 14:42
  • I don't know what you're trying to accomplish either, but OP should look into boolean masks (as in ashkangh's answer) and the apply method. Commented Mar 22, 2021 at 14:45
  • @CFreitas Sorry about that, I edited it Commented Mar 22, 2021 at 15:06
  • Since you are using two loops, can I assume that if you have a third row with ID=789, GL="Hello 456", this value should also be returned? Commented Mar 22, 2021 at 15:17
  • @CFreitas yes, exactly Commented Mar 22, 2021 at 15:20

3 Answers

Use a boolean mask:

v_list = df['VENDOR_ID'].to_list()
mask = list(map((lambda x: any([(y in x) for y in v_list])), df['GL_Transaction_Description']))

print(df['GL_Transaction_Description'][mask])

This assumes 'VENDOR_ID' already has dtype str. If it does not, change the mask = ... line to:

mask = list(map((lambda x: any([(str(y) in x) for y in v_list])), df['GL_Transaction_Description']))

The same result can be obtained with df.apply() and axis=1. However, list(map()) has a shorter execution time than df.apply() on axis=1.

Output:

1       BYE 456
2    THANKS 456
Name: GL_Transaction_Description, dtype: object
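
As a possible alternative to the mask above, here is a minimal sketch that pushes the substring test into pandas with a single regular expression built from the vendor IDs. It reuses the sample data from the question; the names vendor_ids, pattern and matches are only illustrative, and the 'nan' filter mirrors the check in the question's loop. Whether it beats list(map()) on 100k rows depends on how many distinct IDs there are, so it is worth timing on the real data.

```python
import re

import pandas as pd

# Sample data from the question, with IDs stored as strings.
df = pd.DataFrame({
    "VENDOR_ID": ["123", "456", "987"],
    "GL_Transaction_Description": ["HELLO 345", "BYE 456", "THANKS 456"],
})

# Drop missing IDs, cast to str, de-duplicate, and skip the literal string 'nan'.
vendor_ids = [v for v in df["VENDOR_ID"].dropna().astype(str).unique() if v != "nan"]

# Escape each ID so regex metacharacters in an ID are matched literally.
# Note: an empty vendor_ids list would give an empty pattern that matches every row.
pattern = "|".join(re.escape(v) for v in vendor_ids)

# na=False treats missing descriptions as "no match".
mask = df["GL_Transaction_Description"].str.contains(pattern, regex=True, na=False)

matches = df.loc[mask, "GL_Transaction_Description"]
print(matches)

# Index labels of the matching rows.
print(matches.index.tolist())
```

On the sample data this selects 'BYE 456' and 'THANKS 456', at index labels 1 and 2.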

6 Comments

Nice, but this only works for the matching elements in the same row. I want matching elements that exist in the whole of the VENDOR_ID column. (check my re-edited question)
@JohnMantios Amended as per your clarification.
Thanks a lot! What if I wanted to retrieve the indexes of the matching elements?
@JohnMantios I have added a link for reference on the better performance of list(map()) compared to apply() on axis=1. It can be around 3x to 4x faster in some cases.
@JohnMantios If you just want the index, use df['GL_Transaction_Description'][mask].index

You can use boolean indexing:

df.loc[df['GL_Transaction_Description'].isin(df['VENDOR_ID']), 'GL_Transaction_Description']

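You can use boolean indexing with the isin function. Note that isin tests whole-value equality, so it only selects descriptions that are exactly equal to some VENDOR_ID: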

import pandas as pd
df = pd.DataFrame({'VENDOR_ID': list('abcde') + ['matching_item'],
                  'GL_Transaction_Description': ['trx_descr_' + c for c in list('abcde')] + ['matching_item']})
df

    VENDOR_ID       GL_Transaction_Description
0   a               trx_descr_a
1   b               trx_descr_b
2   c               trx_descr_c
3   d               trx_descr_d
4   e               trx_descr_e
5   matching_item   matching_item

df[df.GL_Transaction_Description.isin(df.VENDOR_ID)].GL_Transaction_Description

5    matching_item
Name: GL_Transaction_Description, dtype: object
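
To see how isin behaves on the question's own sample data, here is a small sketch. It also shows one possible workaround under a loud assumption: that vendor IDs always appear in the description as whole whitespace-separated tokens. That is stricter than the substring test in the question (for example, '456' would not match 'X456'), so treat it as an option to evaluate rather than a drop-in replacement. The names vendor_set and token_match are illustrative.

```python
import pandas as pd

# Sample data from the question, with IDs stored as strings.
df = pd.DataFrame({
    "VENDOR_ID": ["123", "456", "987"],
    "GL_Transaction_Description": ["HELLO 345", "BYE 456", "THANKS 456"],
})

# isin compares whole values, and no description is exactly equal to an ID.
exact = df["GL_Transaction_Description"].isin(df["VENDOR_ID"])
print(exact.tolist())

# Assumption: IDs occur as whole whitespace-separated tokens in the description.
# Set membership keeps the per-row cost independent of the number of vendor IDs.
vendor_set = set(df["VENDOR_ID"].dropna().astype(str))
token_match = df["GL_Transaction_Description"].apply(
    lambda text: not vendor_set.isdisjoint(str(text).split())
)
print(df.loc[token_match, "GL_Transaction_Description"].tolist())
```

Here isin selects nothing, while the token-based mask selects 'BYE 456' and 'THANKS 456'.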
