Optimizing a nested loop in pandas

Question

I have a dataframe with 2 columns: 'VENDOR_ID' and 'GL_Transaction_Description'. I want to print every value in the 'GL_Transaction_Description' column that contains any value from the 'VENDOR_ID' column.

VENDOR_ID	GL_Transaction_Description
123	HELLO 345
456	BYE 456
987	THANKS 456

The desired output here would be 'BYE 456' and 'THANKS 456'. My code is as follows:

for k in range(len(df)):
    for j in range(len(df)):
        if df['VENDOR_ID'][k] in df['GL_Transaction_Description'][j] and df['VENDOR_ID'][k] != 'nan':
            print(df['GL_Transaction_Description'][j])

But this dataframe has more than 100k rows, and the nested for loop takes forever to run. Any ideas on how to make this execute faster? I have read that using numpy often makes things much faster, but I haven't been able to implement it.
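To make the cost of the nested loop concrete, here is a sketch that reproduces the question's loop on the sample table and counts how many substring checks it performs. It assumes 'VENDOR_ID' is stored as strings (the `in` test requires that), and `nested_loop_matches` is a hypothetical helper name, not part of the question.

```python
import pandas as pd

# Sample data from the question; VENDOR_ID assumed to be str so `in` works
df = pd.DataFrame({
    'VENDOR_ID': ['123', '456', '987'],
    'GL_Transaction_Description': ['HELLO 345', 'BYE 456', 'THANKS 456'],
})

def nested_loop_matches(df):
    """Hypothetical helper mirroring the question's loop.

    Returns the matching descriptions and the number of checks performed.
    """
    matches = []
    checks = 0
    for k in range(len(df)):
        for j in range(len(df)):
            checks += 1
            if df['VENDOR_ID'][k] in df['GL_Transaction_Description'][j] and df['VENDOR_ID'][k] != 'nan':
                matches.append(df['GL_Transaction_Description'][j])
    return matches, checks

matches, checks = nested_loop_matches(df)
print(matches)  # ['BYE 456', 'THANKS 456']
print(checks)   # 9, i.e. len(df) ** 2
```

Because the inner loop runs once per outer row, the number of checks grows as n × n: for 100k rows that is about 10^10 checks, each also paying for element-by-element Series indexing. Note too that a description containing several vendor IDs is reported once per matching ID.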

I don't understand what you're trying to accomplish. Can you post sample data and the expected result?
– CFreitas, Commented Mar 22, 2021 at 14:42
I don't know what you're trying to accomplish either, but the OP should look into boolean masks (as in ashkangh's answer) and the apply method.
– Felício, Commented Mar 22, 2021 at 14:45
Since you are using two loops, can I assume that if there were a third row with ID=789 and GL="Hello 456", this value should also be returned?
– CFreitas, Commented Mar 22, 2021 at 15:17

SeaBean · Accepted Answer · 2021-03-22 16:01:15Z

1

Use a boolean mask

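# Collect all vendor IDs once
v_list = df['VENDOR_ID'].to_list()
# True for each description that contains at least one vendor ID as a substring
mask = list(map((lambda x: any(y in x for y in v_list)), df['GL_Transaction_Description']))

print(df['GL_Transaction_Description'][mask])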

This assumes 'VENDOR_ID' already has dtype str. If it does not, change the mask = ... line to:

mask = list(map((lambda x: any([(str(y) in x) for y in v_list])), df['GL_Transaction_Description']))
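A self-contained version of this approach, assuming integer IDs as in the sample table. As a small variation on the line above (not the answer's exact code), the IDs are converted to strings once up front instead of inside the lambda, and missing IDs are dropped to mirror the question's != 'nan' check.

```python
import pandas as pd

# Sample data from the question, with VENDOR_ID as integers (an assumption)
df = pd.DataFrame({
    'VENDOR_ID': [123, 456, 987],
    'GL_Transaction_Description': ['HELLO 345', 'BYE 456', 'THANKS 456'],
})

# Drop missing IDs, then convert each remaining ID to str exactly once
v_list = [str(y) for y in df['VENDOR_ID'].dropna()]

# True for each description containing at least one vendor ID as a substring;
# any() stops at the first ID that matches
mask = list(map(lambda x: any(y in x for y in v_list), df['GL_Transaction_Description']))

result = df['GL_Transaction_Description'][mask]
print(result)
```

In the worst case this still performs one substring check per (description, ID) pair; most of the gain over the nested loop likely comes from iterating plain Python lists instead of indexing the Series element by element. If missing values have turned the column into floats, str() produces strings like '123.0', so convert the IDs to a suitable type before building v_list.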

The same can be done with df.apply() and axis=1. However, list(map()) runs faster (shorter execution time) than df.apply() on axis=1.

Output:

1       BYE 456
2    THANKS 456
Name: GL_Transaction_Description, dtype: object
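The answer's remark that list(map()) beats df.apply() on axis=1 can be checked on your own data with a sketch like the one below. The function names are hypothetical. The assertion only confirms that both approaches build the same mask; the printed timings depend on data size, hardware and pandas version, so no particular speedup is implied here.

```python
from timeit import timeit

import pandas as pd

df = pd.DataFrame({
    'VENDOR_ID': ['123', '456', '987'],
    'GL_Transaction_Description': ['HELLO 345', 'BYE 456', 'THANKS 456'],
})
v_list = df['VENDOR_ID'].to_list()

def mask_with_map(df):
    # list(map()) iterates over the description column only
    return list(map(lambda x: any(y in x for y in v_list), df['GL_Transaction_Description']))

def mask_with_apply(df):
    # df.apply(axis=1) passes each row to the lambda as a Series
    return df.apply(
        lambda row: any(y in row['GL_Transaction_Description'] for y in v_list),
        axis=1,
    ).tolist()

assert mask_with_map(df) == mask_with_apply(df)

# Measure on your real data; a 3-row frame says little about 100k rows
print(timeit(lambda: mask_with_map(df), number=100))
print(timeit(lambda: mask_with_apply(df), number=100))
```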

edited Mar 22, 2021 at 16:01

answered Mar 22, 2021 at 15:13

SeaBean



6 Comments

John Mantios Over a year ago

Nice, but this only works for the matching elements in the same row. I want matching elements that exist in the whole of the VENDOR_ID column. (check my re-edited question)

SeaBean Over a year ago

@JohnMantios Amended as per your clarification.

John Mantios Over a year ago

Thanks a lot! What if I wanted to retrieve the indexes of the matching elements?

SeaBean Over a year ago

@JohnMantios I have added a link for reference on the better performance of list(map()) compared with apply() on axis=1. It can be around 3x to 4x faster in some cases.

SeaBean Over a year ago

@JohnMantios If you just want the index, use df['GL_Transaction_Description'][mask].index
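A short sketch of that suggestion. `.index` returns index labels, not positions; to show the difference, this example gives the sample data a non-default index ([10, 20, 30], chosen arbitrarily for illustration) and also computes the row positions from the mask.

```python
import pandas as pd

# Sample data with a hypothetical non-default index
df = pd.DataFrame(
    {
        'VENDOR_ID': ['123', '456', '987'],
        'GL_Transaction_Description': ['HELLO 345', 'BYE 456', 'THANKS 456'],
    },
    index=[10, 20, 30],
)
v_list = df['VENDOR_ID'].to_list()
mask = list(map(lambda x: any(y in x for y in v_list), df['GL_Transaction_Description']))

# Index labels of the matching rows, as suggested in the comment
labels = df['GL_Transaction_Description'][mask].index.tolist()
# Zero-based row positions of the matching rows
positions = [i for i, m in enumerate(mask) if m]

print(labels)     # [20, 30]
print(positions)  # [1, 2]
```

With the default RangeIndex the two lists coincide, which is why the difference is easy to miss.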


ashkangh · Accepted Answer · 2021-03-22 14:44:30Z

0

You can use boolean indexing:

df.loc[df['GL_Transaction_Description'].isin(df['VENDOR_ID']), 'GL_Transaction_Description']
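Series.isin tests whether each whole value equals one of the given values; it does not search for substrings. A quick check on the question's sample data (with string IDs, an assumption) shows the effect. The second frame, df2, is made up purely for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'VENDOR_ID': ['123', '456', '987'],
    'GL_Transaction_Description': ['HELLO 345', 'BYE 456', 'THANKS 456'],
})

# isin compares whole values for equality, and 'BYE 456' != '456'
exact = df.loc[df['GL_Transaction_Description'].isin(df['VENDOR_ID']), 'GL_Transaction_Description']
print(exact.tolist())  # []

# Only a description that equals an ID exactly is matched
df2 = pd.DataFrame({
    'VENDOR_ID': ['123', '456'],
    'GL_Transaction_Description': ['456', 'BYE 456'],
})
exact2 = df2.loc[df2['GL_Transaction_Description'].isin(df2['VENDOR_ID']), 'GL_Transaction_Description']
print(exact2.tolist())  # ['456']
```

So this one-liner fits the case where descriptions are exactly equal to IDs, but not the substring case shown in the question.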

answered Mar 22, 2021 at 14:44

ashkangh


Giuseppe Marco Boscardin · Accepted Answer · 2021-03-22 14:49:42Z

0

You can use boolean indexing with the isin function:

import pandas as pd
df = pd.DataFrame({'VENDOR_ID': list('abcde') + ['matching_item'],
                  'GL_Transaction_Description': ['trx_descr_' + c for c in list('abcde')] + ['matching_item']})
df

    VENDOR_ID       GL_Transaction_Description
0   a               trx_descr_a
1   b               trx_descr_b
2   c               trx_descr_c
3   d               trx_descr_d
4   e               trx_descr_e
5   matching_item   matching_item

df[df.GL_Transaction_Description.isin(df.VENDOR_ID)].GL_Transaction_Description

5    matching_item
Name: GL_Transaction_Description, dtype: object
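Like the previous answer, this matches whole values only. If substring matching is what the question needs, one commonly used alternative (a swapped-in technique, not taken from the answers above) is to join all IDs into a single regular expression and call Series.str.contains. This sketch assumes at least one non-missing ID: an empty pattern would match every description.

```python
import re

import pandas as pd

df = pd.DataFrame({
    'VENDOR_ID': ['123', '456', '987'],
    'GL_Transaction_Description': ['HELLO 345', 'BYE 456', 'THANKS 456'],
})

# One alternation pattern built from all non-missing IDs; re.escape guards
# against IDs that contain regex metacharacters
ids = df['VENDOR_ID'].dropna().astype(str).unique()
pattern = '|'.join(re.escape(i) for i in ids)

# Substring search per description; missing descriptions count as no match
mask = df['GL_Transaction_Description'].str.contains(pattern, regex=True, na=False)

matches = df.loc[mask, 'GL_Transaction_Description'].tolist()
print(matches)  # ['BYE 456', 'THANKS 456']
```

Like the `in` test in the question, this is plain substring matching, so an ID such as '45' would also match 'BYE 456'. Whether it is faster than the list(map()) mask depends on how many distinct IDs go into the pattern, so it is worth timing on the real data.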

answered Mar 22, 2021 at 14:49

Giuseppe Marco Boscardin

