How to speed up nested for loop with dataframe?

Question

I have a dataframe like this:

import pandas as pd

test = pd.DataFrame({'id':['a','C','D','b','b','D','c','c','c'], 'text':['a','x','a','b','b','b','c','c','c']})

Using the following for-loop, I copy the text of a 'C' row (here 'x') into new_col of the following row whenever that row's id is 'D'. The loop works fine for this small dataframe, but for dataframes with thousands of rows it takes many hours to run. Any suggestions for speeding it up?

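test['new_col'] = ''  # the column must exist before rows are assigned into it
for index, row in test.iterrows():
    # guard index+1 so a 'C' in the last row does not raise a KeyError
    if row['id'] == 'C' and index + 1 in test.index:
        if test.at[index + 1, 'id'] == 'D':
            # .at writes a single cell directly, avoiding chained assignment
            test.at[index + 1, 'new_col'] = row['text']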

It is always a bad idea to iterate through a pandas DataFrame; the pandas documentation even warns against it. You would be happier if you pre-processed the data before converting it to a DataFrame. – Tim Roberts, commented Aug 12, 2021 at 5:26
98% of the time, a for-loop is the wrong tool in pandas. Avoid it at all costs! ;) – Prayson W. Daniel, commented Aug 12, 2021 at 17:33
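Following Tim Roberts' suggestion, here is a minimal sketch of doing the work on plain Python lists before the DataFrame is built. It assumes the raw data arrives as two parallel lists, `ids` and `texts` (hypothetical names). Pairing each element with its successor via `zip` replaces the index arithmetic of the loop in the question.

```python
import pandas as pd

# Hypothetical raw inputs, assumed to be parallel lists before DataFrame creation
ids = ['a', 'C', 'D', 'b', 'b', 'D', 'c', 'c', 'c']
texts = ['a', 'x', 'a', 'b', 'b', 'b', 'c', 'c', 'c']

# Row 0 has no previous row, so it can never match
new_col = ['']
# zip yields (previous id, current id, previous text) without any index lookups
for prev_id, cur_id, prev_text in zip(ids, ids[1:], texts):
    new_col.append(prev_text if prev_id == 'C' and cur_id == 'D' else '')

test = pd.DataFrame({'id': ids, 'text': texts, 'new_col': new_col})
print(test)
```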

Vikas Periyadath · Accepted Answer · 2021-08-13 05:05:35Z

2

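Try using shift() with a boolean condition. shift() moves every value down one row, so comparing df['id'] with df['id'].shift() checks each row against its predecessor in one vectorized operation instead of a Python loop.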

import pandas as pd
import numpy as np

df = pd.DataFrame({'id': ['a', 'C', 'D', 'b', 'b', 'D', 'c', 'c', 'c'], 
                   'text': ['a', 'x', 'a', 'b', 'b', 'b', 'c', 'c', 'c']})


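# previous row's id, aligned with the current row
df['temp_col'] = df['id'].shift()
# where a 'D' row follows a 'C' row, take the previous row's text; otherwise ""
df['new_col'] = np.where((df['id'] == 'D') & (df['temp_col'] == 'C'), df['text'].shift(), "")
del df['temp_col']  # drop the helper column
print(df)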
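To check that the shift() version reproduces the loop, a small sketch can run both on the question's data and compare the results. The loop here is a simplified row-by-row reference written for this comparison (not the question's exact code), and the comparison itself is an addition, not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'id': ['a', 'C', 'D', 'b', 'b', 'D', 'c', 'c', 'c'],
                   'text': ['a', 'x', 'a', 'b', 'b', 'b', 'c', 'c', 'c']})

# Row-by-row reference (slow): row i gets row i-1's text when ids are 'C' then 'D'
looped = [''] * len(df)
for i in range(1, len(df)):
    if df.at[i - 1, 'id'] == 'C' and df.at[i, 'id'] == 'D':
        looped[i] = df.at[i - 1, 'text']

# Vectorized version from the answer: shift() moves every value down one row,
# so each row is compared with its predecessor in a single operation
mask = df['id'].eq('D') & df['id'].shift().eq('C')
vectorized = np.where(mask, df['text'].shift(), "").tolist()

print(looped == vectorized)  # True
print(vectorized)            # ['', '', 'x', '', '', '', '', '', '']
```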

We can also do it without a temporary column (thanks and credit to Prayson 🙂):

df['new_col'] = np.where((df['id'].eq('D')) & (df['id'].shift().eq('C')), df['text'].shift(), "")
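If you would rather stay within pandas and skip NumPy, `Series.where` can express the same rule: it keeps values where the condition is True and substitutes `other` everywhere else. This is a swapped-in alternative, not the answer's code, and should produce the same new_col.

```python
import pandas as pd

df = pd.DataFrame({'id': ['a', 'C', 'D', 'b', 'b', 'D', 'c', 'c', 'c'],
                   'text': ['a', 'x', 'a', 'b', 'b', 'b', 'c', 'c', 'c']})

# True only on a 'D' row whose previous row is 'C' (NaN from shift() compares False)
mask = df['id'].eq('D') & df['id'].shift().eq('C')

# Keep the previous row's text where mask is True, use "" everywhere else
df['new_col'] = df['text'].shift().where(mask, other="")
print(df)
```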

edited Aug 13, 2021 at 5:05

answered Aug 12, 2021 at 5:55

Vikas Periyadath


Prayson W. Daniel Over a year ago

This is cool! We don't need a temporary column: df['new_col'] = np.where((df['id'].eq('D')) & (df['id'].shift().eq('C')), df['text'].shift(), "")
