I have a dataframe like this:

test = pd.DataFrame({'id':['a','C','D','b','b','D','c','c','c'], 'text':['a','x','a','b','b','b','c','c','c']})

Using the following for-loop, I can add x to new_col: when a row with id 'C' is followed by a row with id 'D', the text of the 'C' row is copied into new_col on the 'D' row. The loop works fine for this small dataframe, but for dataframes with thousands of rows it takes many hours to run. Any suggestions to speed it up?

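test['new_col'] = ''  # create the column first so the assignment below has a target
for index, row in test.iterrows():
    # guard index + 1 so a 'C' in the last row does not raise a KeyError
    if row['id'] == 'C' and index + 1 < len(test):
        if test.loc[index + 1, 'id'] == 'D':
            # .loc avoids chained assignment, which may not write back to test
            test.loc[index + 1, 'new_col'] = row['text']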

3 Comments
  • Please add result as text instead of image Commented Aug 12, 2021 at 5:26
  • It is always a bad idea to iterate through a pandas DataFrame. Their documentation even warns of this. You would be happier if you pre-processed the data before converting it to a DataFrame. Commented Aug 12, 2021 at 5:26
  • 98% of the time, a for-loop is a bad choice and should not be used with pandas. Avoid it at all costs! ;) Commented Aug 12, 2021 at 17:33
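The pre-processing suggestion in the comments could look roughly like the sketch below, assuming the data is still available as plain Python lists before the DataFrame is built. The names `ids`, `texts` and `new_col` are illustrative and do not come from the question.

```python
import pandas as pd

# Raw data as plain lists (same values as the question's example).
ids = ['a', 'C', 'D', 'b', 'b', 'D', 'c', 'c', 'c']
texts = ['a', 'x', 'a', 'b', 'b', 'b', 'c', 'c', 'c']

# Compare each id with the one that follows it; when a 'C' is followed
# by a 'D', copy the 'C' row's text onto the 'D' row.
new_col = [''] * len(ids)
for i, (cur, nxt) in enumerate(zip(ids, ids[1:])):
    if cur == 'C' and nxt == 'D':
        new_col[i + 1] = texts[i]

# Build the DataFrame only once, with the extra column already computed.
test = pd.DataFrame({'id': ids, 'text': texts, 'new_col': new_col})
print(test)
```

This still uses a loop, but it runs over plain lists rather than over DataFrame rows, so it avoids the per-row overhead of iterrows().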

1 Answer

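Try using shift() with boolean conditions. shift() moves a column down by one row, so each row can be compared with the previous row without a loop.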

import pandas as pd
import numpy as np

df = pd.DataFrame({'id': ['a', 'C', 'D', 'b', 'b', 'D', 'c', 'c', 'c'], 
                   'text': ['a', 'x', 'a', 'b', 'b', 'b', 'c', 'c', 'c']})


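# id of the previous row, aligned with the current row (NaN for the first row)
df['temp_col'] = df['id'].shift()
# where the current id is 'D' and the previous id is 'C', take the previous row's text
df['new_col'] = np.where((df['id'] == 'D') & (df['temp_col'] == 'C'), df['text'].shift(), "")
del df['temp_col']  # drop the helper column
print(df)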
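To see why the condition picks the right rows, it may help to look at the shifted columns and the boolean mask on their own. The sketch below uses the same example data. The names `prev_id`, `prev_text`, `mask` and `matches` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'id': ['a', 'C', 'D', 'b', 'b', 'D', 'c', 'c', 'c'],
                   'text': ['a', 'x', 'a', 'b', 'b', 'b', 'c', 'c', 'c']})

# shift() moves every value down one row, so row i holds row i-1's value.
prev_id = df['id'].shift()
prev_text = df['text'].shift()

# True only where the current id is 'D' and the previous id is 'C'.
mask = (df['id'] == 'D') & (prev_id == 'C')

# Index labels of the matching rows.
matches = df.index[mask].tolist()
print(matches)
```

Only row 2 matches here: row 5 also has id 'D', but the row before it has id 'b', so it is left empty.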

We can also do it without a temporary column (thanks and credit to Prayson 🙂):

df['new_col'] = np.where((df['id'].eq('D')) & (df['id'].shift().eq('C')), df['text'].shift(), "")
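Before swapping a loop for the vectorized expression on real data, a quick equivalence check on synthetic data can be reassuring. The following is a minimal sketch. The helper names `new_col_loop` and `new_col_vectorized` and the randomly generated sample are assumptions made for illustration, not part of the original answer.

```python
import random

import numpy as np
import pandas as pd


def new_col_loop(df):
    """Reference result from a plain Python loop over row positions."""
    ids = df['id'].tolist()
    texts = df['text'].tolist()
    out = [''] * len(df)
    for i in range(len(df) - 1):
        if ids[i] == 'C' and ids[i + 1] == 'D':
            out[i + 1] = texts[i]
    return out


def new_col_vectorized(df):
    """Same rule expressed with shift() and np.where, as in the answer."""
    mask = df['id'].eq('D') & df['id'].shift().eq('C')
    return np.where(mask, df['text'].shift(), '').tolist()


# Synthetic sample with a fixed seed so the check is repeatable.
rng = random.Random(0)
n = 10_000
sample = pd.DataFrame({
    'id': rng.choices(['a', 'b', 'c', 'C', 'D'], k=n),
    'text': rng.choices(['a', 'b', 'c', 'x'], k=n),
})

same = new_col_loop(sample) == new_col_vectorized(sample)
print(same)
```

If the two lists match, the vectorized result can be assigned directly with df['new_col'] = new_col_vectorized(df).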

1 Comment

This is cool! We don't need a temporary column: df['new_col'] = np.where((df['id'].eq('D')) & (df['id'].shift().eq('C')), df['text'].shift(), "")
