I have a dataframe like this:
test = pd.DataFrame({'id':['a','C','D','b','b','D','c','c','c'], 'text':['a','x','a','b','b','b','c','c','c']})
Using the following for-loop I can add x to a new_col. This for-loop works fine for the small dataframe. However, for dataframes that have thousands of rows, it will take many hours to process. Any suggestions to speed it up?
for index, row in test.iterrows():
if row['id'] == 'C':
if test['id'][index+1] =='D':
test['new_col'][index+1] = test['text'][index]

98%of the timesfor-loopis bad and not to be used withPandas. Avoid it at all costs! ;)