
I have two datasets:

  • The first dataset is a table with locomotive information.
  • The second dataset is a table of messages exchanged between the locomotives and the control center.

I want to get the number of messages per locomotive that had a delivery time greater than or equal to 300 seconds.


In order to calculate this, I wrote the following code (dataframe 1 is 'df_loco', dataframe 2 is 'df_sent'):

    a = 300  # delay threshold in seconds
    for i in range(len(df_loco['LOCO'])):
        for j in range(len(df_sent['HM_ID_HM'])):
            # df_sent column 1 holds the locomotive ID, column 11 the delivery time;
            # df_loco column 4 is the running count of delayed messages
            if (df_sent.iloc[j, 1] == df_loco.iloc[i, 0]) and (df_sent.iloc[j, 11] >= a):
                df_loco.iloc[i, 4] += 1

The code does the job and gives me the correct number of delayed messages per locomotive. However, my datasets are big (dataframe 1 has 300+ rows, dataframe 2 has 55,000+), so it takes a long time to execute.

I have been able to get rid of nested for loops in other parts of my code, which made them about 100x faster. But here I'm struggling to combine the two conditions without 'for' loops: when I use the 'and' operator on whole columns, pandas raises an error saying the truth value of a Series is ambiguous.
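For context, here is a minimal sketch of what I run into (toy data and placeholder column names, not my real tables): Python's 'and' tries to reduce each whole Series to a single True/False, which pandas refuses as ambiguous, whereas the element-wise '&' operator combines the two boolean Series.

    import pandas as pd

    df = pd.DataFrame({'LOCO_ID': [123, 456, 123],
                       'DELIVERY_TIME': [800, 14, 420]})

    # This raises "The truth value of a Series is ambiguous":
    # mask = (df['LOCO_ID'] == 123) and (df['DELIVERY_TIME'] >= 300)

    # Element-wise '&' works; note the parentheses around each condition
    mask = (df['LOCO_ID'] == 123) & (df['DELIVERY_TIME'] >= 300)
    print(df[mask])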

Question: Is there a way to improve this code, not having to work with nested loops?

3 Comments

  • You should be able to find some ideas at: stackoverflow.com/questions/44462325/… Commented Feb 16, 2021 at 22:50
  • As a first attack, look at the itertools package. You will want the product of the two ranges. More than that, learn to use full-column operations. Commented Feb 16, 2021 at 22:50
  • Does this answer your question? avoiding nested for loops python Commented Feb 16, 2021 at 22:51

1 Answer


Using pandas you could merge the two dataframes with a left join, filter for DELIVERY_TIME >= 300, and then use groupby to count messages per LOCO_ID.

    import pandas as pd

    # Toy data mirroring the two tables
    df_loco = pd.DataFrame({'LOCO_ID': [123, 456, 789, 321]})
    df_sent = pd.DataFrame({'MESSAGE_ID': range(1, 7),
                            'LOCO_ID': [456, 123, 123, 321, 789, 123],
                            'DELIVERY_TIME': [14, 800, 420, 310, 60, 14]})

    # Left join on the shared LOCO_ID column
    merged_dataframe = df_loco.merge(df_sent, how='left')

    # Keep only the delayed messages, then count them per locomotive
    count_dataframe = (merged_dataframe[merged_dataframe['DELIVERY_TIME'] >= 300]
                       .groupby('LOCO_ID')['LOCO_ID']
                       .count())

    print(count_dataframe)
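With the toy data above, only locomotives that actually had delayed messages appear in the result (LOCO_ID 123 with a count of 2 and 321 with a count of 1); locomotives with no delays, such as 456 and 789, are dropped by the filter before the groupby.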

1 Comment

This is exactly what I was looking for. I've applied it to my two dataframes and it worked! Just one question: how can I have the result added to the column 'COUNT_DELAYS' instead of a separate 'count_dataframe'?
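A possible follow-up, building on the answer's example (a sketch only; 'COUNT_DELAYS' is simply the column name from the comment above): map the per-locomotive counts back onto df_loco and fill the missing ones with 0.

    # Continuing from the answer's example: write the counts into df_loco.
    # Locomotives with no delayed messages are missing from count_dataframe,
    # so map() gives NaN for them; fill those with 0.
    df_loco['COUNT_DELAYS'] = df_loco['LOCO_ID'].map(count_dataframe).fillna(0).astype(int)
    print(df_loco)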
