2

While looking for some opportunities to improve efficiency in a Python script, I stumbled upon a rather confusing (to me) discrepancy between the total time it takes for a loop to execute and the cumulative time it takes for the instructions within the loop to execute.

Here's the relevant block of code (it involves reading lines in a csv and then running some calculations on the elements of each line):

        time_to_execute_lines = 0
        start_reading = time.time()
        for line in file:
            s = time.time()
            if not line[0] in foo:
                continue
            if not is_valid_row(line):
                continue        
            if line[1] in my_dict[line[0]]:
                update_item(line,bar)
            else:
                add_item(line,bar)
            time_to_execute_lines = time_to_execute_lines + time.time() - s
        stop_reading = time.time()
        print "Time to complete for loop: " + str(stop_reading - start_reading)
        print "Time to execute lines of loop: " + str(time_to_execute_lines)

Some example output, which I observe for several different files:

Time to complete for loop: 7.80099987984
Time to execute lines of loop: 0.420000076294

This is not just me adding time by running these calculations, either. If I remove the duration calculations within the loop, I get a little bit of time back, but not nearly enough to justify the discrepancy:

start_reading = time.time()
        for line in file:
            s = time.time()
            if not line[0] in foo:
                continue
            if not is_valid_row(line):
                continue        
            if line[1] in my_dict[line[0]]:
                update_item(line,bar)
            else:
                add_item(line,bar)
            time_to_execute_lines = time_to_execute_lines + time.time() - s
        stop_reading = time.time()
        print "Time to complete for loop: " + str(stop_reading - start_reading)

Output:

Time to complete for loop: 7.24400019646

Any thoughts on what would cause this discrepancy? Is there a systematic measurement error in this method of timing the instructions within the loop? I would love to get those seven seconds back!

1 Answer 1

2

The total time includes the time to read the lines in the file, but the individually-added time does not (i.e. for line in file:, and all that that does, is not timed in time_to_execute_lines).

Sign up to request clarification or add additional context in comments.

1 Comment

I had a very similar situation as OP but I thought my for loop was just going over a plain iterable. What I ended up noticing thanks to this answer is that my code was retrieving data from a Mongo database on each for loop call and that was what was slowing it down.. Thanks!

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.