While looking for opportunities to improve the efficiency of a Python script, I stumbled upon a confusing (to me) discrepancy between the total time a loop takes to execute and the cumulative time taken by the instructions inside the loop.

Here's the relevant block of code (it reads lines from a CSV file and then runs some calculations on the elements of each line):
time_to_execute_lines = 0
start_reading = time.time()
for line in file:
    s = time.time()
    if not line[0] in foo:
        continue
    if not is_valid_row(line):
        continue
    if line[1] in my_dict[line[0]]:
        update_item(line, bar)
    else:
        add_item(line, bar)
    time_to_execute_lines = time_to_execute_lines + time.time() - s
stop_reading = time.time()
print "Time to complete for loop: " + str(stop_reading - start_reading)
print "Time to execute lines of loop: " + str(time_to_execute_lines)
Some example output, which I observe for several different files:
Time to complete for loop: 7.80099987984
Time to execute lines of loop: 0.420000076294
Nor is the discrepancy just overhead added by the timing calls themselves. If I remove the duration calculations from inside the loop, I get a little time back, but nowhere near enough to account for the gap:
start_reading = time.time()
for line in file:
    if not line[0] in foo:
        continue
    if not is_valid_row(line):
        continue
    if line[1] in my_dict[line[0]]:
        update_item(line, bar)
    else:
        add_item(line, bar)
stop_reading = time.time()
print "Time to complete for loop: " + str(stop_reading - start_reading)
Output:
Time to complete for loop: 7.24400019646
Any thoughts on what could cause this discrepancy? Is there a systematic measurement error in this way of timing the instructions within the loop? I would love to get those seven seconds back!