While looking for opportunities to improve the efficiency of a Python script, I stumbled upon a confusing (to me) discrepancy between the total time a loop takes to execute and the cumulative time taken by the instructions inside the loop.

Here's the relevant block of code (it reads lines from a CSV file and then runs some calculations on the elements of each line):
time_to_execute_lines = 0
start_reading = time.time()
for line in file:
    s = time.time()
    if not line[0] in foo:
        continue
    if not is_valid_row(line):
        continue
    if line[1] in my_dict[line[0]]:
        update_item(line, bar)
    else:
        add_item(line, bar)
    time_to_execute_lines = time_to_execute_lines + time.time() - s
stop_reading = time.time()
print "Time to complete for loop: " + str(stop_reading - start_reading)
print "Time to execute lines of loop: " + str(time_to_execute_lines)
Some example output, which I observe for several different files:
Time to complete for loop: 7.80099987984
Time to execute lines of loop: 0.420000076294
Nor is the discrepancy just overhead added by the timing calls themselves. If I remove the duration calculations from inside the loop, I get a little time back, but nowhere near enough to account for the gap:
start_reading = time.time()
for line in file:
    if not line[0] in foo:
        continue
    if not is_valid_row(line):
        continue
    if line[1] in my_dict[line[0]]:
        update_item(line, bar)
    else:
        add_item(line, bar)
stop_reading = time.time()
print "Time to complete for loop: " + str(stop_reading - start_reading)
Output:
Time to complete for loop: 7.24400019646
Any thoughts on what could cause this discrepancy? Is there a systematic measurement error in this way of timing the instructions within the loop? I would love to get those seven seconds back!