refactoring to calculate running time of sorting algorithm - python

Question

I wrote this to calculate the average running time of sorting algorithms, and I was just curious if there was a way to refactor this to something simpler or better.

time = []
for i in range(3):
    start = timeit.default_timer()
    insert_list = []
    for i in range(3000):
        insert_list.append(randint(0,5000))
    sorted_list = merge_sort(insert_list)
    stop = timeit.default_timer()
    time.append(stop - start)
print sum(time) /len(time)

You probably want to use xrange() (generates numbers on the fly) instead of range() (generates an array of numbers). The timing should only check the time for merge_sort(), and not include the time to create the list. — rcgldr
– rcgldr, Commented Oct 6, 2017 at 22:45

score 1 · Accepted Answer · 2017-10-06 22:44:02Z

1

Try using datetime to measure the running time of your algorithm.

datetime.datetime has a microsecond attribute that can be used if you choose to use datetime.datetime.now()

   from datetime import datetime
    startTime = datetime.now()
    #CODE
    print("Time taken:",datetime.now() - startTime)

edited Oct 6, 2017 at 22:44

answered Oct 6, 2017 at 22:36

user7214612

Sign up to request clarification or add additional context in comments.

Comments

Sergey Vasilyev · Accepted Answer · 2017-10-06 22:52:13Z

First, you have to move for i in range(3000) cycle outside of the time measurements. This is NOT sorting, so you actually measure the dataset population. And since you use the random numbers, it will be highly dependent on the speed of the source of entropy (e.g. /dev/random, /dev/urandom, or alike), which can be very slow in some configurations (e.g., VMs on a shared host or in the cloud). Which has nothing to do with the speed of the sorting algorithm.

time = []
for i in range(3):
    insert_list = []
    for i in range(3000):
        insert_list.append(randint(0,5000))
    start = timeit.default_timer()
    sorted_list = merge_sort(insert_list)
    stop = timeit.default_timer()
    time.append(stop - start)
print sum(time) /len(time)

Second, not so important, this timer (so as time.time() & datetime.now()) can give unexpected results in case of timezone shifts, daylight savings, ntp time adjustments, etc. It is better to use monotonic.monotonic(), which uses the OS'es source of monotonic time if possible. Though, this is an external library, not a builtin.

time = []
for i in range(3):
    insert_list = []
    for i in range(3000):
        insert_list.append(randint(0,5000))
    start = monotonic.monotonic()
    sorted_list = merge_sort(insert_list)
    stop = monotonic.monotonic()
    time.append(stop - start)
print sum(time) /len(time)

Third, the measurements can be affected by the external circumstances if you measure each call separately. Such as too fast algorithm on a too small dataset, which will lead to measurement roundings due to precision of the time clock. Instead, make N sorting calls and measure the time of the whole cycle. Then divide the total time by the number of operations. This goes at the cost of memory, since you have to prepare all N arrays in advance.

N = 3
dataset = []
for i in range(N):
    insert_list = []
    for i in range(3000):
        insert_list.append(randint(0,5000))
    dataset.append(insert_list)
start = monotonic.monotonic()
for insert_list in dataset:
    sorted_list = merge_sort(insert_list)
stop = monotonic.monotonic()
print (stop - start) / N

Fourth, why not use timeit.timeit() function?

N = 3
dataset = [[randint(0, 5000) for j in range(3000)] for i in range(N)]
print(timeit.timeit(lambda: merge_sort(dataset.pop()), number=N))

Collectives™ on Stack Overflow

refactoring to calculate running time of sorting algorithm - python

2 Answers 2

Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related