
I wrote this to calculate the average running time of sorting algorithms, and I was just curious if there was a way to refactor this to something simpler or better.

import timeit
from random import randint

time = []
for i in range(3):
    start = timeit.default_timer()
    insert_list = []
    for i in range(3000):
        insert_list.append(randint(0, 5000))
    sorted_list = merge_sort(insert_list)
    stop = timeit.default_timer()
    time.append(stop - start)
print(sum(time) / len(time))
  • You probably want to use xrange() (which generates numbers lazily) instead of range() (which builds a whole list in memory). Also, the timing should only cover merge_sort(), not the time spent creating the list. Commented Oct 6, 2017 at 22:45

2 Answers


Try using datetime to measure the running time of your algorithm.

datetime.datetime has a microsecond attribute that can be used if you choose to use datetime.datetime.now().

from datetime import datetime

startTime = datetime.now()
# CODE
print("Time taken:", datetime.now() - startTime)
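Note that subtracting two datetime objects gives a datetime.timedelta, not a float, so if you want to average several runs you can convert it with timedelta.total_seconds(). A minimal sketch (the sum() call is just a placeholder workload):

```python
from datetime import datetime

startTime = datetime.now()
total = sum(range(1000000))           # placeholder workload
elapsed = datetime.now() - startTime  # a datetime.timedelta object
seconds = elapsed.total_seconds()     # float, convenient for averaging
```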



First, you have to move the for i in range(3000) loop outside of the time measurement. That loop is not sorting, so you are actually measuring the dataset population. And since you use random numbers, the result will depend heavily on the speed of the entropy source (e.g. /dev/random, /dev/urandom, or similar), which can be very slow in some configurations (e.g. VMs on a shared host or in the cloud) and has nothing to do with the speed of the sorting algorithm.

time = []
for i in range(3):
    insert_list = []
    for i in range(3000):
        insert_list.append(randint(0,5000))
    start = timeit.default_timer()
    sorted_list = merge_sort(insert_list)
    stop = timeit.default_timer()
    time.append(stop - start)
print(sum(time) / len(time))

Second, and less importantly, this timer (like time.time() and datetime.now()) can give unexpected results across timezone shifts, daylight-saving changes, NTP time adjustments, etc. It is better to use monotonic.monotonic(), which uses the OS's source of monotonic time where possible. Note, though, that this is an external library, not a builtin.

time = []
for i in range(3):
    insert_list = []
    for i in range(3000):
        insert_list.append(randint(0,5000))
    start = monotonic.monotonic()
    sorted_list = merge_sort(insert_list)
    stop = monotonic.monotonic()
    time.append(stop - start)
print(sum(time) / len(time))
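If an external dependency is undesirable, Python 3.3+ ships a monotonic clock in the standard library as time.monotonic(). A sketch of the same loop, with the built-in sorted() standing in for your merge_sort():

```python
import time
from random import randint

timings = []
for _ in range(3):
    insert_list = [randint(0, 5000) for _ in range(3000)]
    start = time.monotonic()           # immune to wall-clock adjustments
    sorted_list = sorted(insert_list)  # stand-in for merge_sort
    stop = time.monotonic()
    timings.append(stop - start)
print(sum(timings) / len(timings))
```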

Third, the measurements can be affected by external circumstances if you measure each call separately. For example, a fast algorithm on a small dataset will suffer rounding errors due to the precision of the time clock. Instead, make N sorting calls, measure the time of the whole loop, and then divide the total time by the number of operations. This comes at a memory cost, since you have to prepare all N arrays in advance.

N = 3
dataset = []
for i in range(N):
    insert_list = []
    for i in range(3000):
        insert_list.append(randint(0,5000))
    dataset.append(insert_list)
start = monotonic.monotonic()
for insert_list in dataset:
    sorted_list = merge_sort(insert_list)
stop = monotonic.monotonic()
print((stop - start) / N)

Fourth, why not use the timeit.timeit() function?

N = 3
dataset = [[randint(0, 5000) for j in range(3000)] for i in range(N)]
print(timeit.timeit(lambda: merge_sort(dataset.pop()), number=N))
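timeit.repeat() goes one step further: it runs the whole number-loop several times, and taking the minimum of the results filters out interference from other processes. A sketch, again with sorted() standing in for your merge_sort():

```python
import timeit
from random import randint

data = [randint(0, 5000) for _ in range(3000)]
# repeat=5 runs the number=10 timing loop five times
times = timeit.repeat(lambda: sorted(data), number=10, repeat=5)
print(min(times) / 10)  # seconds per sort in the least-disturbed run
```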

