First, you have to move for i in range(3000) cycle outside of the time measurements. This is NOT sorting, so you actually measure the dataset population. And since you use the random numbers, it will be highly dependent on the speed of the source of entropy (e.g. /dev/random, /dev/urandom, or alike), which can be very slow in some configurations (e.g., VMs on a shared host or in the cloud). Which has nothing to do with the speed of the sorting algorithm.
time = []
for i in range(3):
insert_list = []
for i in range(3000):
insert_list.append(randint(0,5000))
start = timeit.default_timer()
sorted_list = merge_sort(insert_list)
stop = timeit.default_timer()
time.append(stop - start)
print sum(time) /len(time)
Second, not so important, this timer (so as time.time() & datetime.now()) can give unexpected results in case of timezone shifts, daylight savings, ntp time adjustments, etc. It is better to use monotonic.monotonic(), which uses the OS'es source of monotonic time if possible. Though, this is an external library, not a builtin.
time = []
for i in range(3):
insert_list = []
for i in range(3000):
insert_list.append(randint(0,5000))
start = monotonic.monotonic()
sorted_list = merge_sort(insert_list)
stop = monotonic.monotonic()
time.append(stop - start)
print sum(time) /len(time)
Third, the measurements can be affected by the external circumstances if you measure each call separately. Such as too fast algorithm on a too small dataset, which will lead to measurement roundings due to precision of the time clock. Instead, make N sorting calls and measure the time of the whole cycle. Then divide the total time by the number of operations. This goes at the cost of memory, since you have to prepare all N arrays in advance.
N = 3
dataset = []
for i in range(N):
insert_list = []
for i in range(3000):
insert_list.append(randint(0,5000))
dataset.append(insert_list)
start = monotonic.monotonic()
for insert_list in dataset:
sorted_list = merge_sort(insert_list)
stop = monotonic.monotonic()
print (stop - start) / N
Fourth, why not use timeit.timeit() function?
N = 3
dataset = [[randint(0, 5000) for j in range(3000)] for i in range(N)]
print(timeit.timeit(lambda: merge_sort(dataset.pop()), number=N))