2

I am trying to understand the speed of sorting an already sorted list of objects in Python. Python uses Timsort algorithm which is said to perform O(n) on an already sorted list and O(nlogn) on average/worst case. In the code snippet below I perform two tests on a list of 20 million random numbers.

First test compares time needed for sorting an unsorted list of floats vs time needed for sorting the same list but already sorted. The time gain is roughly 7.5x (7.46 seconds vs 1.02 seconds) which makes sense.

In the second test, instead of using raw floats, I create a list of objects where each object holds one float number, then I perform the same sorting test. In that case sorting takes 10.08 seconds, then I sort again the already sorted list of objects, but time elapsed goes up from 10.08 seconds to 11.50 seconds which is counterintuitive to me. I understand that there might be some overhead from retrieving the object's attribute, however I would not expect the sorting of the already sorted list to take longer than sorting the same unsorted list.

My question is, why does it take longer to sort an already sorted list of objects than sorting the same unsorted list of objects?

import random
import time


class C:
    def __init__(self, x):
        self.x = x


random_nbrs = [random.random() for _ in range(20_000_000)]

t0 = time.time()
w = sorted(random_nbrs)
print("1. Sorting list of random numbers took {:.2f} seconds".format(time.time() - t0))

t0 = time.time()
x = sorted(w)
print("2. Sorting sorted list of random numbers took {:.2f} seconds".format(time.time() - t0))

objects = [C(e) for e in random_nbrs]

t0 = time.time()
y = sorted(objects, key=lambda x: x.x)
print("3. Sorting list of random objects took {:.2f} seconds".format(time.time() - t0))

t0 = time.time()
z = sorted(y, key=lambda x: x.x)
print("4. Sorting sorted list of random objects took {:.2f} seconds".format(time.time() - t0))
  1. Sorting list of random numbers took 7.46 seconds
  2. Sorting sorted list of random numbers took 1.02 seconds
  3. Sorting list of random objects took 10.08 seconds
  4. Sorting sorted list of random objects took 11.50 seconds
3
  • How many times did you run your test? Commented Oct 6, 2022 at 13:56
  • @rdas between 10 and 20 times. Commented Oct 6, 2022 at 14:01
  • The difference reappears if you implement in C method def __lt__(self, other): return self.x < other.x and drop the lambda function when sorting. Unfortunately most of the difference is that sorting the unsorted list of objects now takes much longer. Commented Oct 6, 2022 at 15:45

1 Answer 1

0

I would say it is caused by cache misses as sorting a list that was created as sorted is even faster, also simple iteration

for i in objects:
    _

after sorting seems to be slower (I've tried only few runs). In the original case, we have only sorted list of pointers to objects containing floats, but those floats are probably still in memory in the original order. Take on mind that a python float is an object.

Complete code:

import random
import time

random_nbrs = [random.random() for _ in range(20_000_000)]

t0 = time.time()
for i in random_nbrs:
    _
print("0.5 Iterating list of random numbers took {:.2f} seconds".format(time.time() - t0))

t0 = time.time()
w = sorted(random_nbrs)
print("1. Sorting list of random numbers took {:.2f} seconds".format(time.time() - t0))


t0 = time.time()
for i in w:
    _
print("1.5 Iterating sorted list of random numbers took {:.2f} seconds".format(time.time() - t0))


t0 = time.time()
x = sorted(w)
print("2. Sorting sorted list of random numbers took {:.2f} seconds".format(time.time() - t0))

objects = [C(e) for e in random_nbrs]
t0 = time.time()
for i in objects:
    _
print("2.5 Iterating list of random objects took {:.2f} seconds".format(time.time() - t0))

t0 = time.time()
for i in objects:
    i.x
print("2.75 Iterating list of random objects accessing to members took {:.2f} seconds".format(time.time() - t0))

t0 = time.time()
y = sorted(objects, key=lambda x: x.x)
print("3. Sorting list of random objects took {:.2f} seconds".format(time.time() - t0))

t0 = time.time()
for i in y:
    _
print("3.5 Iterating sorted list of random objects took {:.2f} seconds".format(time.time() - t0))

t0 = time.time()
for i in y:
    i.x
print("3.75 Iterating sorted list of random objects accessing to members took {:.2f} seconds".format(time.time() - t0))

t0 = time.time()
z = sorted(y, key=lambda x: x.x)
print("4. Sorting sorted list of random objects took {:.2f} seconds".format(time.time() - t0))

numb_from_objects = [x.x for x in y]
t0 = time.time()
n = sorted(numb_from_objects)
print("5. Sorting sorted list of random numbers from objects took {:.2f} seconds".format(time.time() - t0))

sorted_nbrs = [i for i in range(20_000_000)]
t0 = time.time()
n = sorted(sorted_nbrs)
print("6. Sorting sorted list of sorted numbers took {:.2f} seconds".format(time.time() - t0))

Results from one run:

0.5 Iterating list of random numbers took 0.86 seconds
1. Sorting list of random numbers took 7.66 seconds
1.5 Iterating sorted list of random numbers took 2.58 seconds
2. Sorting sorted list of random numbers took 1.24 seconds
2.5 Iterating list of random objects took 0.89 seconds
2.75 Iterating list of random objects accessing to members took 1.17 seconds
3. Sorting list of random objects took 9.52 seconds
3.5 Iterating sorted list of random objects took 3.10 seconds
3.75 Iterating sorted list of random objects accessing to members took 5.77 seconds
4. Sorting sorted list of random objects took 12.65 seconds
5. Sorting sorted list of random numbers from objects took 0.97 seconds
6. Sorting sorted list of sorted numbers took 0.51 seconds
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.