python list sort key function choice

Question

I have a list of objects and would like to sort them according to the return value of an instance method. There are two ways to do this:

from operator import methodcaller

l.sort(key=lambda x:x.f())
l.sort(key=methodcaller('f'))

Is one way better than the other, or is it just personal preference?

Martijn Pieters · Accepted Answer · 2016-12-08 19:40:12Z

methodcaller('f') is faster, because it can do both the attribute lookup and the method call in C code.

The lambda adds the following extra overhead:

Calling the lambda has to step out of the sort() C loop back into Python code. This requires a new frame object with associated data.
Looking up the method attribute is a Python opcode with more overhead than the direct equivalent in C.
Calling the method from a Python frame next has to push that frame on the Python call stack again. C code has a stack too, but this is far lighter.
Returning from the called method goes back to the Python frame, popping that from the stack, after which the lambda returns, causing the function frame to be destroyed again (which is more work still).
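
The Python-level steps listed above can be made visible by disassembling the lambda with the standard dis module. This is only a sketch: the exact opcode names differ between CPython versions (the attribute lookup appears as LOAD_METHOD in some releases and LOAD_ATTR in others), so the code collects the names instead of assuming one particular listing.

```python
import dis

key = lambda x: x.f()

# Collect the names of the bytecode instructions that run on every key call.
opnames = [instr.opname for instr in dis.get_instructions(key)]

# The attribute lookup is LOAD_ATTR or LOAD_METHOD, depending on the
# CPython version; the method call is one of the CALL* opcodes.
lookup_ops = [name for name in opnames if name in ("LOAD_ATTR", "LOAD_METHOD")]
call_ops = [name for name in opnames if name.startswith("CALL")]

print(opnames)
```

A methodcaller object has no Python bytecode of its own to disassemble, since the lookup and the call both happen inside its C implementation.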

You can measure the difference:

>>> from timeit import timeit
>>> timeit('m("")', 'm = lambda s: s.lower()', number=10**7)
1.2575681940070353
>>> timeit('m("")', 'from operator import methodcaller; m = methodcaller("lower")', number=10**7)
1.061251598992385

So on 10 million calls to str.lower() on an empty string, a methodcaller() is about 16% faster.

Now, if all your data is of the exact same type, where object.f would always bind to the same method, then you can just use the unbound method:

l.sort(key=SharedType.f)

That saves you having to look it up on each of the instances.
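
As a minimal sketch of that claim, the three key functions can be checked against each other on a small list. SharedType here is a hypothetical stand-in class with a trivial f(); it is not taken from the question.

```python
from operator import methodcaller


class SharedType:
    # Hypothetical class standing in for the objects in the question.
    def __init__(self, value):
        self.value = value

    def f(self):
        return self.value


l = [SharedType(v) for v in (3, 1, 2)]

by_lambda = sorted(l, key=lambda x: x.f())
by_methodcaller = sorted(l, key=methodcaller('f'))
# SharedType.f is called as SharedType.f(obj), with no per-instance lookup.
by_class_function = sorted(l, key=SharedType.f)

print([obj.value for obj in by_class_function])
```

All three calls return the same ordering as long as every element really is a plain SharedType instance.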

Patrick Haugh · Accepted Answer · 2016-12-08 19:29:36Z

3

I think the best way, if all elements of l are guaranteed to be of the same type, is to pass the class's method directly. For

class X:
    def __init__(self):
        ...
    def f(self):
        ...

you can do

l.sort(key=X.f)

edited Dec 8, 2016 at 19:29

answered Dec 8, 2016 at 19:27


1 Comment

Martijn Pieters Over a year ago

This only works if all the objects in the list are the same type. If they are, say, subclasses of a base class, with each subclass having overridden f, then this certainly won't work.
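
A small sketch of the subclass problem, using hypothetical Base and Child classes. In Python 3 passing Base.f as the key does not raise an error here; it silently calls the base implementation and ignores the override, which gives the wrong sort keys.

```python
from operator import methodcaller


class Base:
    def f(self):
        return 0


class Child(Base):
    # Overrides f; a key of Base.f never calls this version.
    def f(self):
        return -1


base, child = Base(), Child()
items = [base, child]

# Base.f is called directly with each object as self, so the override is skipped.
keys_via_class = [Base.f(obj) for obj in items]

# Looking f up on each instance respects the override.
keys_via_lookup = [methodcaller('f')(obj) for obj in items]

sorted_via_class = sorted(items, key=Base.f)             # order unchanged
sorted_via_lookup = sorted(items, key=methodcaller('f'))  # child sorts first
```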

MSeifert · Accepted Answer · 2016-12-08 19:33:53Z

1

They are completely equivalent, but methodcaller might be a bit faster:

class Fun(object):
    def __init__(self, value):
        self.value = value

    def f(self):
        return self.value

import random
from operator import methodcaller

l = [Fun(random.random()) for _ in range(10000)]

assert sorted(l, key=lambda x:x.f()) == sorted(l, key=methodcaller('f'))

%timeit sorted(l, key=lambda x:x.f())     # 100 loops, best of 3: 8.4 ms per loop
%timeit sorted(l, key=methodcaller('f'))  # 100 loops, best of 3: 7.5 ms per loop

As pointed out by @PatrickHaugh, you might also just use the method taken from the class (here Fun.f) as the key function, which is even faster, but as @MartijnPieters said, this only works if all objects are of exactly that class:

%timeit sorted(l, key=Fun.f)              # 100 loops, best of 3: 6.1 ms per loop
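
The %timeit lines above only run in IPython. A rough equivalent for a plain Python script, using the standard timeit module, might look like the sketch below. The run counts (number=10, repeat=3) are arbitrary choices, and the timings depend on the machine and Python version, so no particular numbers are implied.

```python
import random
from operator import methodcaller
from timeit import repeat


class Fun:
    def __init__(self, value):
        self.value = value

    def f(self):
        return self.value


l = [Fun(random.random()) for _ in range(10000)]

key_functions = {
    "lambda": lambda x: x.f(),
    "methodcaller": methodcaller('f'),
    "Fun.f": Fun.f,
}

runs = 10  # sorts per timing sample (arbitrary)
results = {}
for name, key in key_functions.items():
    # Take the best of several samples to reduce noise from other processes.
    best = min(repeat(lambda: sorted(l, key=key), number=runs, repeat=3))
    results[name] = best / runs  # seconds per sort
    print(f"{name:>12}: {results[name] * 1e3:.2f} ms per sort")
```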

edited Dec 8, 2016 at 19:33

answered Dec 8, 2016 at 19:28


14 Comments

Martijn Pieters Over a year ago

A methodcaller is always faster than a lambda for this case.

MSeifert Over a year ago

@MartijnPieters In almost all cases methodcaller is faster. But creating a lambda is faster than creating a methodcaller (at least on my computer), and for very short lists (2-5 items) the lambda is faster or comparable.

Martijn Pieters Over a year ago

It would, at most, be too close to call, because with 2-5 items other 'noise' would drown out the difference.

MSeifert Over a year ago

okay but "too close to call" is just admitting that it's not "always faster", right?

Martijn Pieters Over a year ago

No, it is faster, but you can't measure this because on a multi-tasking machine you can't prevent the OS from doing some disk I/O or scheduling in between the instructions; those small 'distractions' make it impossible to get an accurate reading on so little data.
