1

I have a list of objects and would like to sort them according to the return value of an instance function. There are two ways to do them

from operator import methodcaller

l.sort(key=lambda x:x.f())
l.sort(key=methodcaller('f'))

Is one way betther than the other? Or it's just a personal preference?

3 Answers 3

5

methodcaller('f') is faster, because it can do both the attribute lookup and the method call in C code.

The lambda adds the following extra overhead:

  • Calling the lambda has to step out of the sort() C loop back into Python code. This requires a new frame object with associated data.

  • Looking up the method attribute is a Python opcode with more overhead than the direct equivalent in C.

  • Calling the method from a Python frame next has to push that frame on the Python call stack again. C code has a stack too, but this is far lighter.

  • Returning from the called method goes back to the Python frame, popping that from the stack, and after which the lambda returns, causing the function frame to be destroyed again (which is more work still).

You can measure the difference:

>>> from timeit import timeit
>>> timeit('m("")', 'm = lambda s: s.lower()', number=10**7)
1.2575681940070353
>>> timeit('m("")', 'from operator import methodcaller; m = methodcaller("lower")', number=10**7)
1.061251598992385

So on 7 million calls to str.lower() on an empty string, a methodcaller() is about 16% faster.

Now, if all your data is of the exact same type, where object.f would always bind to the same method, then you can just use the unbound method:

l.sort(key=SharedType.f)

That saves you having to look it up on each of the instances.

Sign up to request clarification or add additional context in comments.

Comments

3

I think the best way, if all elements of l are garunteed to be of the same type , is for

class X:
    def __init__(self):
        ...
    def f(self):
        ...

you can do

l.sort(key=X.f)

1 Comment

This only works if all the objects in the list are the same type. If they are, say, subclasses of a base class, with each subclass having overridden f, then this certainly won't work.
1

They are completly equivalent but methodcaller might be a bit faster:

class Fun(object):
    def __init__(self, value):
        self.value = value

    def f(self):
        return self.value

import random
from operator import methodcaller

l = [Fun(random.random()) for _ in range(10000)]

assert sorted(l, key=lambda x:x.f()) == sorted(l, key=methodcaller('f'))

%timeit sorted(l, key=lambda x:x.f())     # 100 loops, best of 3: 8.4 ms per loop
%timeit sorted(l, key=methodcaller('f'))  # 100 loops, best of 3: 7.5 ms per loop

As pointed out by @PatrickHaugh you might also just use class.f as key function which is even faster but as @MartijnPieters said this only works if all objects are of the type class:

%timeit sorted(l, key=Fun.f)              # 100 loops, best of 3: 6.1 ms per loop

14 Comments

A methodcaller is always faster than a lambda for this case.
@MartijnPieters In almost all cases methodcaller is faster. But creating a lambda is faster than creating a methodcaller (at least on my computer) and for very short lists 2-5 items the lambda is faster/comparable.
It would, at most, be too close to call, because with 2-5 items other 'noise' would drown out the difference.
okay but "too close to call" is just admitting that it's not "always faster", right?
No, it is faster, but you can't measure this because on a multi-tasking machine you can't prevent the OS from doing some disk I/O or scheduling in between the instructions; those small 'distractions' make it impossible to get an accurate reading on so little data.
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.