3

What does python 2.7 use to sort vanilla class instances? I'm interested in the default sorting behavior.

Suppose I have the class

class S():
    pass

Then I can create a couple of instances, and sort them:

a = S(); b = S(); c = S()
l = [(a,'a'), (b,'b') ,(c, 'c')]
sorted(l)

This will print some sorting of the objects. Now I have a two part question:

  • Is python using the objects' __hash__(), and thus their id()?
  • Is it possible to override __hash__() to influence the sorting behavior?
1
  • Is this Python 2 or 3? My impulse is "override lt, gt etc" but I have spotty understanding of Python 2's "classic" classes so don't know if that's appropriate here. You might also find this link helpful: wiki.python.org/moin/HowTo/Sorting Commented Dec 27, 2011 at 23:53

3 Answers 3

6

Python 3's built-in sorting makes use of the __lt__ method in your class.

The rich comparison methods are special in Python, since they can return a special NotImplemented type if there is not __lt__ defined - take a look at the docs on this page: http://docs.python.org/reference/datamodel.html#the-standard-type-hierarchy

Since the truth value of NotImplemented is True, any boolean comparison that gets NotImplemented will continue as if the first element actually is less than the second, which will cause the sort to leave the list in the same order as it was.

Take a look at the interactive shell. You can see how the truth values would be used in a sort, and that Python thinks that both objects are less than each other:

>>> class S():
...     pass
...
>>> a = S()
>>> b = S()
>>> a.__lt__( b )
NotImplemented
>>> if a.__lt__( b ):
...     print( "derp!" )
...
derp
>>> if b.__lt__(a):
...     print( "derp" )
...
derp

Here are some more references:

EDIT: After taking a look at Python 2.7, it looks like the ID of the objects is used for sorting, and the __lt__ method is undefined on a simple class like your example. Sorry for any confusion.

Sign up to request clarification or add additional context in comments.

3 Comments

But, the instances aren't left in their order! In my example I get [b, c, a].
Yeah, I realized I was missing the 2.7 angle (I've been using 3.2). After some similar investigation in the interactive shell, it looks like the ID of the classes are actually used. However, you can see how this behavior shouldn't be relied on, especially if you're going to migrate between versions of Python.
Relying on the default sort is what got me in trouble in the first place. Using (value, object) tuples to sort lists is very dangerous if python does a secondary sort on the object's id! I didn't know these kind of things had changed between python versions, I'm indeed using 2.x.
4

Python's sort algorithm exclusively compares items using a "less than" test, which can be implemented either using the __cmp__() special method (now deprecated) or __lt__() on the class.

In the absence of any specific instruction on how to compare two objects, the id() is used (not the hash) for objects of the same type, as in your case.

Comments

1

I tried your example on my own machine (python 2.7):

>>> sorted(l)
[(<__main__.S instance at 0x107142a70>, 'a'), (<__main__.S instance at 0x107142ab8>, 'b'), (<__main__.S instance at 0x107142b00>, 'c')]

Note that the sorted list is in order of the id():

>>> id(a)
4413729392
>>> id(b)
4413729464
>>> id(c)
4413729536

If I generate the hashes I get:

>>> hash(a)
275858087
>>> hash(b)
-9223372036578917717
>>> hash(c)
275858096

Which suggests that the sorting is not based on the hash.

See derekerdmann's answer for information on how to make python sort your class the way you want.

Edit: By the way, if you put the items in the list in reverse and then sort it, you get:

>>> l = [(c,'c'), (b, 'b'), (a, 'a')]
>>> sorted(l)
[(<__main__.S instance at 0x107142a70>, 'a'), (<__main__.S instance at 0x107142ab8>, 'b'), (<__main__.S instance at 0x107142b00>, 'c')]

Once again, its sorted in order of the id().

1 Comment

So they are not left in order as derek suggests. And actually, it says here that __hash__ returns id() by default, how do you explain the difference?

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.