62

I recently tried the following commands in Python:

>>> {lambda x: 1: 'a'}
{<function __main__.<lambda>>: 'a'}

>>> def p(x): return 1
>>> {p: 'a'}
{<function __main__.p>: 'a'}

The success of both dict creations indicates that both lambda and regular functions are hashable. (Something like {[]: 'a'} fails with TypeError: unhashable type: 'list').

The hash is apparently not necessarily the ID of the function:

>>> m = lambda x: 1
>>> id(m)
140643045241584
>>> hash(m)
8790190327599
>>> m.__hash__()
8790190327599

The last command shows that the __hash__ method is explicitly defined for lambdas, i.e., this is not some automagical thing Python computes based on the type.

What is the motivation behind making functions hashable? For a bonus, what is the hash of a function?

4
  • 7
    I really feel like this is the sort of question that you shouldn't consider before you have a good answer to "why wouldn't you?" Commented Jul 22, 2016 at 11:24
  • 3
    @Hurkyl. Because it requires the addition of extra maintenance burden for one thing. Someone had to design and write the __hash__ function, so they clearly saw benefit in it. I would like to know what made them not just leave it alone. Commented Jul 22, 2016 at 13:21
  • Although given the answers, it seems that disabling the hash function would require more work than just inheriting it from object. So in fact one of the considerations may have been reduction of maintenance debt. Commented Jul 22, 2016 at 16:16
  • 2
    @MadPhysicist, the code is trivial either way. In C, an object's type is represented by a struct PyTypeObject, and that struct's tp_hash member defines what happens for __hash__(). If that member is initialized to 0, it inherits __hash__(). If you want the type to refuse to hash, you just initialize it to PyObject_HashNotImplemented instead. (And if the type wants to implement a custom hash, you plug in the address of the type's custom hash function.) So maintenance burden really has nothing to do with it. Commented Jul 22, 2016 at 17:41

3 Answers 3

63

It's nothing special. As you can see if you examine the unbound __hash__ method of the function type:

>>> def f(): pass
...
>>> type(f).__hash__
<slot wrapper '__hash__' of 'object' objects>

the of 'object' objects part means it just inherits the default identity-based __hash__ from object. Function == and hash work by identity. The difference between id and hash is normal for any type that inherits object.__hash__:

>>> x = object()
>>> id(x)
40145072L
>>> hash(x)
2509067

You might think __hash__ is only supposed to be defined for immutable objects, and you'd be almost right, but that's missing a key detail. __hash__ should only be defined for objects where everything involved in == comparisons is immutable. For objects whose == is based on identity, it's completely standard to base hash on identity as well, since even if the objects are mutable, they can't possibly be mutable in a way that would change their identity. Files, modules, and other mutable objects with identity-based == all behave this way.

Sign up to request clarification or add additional context in comments.

7 Comments

So by default, all instances in Python are hashable unless it is turned off somewhere in the class hierarchy? That is interesting news indeed, although intuitively it makes sense based on how few types you cant't use to key a dict.
== does not work on identity. is is the identity operator.
@Delioth: == works by identity for any type that doesn't specify some other behavior.
@MadPhysicist The proper way to tell python that a class is not hashable is by setting __hash__ to None: class X: __hash__ = None;{X(): 1} -> TypeError: unhashable type: 'X'
@Bakuriu. Normally I would agree with you. Have you tried doing this for a function? See the comments below stackoverflow.com/a/38518991/2988730.
|
26

It can be useful, e.g., to create sets of function objects, or to index a dict by functions. Immutable objects normally support __hash__. In any case, there's no internal difference between a function defined by a def or by a lambda - that's purely syntactic.

The algorithm used depends on the version of Python. It looks like you're using a recent version of Python on a 64-bit box. In that case, the hash of a function object is the right rotation of its id() by 4 bits, with the result viewed as a signed 64-bit integer. The right shift is done because object addresses (id() results) are typically aligned so that their last 3 or 4 bits are always 0, and that's a mildly annoying property for a hash function.

In your specific example,

>>> i = 140643045241584 # your id() result
>>> (i >> 4) | ((i << 60) & 0xffffffffffffffff) # rotate right 4 bits
8790190327599  # == your hash() result

3 Comments

@JulienBernu, see @user2357112's response - it's immutable so far as == is concerned, which is all __hash__() cares about.
One of the biggest (only?) mistakes of the __hash__ spec is "The only required property is that objects which compare equal have the same hash value" -- this is a concept borne from inexperience which will plague python until the end of time. The hash of an object for the purpose of identity should never depend on object equality and should not mutate over time. developers would be wise to not implement their hash functions in a way that an object's state determines hash result. Python has made a lot of great and tough decisions over the years, this was not one of them. This was a mistake.
@wilson0x4d, sorry, I have no idea what you're talking about. Open a new issue if you want to pursue it - comments are not intended to be discussion forums on this site. Especially not when you add a comment after a lag of over 8 year ;-)
4

A function is hashable because it is a normal, builtin, non mutable object.

From the Python Manual:

An object is hashable if it has a hash value which never changes during its lifetime (it needs a __hash__() method), and can be compared to other objects (it needs an __eq__() or __cmp__() method). Hashable objects which compare equal must have the same hash value.

Hashability makes an object usable as a dictionary key and a set member, because these data structures use the hash value internally.

All of Python’s immutable built-in objects are hashable, while no mutable containers (such as lists or dictionaries) are. Objects which are instances of user-defined classes are hashable by default; they all compare unequal (except with themselves), and their hash value is derived from their id().

8 Comments

They're actually a lot more mutable than you might expect. You can add attributes to them, reassign their __name__, __module__, and __doc__, and even reassign their __code__ if you're feeling crazy.
You can also reassign their __hash__ function :)
@MadPhysicist: Though reassigning f.__hash__ won't actually do anything to hash(f).
@user2357112 Interesting. I just confirmed. Why is that?
@MadPhysicist You just can't. The Python interpreter does support true readonly attributes, but these facilities can only be accessed via the C API. So from pure python code it's impossible to change those values (thank god!).
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.