2

As far as I know, set in python works via a hash-table to achieve O(1) look-up complexity. While it is hash-table, every entry in a set must be hashable (or immutable). So This peace of code raises exception:

>>> {dict()}
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'dict'

Because dict is not hashable. But we can create our own class inherited from dict and implement the __hash__ magic method. I created my own in this way:

>>> class D(dict):
...     def __hash__(self):
...             return 3
...

I know it should not work properly but I just wanted to experiment with it. So I checked that I can now use this type in set:

>>> {D()}
{{}}
>>> {D(name='ali')}
{{'name': 'ali'}}

So far so good, but I thought that this way of implementing the __hash__ magic method would screw up the look up in set. Because every object of the D has the same hash value.

>>> d1 = D(n=1)
>>> d2 = D(n=2)
>>> hash(d1), hash(d2)
(3, 3)
>>> 
>>> {d1, d2}
{{'n': 2}, {'n': 1}}

But the surprise for me was this:

>>> d3 = D()
>>> d3 in {d1, d2}
False

I expected the result to be True, because hash of d3 is 3 and currently there are values in our set with the same hash value. How does the set works internally?

5
  • 1
    They hash to the same bucket, which will make your hash-lookups all collisions, but they aren't equal, i.e. __eq__, so they won't behave incorrectly. Of course, these are mutable objects, which will create other problems as well Commented May 19, 2022 at 18:12
  • The hash is only half the deal, the other halve is equality between the instances. And you have not implemented any equality. Commented May 19, 2022 at 18:12
  • Most algorithms that use hash table have to handle collisions — usually by storing a list of objects that have the same hash and doing a linear search checking for equality when one occurs. Commented May 19, 2022 at 18:12
  • 1
    "I expected the result to be True, because hash of d3 is 3 and currently there are values in our set with the same hash value. How does the set works internally?" Like a hash set / hash table. But you are making the mistake of thinking that a hash table decides whether something is or isn't in it by using only the hash value, it doesn't. Commented May 19, 2022 at 18:13
  • @luk2302 well, they inherit equality semantics from dict Commented May 19, 2022 at 18:13

1 Answer 1

4

To be usable in sets and dicts, a __hash__ method must guarantee that if x == y, then hash(x) == hash(y). But that's a one-sided implication. It's not at all required that if hash(x) == hash(h) then x == y must be true. Indeed, that's impossible to achieve in general (for example, there are an unbounded number of distinct Python ints, but only a finite number of hash codes - there must be distinct ints that have the same hash value).

That your hashes are all the same is fine. They only tell the set/dict where to start looking. All objects in the container with the same hash are then compared, one by one, for equality, until success, or until all such objects have been tried without success.

However, while making all hashes the same doesn't hurt correctness, it's a disaster for performance: it effectively turns the set/dict into an exceptionally slow way to do an O(n) linear search.

Sign up to request clarification or add additional context in comments.

2 Comments

It essentially turns it into a list where you do if new_element not in the_set: the_set.append(new_element)
@Barmar, it's worse than that. Same O(n) worst-case behavior, but at least a linear search through a list accesses memory in predictable order. The hash collision strategies for sets/dicts leap all over the container from one try to the next.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.