When I call hash() in Python 3, I noticed that it returns a long integer for a string argument but not for an int argument.

Is it supposed to work this way? If so, won't such short hash values for ints cause collisions?

for i in range(5):
    print(hash(i))

print(hash("abc"))

The Result:

0
1
2
3
4
4714025963994714141
  • 4
    What hash does is implementation-dependent; don't make any assumptions about what it returns. Commented Nov 7, 2018 at 17:00
  • 2
    Collisions are inevitable; larger tables reduce collisions, but waste more space. Commented Nov 7, 2018 at 17:03
  • 3
    Just to clarify: hash is not a cryptographic hash. If you are interested in those, use hashlib. The built-in hash only serves fast lookups in dicts and sets, and its values are not guaranteed to be unique. Commented Nov 7, 2018 at 17:03
  • 2
    The purpose of this value is to distribute keys into dictionary buckets -- it's not intended to be used for purposes that require longer output or stronger guarantees; given its primary use case, the main design goal is speed (since every lookup requires calculating the hash for the key). Commented Nov 7, 2018 at 17:04
  • 1
    BTW, code formatting should be used, for, well, code. a long-length integer isn't code, it's English prose; likewise for short hash value. If you want to emphasize prose, italics are usually the right choice. See Highlighting technical words? on Meta Stack Exchange. Commented Nov 7, 2018 at 17:10

3 Answers

9

In CPython, the default Python interpreter implementation, the built-in hash works this way for numeric types:

For numeric types, the hash of a number x is based on the reduction of x modulo the prime P = 2**_PyHASH_BITS - 1. It's designed so that hash(x) == hash(y) whenever x and y are numerically equal, even if x and y have different types
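The cross-type guarantee quoted above can be checked directly. The following is a minimal sketch; the particular values and the variable names are chosen only for illustration.

```python
from decimal import Decimal
from fractions import Fraction

# The same numeric value expressed in five different types.
values = [1, 1.0, True, Fraction(1, 1), Decimal("1")]

# All of them compare equal, so they must share a single hash value.
hashes = {hash(v) for v in values}
print(hashes)

# Equal values with equal hashes collapse into one dict key.
as_keys = {v: None for v in values}
print(len(as_keys))
```

This is why a dict treats 1, 1.0 and True as the same key.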

_PyHASH_BITS is 61 on 64-bit systems and 31 on 32-bit systems.

So on a 64-bit system, the built-in hash behaves like this function for non-negative ints:

def int_hash(number):
    # 2 ** 61 - 1 is the prime P on 64-bit builds
    return number % (2 ** 61 - 1)

That's why small ints hash to themselves, whereas hash(2305843009213693950) returns 2305843009213693950 and hash(2305843009213693951) returns 0.
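The one-line model above only covers non-negative ints. A slightly fuller sketch, assuming CPython behaviour, also handles negative numbers and the special case hash(-1) == -2. The function name int_hash_sketch is hypothetical, and sys.hash_info.modulus is used instead of a hard-coded 2 ** 61 - 1 so that the comparison also holds on 32-bit builds.

```python
import sys

# Modulus CPython uses for numeric hashes: 2**61 - 1 on 64-bit builds.
P = sys.hash_info.modulus


def int_hash_sketch(n):
    """Hypothetical pure-Python model of CPython's hash() for ints."""
    sign = -1 if n < 0 else 1
    result = sign * (abs(n) % P)
    # -1 is reserved as an error code in CPython's C API, so -2 is returned.
    return -2 if result == -1 else result


# Print the model next to the built-in for a few boundary values.
for value in (0, 4, -1, P - 1, P, P + 1, -P):
    print(value, int_hash_sketch(value), hash(value))
```

Note that hash(-1) and hash(-2) are both -2, a small reminder that collisions are permitted.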


4

The only purpose of the hash function is to produce an integer value that can be used to insert an object into a dict. The only thing hash guarantees is that if a == b, then hash(a) == hash(b). For a user-defined class Foo, it is the user's responsibility to ensure that Foo.__eq__ and Foo.__hash__ enforce this guarantee.

Anything else is implementation-dependent, and you shouldn't read anything into the value of hash(x) for any value x. Specifically, hash(a) == hash(b) is allowed for a != b, and hash(x) == x is not required for any particular x.
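As an illustration of that responsibility, here is a minimal sketch of a user-defined class whose __hash__ is built from the same fields that __eq__ compares. The fields x and y are hypothetical; only the class name Foo comes from the text above.

```python
class Foo:
    """Hypothetical value class with consistent __eq__ and __hash__."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        if not isinstance(other, Foo):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # Hash exactly the fields that __eq__ compares.
        return hash((self.x, self.y))


a, b = Foo(1, 2), Foo(1, 2)
print(a == b, hash(a) == hash(b))

# Equal objects with equal hashes occupy a single slot in a set.
print(len({a, b}))
```

Delegating to the hash of a tuple of the compared fields is a common way to keep the two methods in sync.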

0

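If you need a long, fixed-length digest, use the hashlib module. The example below uses SHA-256, which is only one of the available algorithms:

>>> import hashlib
>>> m = hashlib.sha256()
>>> m.update(b'abc')
>>> m.hexdigest()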
