The built-in function hash in Python is used mainly for hashing keys in collections like dict and set. The only required property, more or less, is that objects which compare equal must have the same hash value.
In the default CPython implementation of Python, for objects of type str, bytes or datetime, a hard-to-invert hash function is used (SipHash) and moreover hash randomization is used since Python 3.3 to prevent hash-flooding attacks. This means that the hash changes across invocations of Python. So if you're trying to invert hash() to recover a string with that hash, forget about it.
But for numeric values (including int, float, Decimal, Fraction), a simple and elegant hashing function is used since 2010, which has the property that for equal numeric values their hashes are equal even if they are of different types (like 42 and 42.0 and Decimal('42') and Fraction(42, 1) and complex(42, 0)). The function used is as follows (ignore negative numbers for a moment):
For an integer n, the hash is given simply by hash(n) = n mod P, where P is sys.hash_info.modulus = 261 - 1, a large prime.
This is generalized to all finite rational numbers as follows: for a rational number n = p/q, the hash is hash(p/q) = (p/q) mod P, with the division being interpreted modulo P. In other words, if p/q is in lowest form (or at least, common factors of P removed), then hash(p/q) is got by computing the inverse of q mod P, multiplying that with p, and taking the result modulo P.
For negative numbers, hash(-n) = -hash(n).
(There are a couple more details on special values: if n is floating-point infinity, or if q has no inverse, i.e. is a multiple of P, then sys.hash_info.inf is used, and if n is a NaN then sys.hash_info.nan is used. Also the hash value is never -1.)
This makes it a nice exercise to invert this hash function. For a given nonnegative value h, where 0 ≤ h < P,
An integer n has hash(n) = h if and only if n mod P is h, i.e. if n is of the form (h + kP) for some integer k.
A (IEEE 754 double-precision) floating-point number with 52 mantissa bits m and 11 exponent bits e represents the rational number
(252 + m)/252 × 2e-1023
so if its hash is h, then we have the congruence conditions:
(252 + m) (2e-1023-52) ≡ h mod P
(252 + m) ≡ ((21023+52-e) × h) mod P
m ≡ (21023+52-e × h - 252) mod P
with the only constraint on m being that 0 ≤ m < 252. So for each of the 211 = 2048 possible values of e, we can compute the corresponding m and verify whether it leads to a valid floating-point number.
So here is a (Python 3) function that for a given h yields all int and float values n such that hash(n) is h.
import sys
def invert(h):
if h == -1: return [] # -1 gets coerced to -2 so no value has hash -1
if h < 0:
sign = -1
h = -h
else:
sign = 1
M = sys.float_info.mant_dig - 1 # = 52 = Bits available for mantissa
E = (sys.float_info.max_exp - sys.float_info.min_exp + 1) # = 1023 = bias
B = sys.float_info.radix # = 2, base of the floating point values
P = sys.hash_info.modulus # = 2^61 - 1 = the prime used as hash modulus
if not (0 <= h == int(h) < P):
return []
for e in range((E + 1) * 2):
# Want m such that (B^M + m) * B^(e-M-E) = h mod P
m = (h * B**(M+E-e) - B**M) % P
if m >= B**M: continue # Happens with probability (1-B**M/P)
f = (B**M + m) * B**(e-M-E)
if f == int(f): continue # We'll see this later as an integer
assert hash(f) == h
yield sign * f
# Special values if any
if h == sys.hash_info.inf:
yield sign * float('inf')
if h == sys.hash_info.nan:
yield float('nan')
# Now the integers
k = 0
while True:
yield sign * (h + k * P)
k += 1
Example usage:
num = 0
for n in invert(314159265):
print(hash(n), n)
num += 1
if num > 25: break
Output:
314159265 2.1332628416727795e-304
314159265 4.918969210286518e-286
314159265 1.1342370766076572e-267
314159265 2.6153726338867434e-249
314159265 6.030638704336553e-231
314159265 1.390570609748797e-212
314159265 3.2064375193072873e-194
314159265 7.393541538375207e-176
314159265 1.7048346069593532e-157
314159265 3.9310809603228e-139
314159265 9.064455551013383e-121
314159265 2.0901211464632472e-102
314159265 4.81949123398199e-84
314159265 1.111299016984405e-65
314159265 2.5624810694595406e-47
314159265 5.908679060255712e-29
314159265 1.3624486304777972e-10
314159265 314159265
314159265 2305843009527853216
314159265 4611686018741547167
314159265 6917529027955241118
314159265 9223372037168935069
314159265 11529215046382629020
314159265 13835058055596322971
314159265 16140901064810016922
314159265 18446744074023710873
etc.
hash()function in Python is definitely not.