33

I'm using hashing of strings for seeding random states in the following way:

import numpy as np

context = "string"
seed = hash(context) % 4294967295  # keep the hash within the allowed seed range
np.random.seed(seed)

This is unfortunately (for my usage) non-deterministic between runs in Python 3.3 and up. I do know that I could set the PYTHONHASHSEED environment variable to an integer value to regain the determinism, but I would probably prefer something that feels a bit less hacky, and won't entirely disregard the extra security added by random hashing. Suggestions?

3
  • 1
    What is the purpose, though? Why not simply write seed = 42, unless you actually want the seed to be different on different runs? Commented May 1, 2020 at 11:23
  • 3
    @Alexey presumably because they actually do want the seed to be different when the context is different, but the same when the context is the same. Here, even if the context is the same, the seed will still be different. Commented Sep 6, 2021 at 18:19
  • Related: stackoverflow.com/questions/64344515/… Commented Sep 19, 2022 at 9:33

4 Answers

15

Use a purpose-built hash function. zlib.adler32() is an excellent choice; alternatively, check out the hashlib module for more options.
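As a minimal sketch of this suggestion (the helper name checksum_seed is hypothetical, not from the answer): zlib.adler32() works on bytes, so the string has to be encoded first. In Python 3 the result is an unsigned 32-bit integer, so it already fits the seed range from the question. Keep in mind that Adler-32 is a checksum, so collisions between short or similar strings are more likely than with a hashlib digest.

```python
import zlib

def checksum_seed(context: str) -> int:
    """Derive a run-independent 32-bit seed from a string (hypothetical helper)."""
    # zlib checksums operate on bytes, so encode the string explicitly
    return zlib.adler32(context.encode("utf-8"))

seed = checksum_seed("string")
# seed can then be passed to np.random.seed(seed) as in the question
```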


1 Comment

Watch out! I found out the hard way that adler32 is not meant for hashing but for error detection (it is a checksum). It has a rather high collision probability, especially for short inputs. Quite a headache to debug.

10

You can actually use a string as seed for random.Random:

>>> import random
>>> r = random.Random('string'); [r.randrange(10) for _ in range(20)]
[0, 6, 3, 6, 4, 4, 6, 9, 9, 9, 9, 9, 5, 7, 5, 3, 0, 4, 8, 1]
>>> r = random.Random('string'); [r.randrange(10) for _ in range(20)]
[0, 6, 3, 6, 4, 4, 6, 9, 9, 9, 9, 9, 5, 7, 5, 3, 0, 4, 8, 1]
>>> r = random.Random('string'); [r.randrange(10) for _ in range(20)]
[0, 6, 3, 6, 4, 4, 6, 9, 9, 9, 9, 9, 5, 7, 5, 3, 0, 4, 8, 1]
>>> r = random.Random('another_string'); [r.randrange(10) for _ in range(20)]
[8, 7, 1, 8, 3, 8, 6, 1, 6, 5, 5, 3, 3, 6, 6, 3, 8, 5, 8, 4]
>>> r = random.Random('another_string'); [r.randrange(10) for _ in range(20)]
[8, 7, 1, 8, 3, 8, 6, 1, 6, 5, 5, 3, 3, 6, 6, 3, 8, 5, 8, 4]
>>> r = random.Random('another_string'); [r.randrange(10) for _ in range(20)]
[8, 7, 1, 8, 3, 8, 6, 1, 6, 5, 5, 3, 3, 6, 6, 3, 8, 5, 8, 4]

It can be convenient, e.g. to use the basename of an input file as seed. For the same input file, the generated numbers will always be the same.
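A minimal sketch of the basename idea, assuming Python 3 (where string seeds are converted to an integer by random itself rather than through the built-in hash()). The helper name rng_for_file and the paths are made up for illustration.

```python
import os
import random

def rng_for_file(path):
    """Return a random.Random seeded with the file's basename (hypothetical helper)."""
    return random.Random(os.path.basename(path))

# Hypothetical paths: different directories, same basename
a = rng_for_file("/data/run1/input.csv")
b = rng_for_file("/backup/input.csv")

# Same basename, so both generators yield the same sequence
same = [a.random() for _ in range(5)] == [b.random() for _ in range(5)]
```

Seeding on the basename rather than the full path means the sequence does not change when the file is moved to another directory.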


7

Forcing Python's built-in hash to be deterministic is intrinsically hacky. If you want to avoid hackitude, use a different hashing function, e.g. one from hashlib. For Python 2 see https://docs.python.org/2/library/hashlib.html, and for Python 3 see https://docs.python.org/3/library/hashlib.html

2 Comments

Isn't a hash supposed to be deterministic?
For strings and bytes, hash() is only deterministic within a single run (unless PYTHONHASHSEED is fixed); you have no guarantee it will return the same value in different runs. Hence it's bad for persistence on disk.

2

We could use several of the hashlib hashing algorithms. Here is an example with sha1:

import hashlib

context = "string"
# hash the UTF-8 bytes of the string; the digest is identical on every run
sha1 = hashlib.sha1()
sha1.update(context.encode("utf-8"))
hash_as_hex = sha1.hexdigest()
# convert the hex digest to an int and restrict it to the allowed seed range
seed = int(hash_as_hex, 16) % 4294967295
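The same idea can be wrapped in a small helper. This is only a sketch: the name seed_from_string is hypothetical, and sha256 is swapped in for sha1 purely as an example of another hashlib algorithm. Taking the first four bytes of the digest yields an integer in [0, 2**32 - 1] directly, so no modulo is needed.

```python
import hashlib

def seed_from_string(context: str) -> int:
    """Map a string to a 32-bit seed that is stable across runs (hypothetical helper)."""
    digest = hashlib.sha256(context.encode("utf-8")).digest()
    # The first 4 bytes of the digest give an integer in [0, 2**32 - 1]
    return int.from_bytes(digest[:4], "big")

seed = seed_from_string("string")
# seed can then be passed to np.random.seed(seed) as in the question
```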

