1

The standard hash('hello') function may generate different hashes on different machines, different versions of Python, and even different runs of the same program on the same Python version/machine.

What are decent pure Python (or built in) alternatives that have low chances of collision? The use case I'm after is checking uniqueness of a string in a database (note: it doesn't have to be 100% foolproof, just extremely unlikely to collide with another string).

3

2 Answers 2

1

Try the python hashlib. It has implementations of MD5, SHA1, SHA224, SHA256, SHA384, and SHA512 which should give you a good hash with low chance of collision.

Example Input:

import hashlib
hashed_string = hashlib.sha512("hello").hexdigest()
print hashed_string

Output:

9b71d224bd62f3785d96d46ad3ea3d73319bfbc2890caadae2dff72519673ca72323c3d99ba5c11d7c7acc6e14b8c5da0c4663475c2e5c3adef46f73bcdec043
Sign up to request clarification or add additional context in comments.

2 Comments

Note that these are cryptographic & slow hashes. The OP's question is to the default hash which is in a different domain.
I'm not going to be doing thousands of these per second so I think hashlib will do, but yes I would have preferred to find a pure Python solution that was not cryptographic. I'll also look into Murmurhash.
1

Murmurhash is a good bet for non-cyrptographic uses (as opposed to hashlib, which uses slow crypto-hashes), for a variety of reasons:

  • very widely used

  • portable across not only Python versions and machines, but across different languages

Here is its Python bindings. Here is a question on the algorithm itself.


If, for some technical reason, a pure-Python roll-your-own implementation might be needed, there are a few feasible options:

  • the source code for murmurhash is in the first link above, but, in pure Python mode, should probably be replaced by something simpler

  • immediate candidates are the Knuth and Jenkins.

5 Comments

Thanks. Is there a package for it that will run in an environment that doesn't allow C modules?
I'm not sure. What is an "enviornment that doesn't allow C modules": a technical or organizational environment? It depends...
App Engine runs (I believe) a modified version of pypy with a lot of restrictions due to their sandbox. Basically, I can't run anything compiled, it has to be written entirely in Python.
See update. Hope it helps.
Thanks. Do you know how they compare to FNV-1a in terms of random distribution and collision rates?

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.