14

I am new to Python. Is it possible to unhash a value, and if so, how? I am using the built-in hash() function. What I would like to do is hash a value, send it somewhere, and then unhash it on the other side, like this:

# process X
hashedVal = hash(someVal)
# send hashedVal, then receive it in process Y
someVal = unhash(hashedVal)  # unhash() is the function I am looking for
# for example, print it
print(someVal)

Thanks in advance.

4
  • Are you talking about serialisation? Commented Jun 9, 2010 at 14:21
  • 3
    Why do you want to do this? Are you trying to speed up sending 3GB data by only sending the hash and then unhashing it at the other end? It's not going to work... Commented Jun 9, 2010 at 14:23
  • If this method were viable, many cryptographic systems would be dead. Commented Jun 9, 2010 at 15:17
  • hey, google has a logo almost in rainbow colors, you can use it! :) Commented Jun 9, 2010 at 15:28

4 Answers

31

It can't be done.

A hash is not a compressed version of the original value; it is a number (or something similar) derived from the original value. Because many different inputs map to the same output, it is possible (though statistically unlikely for any given pair, if the hash algorithm is a good one) for two different objects to produce the same hash value.

This follows from the pigeonhole principle, which states that if you place N items into M categories and N is larger than M (i.e. more items than categories), at least one category must contain multiple items. Since a hash value is typically much smaller than the data it hashes, there are more possible inputs than hash values, so some inputs must share a hash.

As such, it is impossible to go back once you have the hash value. You need a different way of transporting data than this.

For instance, an example (but not a very good one) of a hash algorithm would be to calculate the number modulo 3 (i.e. the remainder after dividing by 3). Then you would get the following hash values from these numbers:

1 --> 1  <--+- same hash number, but different original values
2 --> 2     |
3 --> 0     |
4 --> 1  <--+
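
The same toy hash can be sketched in Python to make the collisions visible. This is only an illustration of the modulo-3 example above, not how Python's built-in hash() works; the name toy_hash is made up for this sketch.

```python
from collections import defaultdict


def toy_hash(n: int) -> int:
    """Toy hash from the example above: the remainder after dividing by 3."""
    return n % 3


# Group the inputs 1..9 by their hash value.
buckets = defaultdict(list)
for n in range(1, 10):
    buckets[toy_hash(n)].append(n)

# Every hash value has several possible originals, so there is no way to
# tell which one produced it.
for h, originals in sorted(buckets.items()):
    print(h, "<--", originals)
```

Given only the hash value 1, the original could have been 1, 4, 7 or any other number of that form, which is why no unhash function can exist for it.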

Are you trying to use the hash function in this way in order to:

  • Save space (you have observed that the hash value is much smaller in size than the original data)
  • Secure transportation (you have observed that the hash value is difficult to reverse)
  • Transport data (you have observed that the hash number/string is easier to transport than a complex object hierarchy)

... ?

Knowing why you want to do this might give you a better answer than just "it can't be done".

For instance, for each of the three goals above, here's a way to do it properly:

  • Compression/Decompression, for instance using gzip or zlib (the two typically available in most programming languages/runtimes)
  • Encryption/Decryption, for instance using RSA, AES or a similar secure encryption algorithm
  • Serialization/Deserialization, which is code built to take a complex object hierarchy and produce either a binary or textual representation that later on can be deserialized back into new objects
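
As a rough sketch of two of these options, the following uses only the standard library's json and zlib modules to serialize and compress a value, then reverse both steps. The variable names and the sample data are made up for illustration, and the actual transport between processes is left out. Encryption is not shown because the standard library does not ship AES or RSA; that would need a third-party package.

```python
import json
import zlib

# Hypothetical data to send from process X to process Y.
some_val = {"name": "example", "values": [1, 2, 3], "nested": {"ok": True}}

# Process X: serialize to text, encode to bytes, then compress.
payload = zlib.compress(json.dumps(some_val).encode("utf-8"))

# ... send payload to process Y (socket, pipe, file, ...) ...

# Process Y: decompress, decode, and deserialize back into objects.
restored = json.loads(zlib.decompress(payload).decode("utf-8"))

print(restored == some_val)
```

Unlike a hash, every step here is lossless, so the receiving side gets back a value equal to the original.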

2 Comments

In python the __hash__ method is independent of any hash-table you create. Suppose that you have a hash table named tabby. Then hash("hello world") is not the same as x % len(tabby) for some number x. When you hash a string, or hash a tuple, or hash anything in python, the resulting number has nothing to do with the number of entries in a user-created hash-table.
There are a lot of python libraries which define the == operator (or __eq__() ) based on __hash__(). For example, we could have "hello" == "world" return True if and only if hash("hello") == hash("world"). If the hash value for each instance of the str class is unique, then it is theoretically possible to un-hash them.
17

Even if I'm almost 8 years late with an answer, I want to say it is possible to unhash data (though not with the built-in hash() function).

The previous answers are all describing cryptographic hash functions, which by design should compute hashes that are impossible (or at least very hard) to unhash.

However, this is not the case with all hash functions.

Solution

You can use the basehash Python library (pip install basehash) to achieve what you want.

There is an important thing to keep in mind though: in order to be able to unhash the data, you need to hash it without loss of data. This generally means that the bigger the pool of data types and values you would like to hash, the bigger the hash length has to be, so that you won't get hash collisions.

Anyway, here's a simple example of how to hash/unhash data:

import basehash

hash_fn = basehash.base36()  # you can initialize a 36, 52, 56, 58, 62 and 94 base fn
hash_value = hash_fn.hash(1) # returns 'M8YZRZ'
unhashed = hash_fn.unhash('M8YZRZ') # returns 1

You can define the hash length on hash function initialization and hash other data types as well.

I leave the explanation of why various bases and hash lengths are needed to readers who would like to find out more about hashing.
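
To illustrate why a lossless mapping can be reversed, here is a minimal sketch of one such scheme using only the standard library. It is not basehash's actual implementation; the constants, the alphabet, and the encode/decode names are assumptions chosen for this example. It multiplies the input by a number that is coprime to the size of the output space, so every input gets a distinct output, and undoes that with the modular inverse (pow with a negative exponent needs Python 3.8+).

```python
# Assumed parameters, chosen only for illustration.
LENGTH = 6
BASE = 36
MODULUS = BASE ** LENGTH            # number of distinct 6-character strings
MULTIPLIER = 1_000_003              # shares no common factor with MODULUS
INVERSE = pow(MULTIPLIER, -1, MODULUS)  # modular inverse (Python 3.8+)
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def encode(n: int) -> str:
    """Map an integer in [0, MODULUS) to a fixed-length base-36 string."""
    if not 0 <= n < MODULUS:
        raise ValueError("input does not fit in the output space")
    x = (n * MULTIPLIER) % MODULUS
    digits = []
    for _ in range(LENGTH):
        x, r = divmod(x, BASE)
        digits.append(ALPHABET[r])
    return "".join(reversed(digits))


def decode(s: str) -> int:
    """Reverse encode(): parse base 36, then undo the multiplication."""
    return (int(s, BASE) * INVERSE) % MODULUS


print(decode(encode(12345)))
```

The ValueError check is the "without loss of data" condition in practice: once there are more possible inputs than output strings, collisions become unavoidable and the mapping can no longer be reversed.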

7

You can't "unhash" data. Hash functions are irreversible due to the pigeonhole principle:

http://en.wikipedia.org/wiki/Hash_function
http://en.wikipedia.org/wiki/Pigeonhole_principle

I think what you are looking for is encryption/decryption. (Or compression or serialization, as mentioned in other answers/comments.)

1 Comment

Or compression/decompression.
0

This is not possible in general. A hash function necessarily loses information, and Python's hash is no exception.
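
A small sketch of that information loss, assuming Python 3.2 or later, where sys.hash_info is available: integers are hashed by reducing them modulo the prime sys.hash_info.modulus, so two different integers that differ by that prime end up with the same hash.

```python
import sys

# Integers are reduced modulo this prime before being returned by hash().
P = sys.hash_info.modulus

# Two different integers, one hash value: hash() cannot tell them apart.
print(hash(0) == hash(P))
print(hash(1) == hash(P + 1))
```

Both lines print True, so given only the hash value 0 there is no way to know whether the original was 0 or P.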

1 Comment

In python, there are a lot of classes which define operator == (or __eq__() ) based on __hash__(). For example, we could have "hello" == "world" return True if and only if hash("hello") == hash("world"). If the hash value for each string is unique, then it is theoretically possible to un-hash them.
