7

[Python 3.1]

I am trying to create a hash for a container that may have nested containers in it, with unknown depth. At all levels of the hierarchy, there are only built-in types. What's a good way to do that?

Why I need it:

I am caching the result of some calculations in a pickle object (on disk). I would need to invalidate that cached file if the function is called with different parameters (this happens rarely, so I'm not going to save more than one file to disk). The hash will be used to compare the parameters.

5
  • Are you expecting these values to be able to change after you create them? Commented Nov 17, 2010 at 6:39
  • @aaronsterling: Can you clarify please? I'm not sure I'm answering your question, but the containers or their contents will not be modified. However, I have to emphasize that the persistence is, of course, required for any future invocation of this Python program (in a new process). Commented Nov 17, 2010 at 6:45
  • So you're OK with a small chance that the file might not be invalidated by different function parameters? Any hash function will have a chance of collisions, so you're looking for one with a minimal collision chance. Just trying to understand the problem a bit better. Commented Nov 17, 2010 at 17:23
  • @Scott Griffiths Yes I'm OK with a small chance of this problem happening. Although I guess this suggests serialization is a safer approach than hash when the parameters are not huge. Commented Nov 18, 2010 at 0:36
  • Related: stackoverflow.com/questions/64344515/… Commented Sep 19, 2022 at 7:43

3 Answers 3

2

If all the containers are tuples, and all the contained objects are hashable, then the main container should be hashable.

Sign up to request clarification or add additional context in comments.

2 Comments

Good point. Unfortunately, among the containers are dictionaries.
You could convert dictionaries to a sorted tuple of (key,value) tuples.
1

You could just serialize the parameters into something like JSON, and use that for the hash.

4 Comments

I like this idea. Are there any hidden problems with that? E.g., would it break on some complicated nesting structures, etc?
No, so long as it's just tuples, lists, dicts, strs and ints you're fine with it. One caveat, however: tuple == list (as JavaScript doesn't have a tuple concept)
I think I'd use pickle.dumps instead of json.dumps; the pickle is precise with types and ~15% faster (I timed them).
@ChrisMorgan I just realized I asked a related question 2 years later... And a comment and an answer to it said that I should not use pickle because it's both unpredictable for non-sorted objects and slower than manual conversion to hashable objects. One day I'll time different approaches.
0

I would do it with json serialization as a string [and then hash that string if it's still necessary].

from simplejson import dumps

def hash_data(data):
    return hash(dumps(data))

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.