3

First import:

import pandas as pd
import numpy as np
import hashlib

Next, consider the following:

np.random.seed(42)
arr = np.random.choice([41, 43, 42], size=(3,3))
df = pd.DataFrame(arr)
print(arr)
print(df)
print(hashlib.sha256(arr.tobytes()).hexdigest())
print(hashlib.sha256(df.values.tobytes()).hexdigest())

Running this snippet repeatedly always prints the same hash for both arr and df.values: ddfee4572d380bef86d3ebe3cb7bfa7c68b7744f55f67f4e1ca5f6872c2c9ba1.

However, if we consider the following:

np.random.seed(42)
arr = np.random.choice(['foo', 'bar', 42], size=(3,3))
df = pd.DataFrame(arr)
print(arr)
print(df)
print(hashlib.sha256(arr.tobytes()).hexdigest())
print(hashlib.sha256(df.values.tobytes()).hexdigest())

Note that the data now contains strings. The hash of arr is the same on every run (52db9328682317c44370b8186a5c6bae75f2a94c9d0d5b24d61f602857acd3de), but the hash of the pandas.DataFrame values changes each time.

Is there a Pythonic way around this? Or even a non-Pythonic one?

4 Answers

2

A pandas DataFrame or Series can be hashed using the pandas.util.hash_pandas_object function, starting in version 0.20.1.
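As a rough sketch of how that could look for the DataFrame from the question: hash_pandas_object returns one uint64 hash per row, so the rows still need to be combined into a single digest. The helper name frame_digest is my own and not part of pandas, and feeding the row hashes into sha256 is just one possible way to combine them.

```python
import hashlib

import numpy as np
import pandas as pd


def frame_digest(frame: pd.DataFrame) -> str:
    """Hypothetical helper: one hex digest for a whole DataFrame."""
    # One uint64 hash per row, computed from the cell values and the index.
    row_hashes = pd.util.hash_pandas_object(frame, index=True)
    # The row hashes form a plain numeric array, so their bytes are the
    # actual hash values rather than object pointers.
    return hashlib.sha256(row_hashes.to_numpy().tobytes()).hexdigest()


np.random.seed(42)
arr = np.random.choice(['foo', 'bar', 42], size=(3, 3))
df = pd.DataFrame(arr)

digest = frame_digest(df)
print(digest)
```

As far as I can tell, the row hashes cover the values and the index but not the column labels, so hash those separately if they matter for your use case.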

0

I believe that when the cells hold strings, the DataFrame's dtype is object.

df.dtypes

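shows that. The raw bytes of an object array are pointers to the Python objects rather than the string contents, so they can differ from run to run. That is why you get a different hash each time.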
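If that explanation is right, converting the values back to a fixed-width string dtype before calling tobytes() should make the hash stable again. A minimal sketch, assuming the DataFrame was built from arr as in the question (reusing arr.dtype for the cast is my own choice, not something pandas requires):

```python
import hashlib

import numpy as np
import pandas as pd

np.random.seed(42)
arr = np.random.choice(['foo', 'bar', 42], size=(3, 3))
df = pd.DataFrame(arr)

print(arr.dtype)        # fixed-width unicode dtype (kind 'U')
print(df.values.dtype)  # typically object once the data is in a DataFrame

# Cast back to the original fixed-width dtype so that the buffer holds
# the characters themselves instead of pointers to Python objects.
as_text = df.to_numpy().astype(arr.dtype)

arr_hash = hashlib.sha256(arr.tobytes()).hexdigest()
df_hash = hashlib.sha256(as_text.tobytes()).hexdigest()
print(arr_hash == df_hash)
```

This only makes sense when every cell really is a string; for mixed columns the cast would silently turn everything into text.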

0

A naive workaround is to get a string representation of the whole dataframe and hash it. In particular, either of the following works:

print(hashlib.sha256(df.to_json().encode()).hexdigest())
print(hashlib.sha256(df.to_csv().encode()).hexdigest())

Naturally, this gets expensive for big dataframes, since the whole frame is serialized to a string first.

Still, the fact remains that pd.DataFrame(arr).values != arr, which is counter-intuitive.

See a summary: https://gist.github.com/drorata/bfc5d956c4fb928dcc77510a33009691

0

I wrote a package with hashable subclasses of Series and DataFrame for my needs. Hope this helps.
