I am stuck on a hashing problem. Using CHIP-0007's default format, I generated a few JSON files, and I have been trying to compute a SHA-256 hash for each of them. I expect a unique hash value for each file.

However, my Python code isn't producing that. I thought there might be an issue with the JSON files, but they look fine, so the problem seems to be in the SHA-256 code.

All the JSON files:

JSON File 1

{ "format": "CHIP-0007", "name": "adewale-the-amebo", "description": "Adewale always wants to be in everyone's business.", "attributes": [ { "trait_type": "Gender", "value": "male" } ], "collection": { "name": "adewale-the-amebo Collection", "id": "1" } }

JSON File 2

{ "format": "CHIP-0007", "name": "alli-the-queeny", "description": "Alli is an LGBT Stan.", "attributes": [ { "trait_type": "Gender", "value": "male" } ], "collection": { "name": "alli-the-queeny Collection", "id": "2" } }

JSON File 3

{ "format": "CHIP-0007", "name": "aminat-the-snnobish", "description": "Aminat never really wants to talk to anyone.", "attributes": [ { "trait_type": "Gender", "value": "female" } ], "collection": { "name": "aminat-the-snnobish Collection", "id": "3" } }

Sample CSV File:

Series Number,Filename,Description,Gender
1,adewale-the-amebo,Adewale always wants to be in everyone's business.,male
2,alli-the-queeny,Alli is an LGBT Stan.,male
3,aminat-the-snnobish,Aminat never really wants to talk to anyone.,female

Python CODE

# TODO 2 : Generate a JSON file per entry in team's sheet in CHIP-0007's default format

                new_jsonFile = f"{row[1]}.json"
                json_data = {}

                json_data["format"] = "CHIP-0007"
                json_data["name"] = row[1]
                json_data["description"] = row[2]

                attribute_data = {}
                attribute_data["trait_type"] = "Gender"  # gender
                attribute_data["value"] = row[3]  # "value/male/female"

                json_data["attributes"] = [attribute_data]

                collection_data = {}
                collection_data["name"] = f"{row[1]} Collection"
                collection_data["id"] = row[0]  # "ID of the NFT collection"

                json_data["collection"] = collection_data

                filepath = f"Json_Files/{new_jsonFile}"
                with open(filepath, 'w') as f:
                    json.dump(json_data, f, indent=2)
                    C += 1
                    sha256_hash = sha256_gen(filepath)
                    temp.append(sha256_hash)

                    NEW.append(temp)


# TODO 3 : Calculate sha256 of each entry
def sha256_gen(fn):
    return hashlib.sha256(open(fn, 'rb').read()).hexdigest()

How can I generate a unique sha256 hash for each JSON?

I also tried reading the files in byte blocks, but that did not help either. After many attempts I am getting nowhere. Here is the unexpected output for each JSON file:

[ All hashes are identical ]

Unexpected SHA256 output:

e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855

Expected:

A unique hash value for each file, each different from the others.

  • I would be surprised if the hashlib library is so fundamentally broken that you can accidentally generate SHA256 collisions. It is virtually certain that you are feeding the same input to the hash function, even if you think you are not. Commented Nov 4, 2022 at 15:52
  • I can't reproduce the problem. Are you sure new_jsonFile is changing each time? Commented Nov 4, 2022 at 15:54
  • Please post the code that loops over all the files. Commented Nov 4, 2022 at 15:55
  • @JohnColeman No each input is different Commented Nov 4, 2022 at 16:15
  • @Barmar yes, I have edited the question. Now, complete code along with sample csv and json files are in there Commented Nov 4, 2022 at 16:16

1 Answer


Because of output buffering, sha256_gen(filepath) runs before anything has actually been written to the file, so you're getting the hash of an empty file. The digest beginning e3b0c442 is the SHA-256 of zero bytes, which is why all three outputs match. Call sha256_gen outside the with block, so that the JSON file is closed and the buffer is flushed first.

                with open(filepath, 'w') as f:
                    json.dump(json_data, f, indent=2)
                C += 1
                sha256_hash = sha256_gen(filepath)
                temp.append(sha256_hash)

                NEW.append(temp)
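
The effect can be reproduced in isolation. The sketch below is a minimal, self-contained illustration rather than the asker's full script: the `demo` function, the sample record, and the file name `sample.json` are hypothetical, and it writes to a temporary directory. It hashes the file once while the with block is still open and once after it has closed.

```python
import hashlib
import json
import os
import tempfile


def sha256_gen(fn):
    # Same logic as the question's helper, but closes the file handle explicitly.
    with open(fn, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def demo():
    # Hypothetical sample record; any small JSON-serialisable dict behaves the same way.
    data = {"format": "CHIP-0007", "name": "adewale-the-amebo"}
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.json")
        with open(path, "w") as f:
            json.dump(data, f, indent=2)
            # Still inside the with block: a payload this small is typically
            # still sitting in Python's write buffer, so the file on disk is empty.
            hash_inside = sha256_gen(path)
        # The with block has exited, so the file has been flushed and closed.
        hash_outside = sha256_gen(path)
    return hash_inside, hash_outside


EMPTY_SHA256 = hashlib.sha256(b"").hexdigest()
inside, outside = demo()
print(inside == EMPTY_SHA256)   # hash taken before the buffer was flushed
print(outside == EMPTY_SHA256)  # hash taken after the file was closed
```

With CPython's default buffering, the first comparison would be expected to print True and the second False. Calling `f.flush()` before hashing would also work, but moving the call outside the with block is simpler and guarantees the file is closed.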

8 Comments

Oh man! such a silly mistake I did. Thank you so much. I should have asked here 2 days ago. Yes, now it's working fine. Much love and blessings
Some simple debugging like printing open(fn, 'rb').read() might have clued you in to the problem.
The same input always gives the same hash. Empty input is always the same.
Hashing is supposed to be predictable. For instance, if you download a file and want to confirm that it wasn't modified, you hash the file and compare that with the hash published on the website. How would that work if your hash was different from when they posted the file?
If you're thinking of password hashing, that's made unpredictable by adding a random salt to the password before hashing it.
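
The points made in these comments can be checked directly. The following is a small sketch using only `hashlib` and `os.urandom`; the `salted_digest` helper is hypothetical and only illustrates the idea of salting, it is not meant as a recommended password-hashing scheme.

```python
import hashlib
import os

# The digest the question kept getting is the SHA-256 of zero bytes.
EMPTY_DIGEST = "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
assert hashlib.sha256(b"").hexdigest() == EMPTY_DIGEST

# Same input, same digest, every time.
first = hashlib.sha256(b"alli-the-queeny").hexdigest()
second = hashlib.sha256(b"alli-the-queeny").hexdigest()
print(first == second)  # True


def salted_digest(password: bytes, salt: bytes) -> str:
    # Hypothetical helper: prepends a salt so equal passwords hash differently.
    return hashlib.sha256(salt + password).hexdigest()


salt_a = os.urandom(16)
salt_b = os.urandom(16)
digest_a = salted_digest(b"secret", salt_a)
digest_b = salted_digest(b"secret", salt_b)
print(digest_a == digest_b)  # False unless the two random salts happen to collide
```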