I'm writing a script that needs to check the MD5 sum of a file on OS X and Windows. As a sanity check I compared its results with those of the command-line md5 tool, but the two differ. Here's the code:
def MD5File(self, f, block_size=2**20):
    md5 = hashlib.md5()
    while True:
        data = f.read(block_size)
        if not data:
            break
        md5.update(data)
    return md5.hexdigest()

with open(path, 'rb') as f:
    print MD5File(path)
I did the obvious thing of opening the file in binary mode, but the results still differ. I've tried different ways of buffering the data, including reading it all in one go, and the Python script consistently returns the same digest, but that digest differs from the output of the md5 command.
So is there something else really obvious I'm doing wrong, or is it the case that running md5 filename doesn't actually do what you'd expect? As I'm reading the file's bytes directly, there shouldn't be any newline issues. If I run cat filename | md5, I get a different result again.
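For anyone wanting to reproduce the buffering check described above, here is a minimal Python 3 sketch. The names md5_file and md5_whole are hypothetical helpers, not part of the original script; the only library calls used are hashlib.md5, update and hexdigest. It hashes a file in binary chunks and in one read, so the two digests can be compared directly.

```python
import hashlib


def md5_file(path, block_size=2**20):
    """Return the hex MD5 digest of the file at `path`, read in binary chunks."""
    md5 = hashlib.md5()
    with open(path, 'rb') as f:
        # iter() with a sentinel keeps calling read() until it returns b'' at EOF.
        for chunk in iter(lambda: f.read(block_size), b''):
            md5.update(chunk)
    return md5.hexdigest()


def md5_whole(path):
    """Return the hex MD5 digest of the file at `path`, read in one go."""
    with open(path, 'rb') as f:
        return hashlib.md5(f.read()).hexdigest()
```

Because MD5 is computed over the byte stream, the chunk size should not affect the digest, which matches the observation that every buffering strategy gave the same value. A persistent mismatch with the shell therefore points at what the shell command is hashing rather than at the read loop.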
print MD5File(path) and not print MD5File(f) is just a typo in your posting... it works fine for me.
... md5 command in the shell to return the hash of a string rather than the file, so I wasn't actually running the md5 program at all.
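The mismatch described in the last comment can be illustrated with a short Python 3 sketch. It assumes the overridden shell command hashed its argument as text, which is how the comment reads; the exact shell definition is not shown in the thread. The helper names md5_of_text and md5_of_file are hypothetical.

```python
import hashlib


def md5_of_text(text):
    """Hash the characters of a string (for example the file *name*), not a file."""
    return hashlib.md5(text.encode('utf-8')).hexdigest()


def md5_of_file(path):
    """Hash the bytes stored in the file at `path`."""
    with open(path, 'rb') as f:
        return hashlib.md5(f.read()).hexdigest()
```

Calling md5_of_text(path) and md5_of_file(path) on the same path will in practice give two unrelated digests, since one hashes the name and the other the contents. In the shell, running type md5 (or which md5) is a quick way to see whether the name resolves to the real program or to an alias or function.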