
I'm writing a script that needs to check the MD5 sum of a file on OS X and Windows. As a sanity check I compared the results with those of the command-line md5 tool, but I get different results. Here's the code:

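import hashlib

def MD5File(f, block_size=2**20):
  # Hash the file in block_size chunks so large files need not fit in memory.
  md5 = hashlib.md5()
  while True:
    data = f.read(block_size)
    if not data:
      break
    md5.update(data)
  return md5.hexdigest()

# path is the file being checked; pass the open file object, not the path.
with open(path, 'rb') as f:
  print(MD5File(f))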

I did the obvious thing and opened the file in binary mode, but it still gives different results. I've tried different ways of buffering the data, including reading it all in one go, and the Python script consistently returns the same hash, but it differs from the one the md5 command prints.
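The observation that buffering makes no difference is expected: MD5 is computed over the byte stream, so feeding it in 1-byte, 4 KiB or 1 MiB chunks gives the same digest as hashing everything at once. A minimal Python 3 sketch of that check, using a standalone md5_file helper modelled on MD5File and random sample data written to a temporary file (both are illustrative, not part of the original script):

```python
import hashlib
import os
import tempfile


def md5_file(f, block_size=2**20):
    """Return the hex MD5 digest of a file object opened in binary mode."""
    md5 = hashlib.md5()
    while True:
        data = f.read(block_size)
        if not data:
            break
        md5.update(data)
    return md5.hexdigest()


# Illustrative sample data: about 100 KiB of random bytes in a temporary file.
payload = os.urandom(100 * 1024 + 123)
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    path = tmp.name

# Hash the same file with very different chunk sizes.
digests = set()
for size in (1, 7, 4096, 2**20):
    with open(path, 'rb') as f:
        digests.add(md5_file(f, size))

# Hash everything in one go for comparison.
one_shot = hashlib.md5(payload).hexdigest()
os.remove(path)

print(digests == {one_shot})  # True: chunking does not change the digest
```

If every block size ends up in the same single-element set, the Python side is self-consistent, and any mismatch has to come from the tool it is being compared against.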

So is there something else really obvious I'm doing wrong, or does running md5 filename not actually do what you expect? Since I'm reading the file's raw bytes directly, there shouldn't be any newline issues. If I run cat filename | md5, I get a different result again.
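A genuine md5 should report the same digest for md5 filename and for cat filename | md5, because both read the same bytes; only the source differs. A hedged Python 3 sketch of the same idea, where the hypothetical md5_of helper hashes a named file or, when no path is given, standard input through sys.stdin.buffer:

```python
import hashlib
import sys


def md5_stream(stream, block_size=2**20):
    """Return the hex MD5 digest of everything readable from a binary stream."""
    md5 = hashlib.md5()
    # iter() with a sentinel keeps calling read() until it returns b'' (EOF).
    for chunk in iter(lambda: stream.read(block_size), b''):
        md5.update(chunk)
    return md5.hexdigest()


def md5_of(path=None):
    """Hash the named file, or standard input if no path is given."""
    if path is None:
        # Like `cat filename | md5`: read raw bytes from the pipe.
        return md5_stream(sys.stdin.buffer)
    # Like `md5 filename`: open the file itself in binary mode.
    with open(path, 'rb') as f:
        return md5_stream(f)
```

In a script that prints md5_of(sys.argv[1] if len(sys.argv) > 1 else None), both python script.py filename and cat filename | python script.py (script name hypothetical) should print the same digest.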

  • I don't think opening the file in binary mode is going to change the fact that Unix and Windows use different line endings in files, unless this is not a text file, in which case my comment is irrelevant. Did you check the line endings? Commented Feb 28, 2011 at 11:20
  • Assuming print MD5File(path) and not print MD5File(f) is just a typo in your posting... it works fine for me. Commented Feb 28, 2011 at 11:26
  • Gah! Found my problem: I had overridden the md5 command in the shell to return the hash of a string rather than the file, so I wasn't actually running the md5 program at all. Commented Feb 28, 2011 at 11:28
  • Ouch! Nice catch! And a nice presentation on why you shouldn't override common names ;-) Commented Feb 28, 2011 at 11:39
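On the line-ending point in the first comment: binary mode makes Python hash exactly the bytes on disk, which is also what md5 hashes, whereas Python 3's default text mode translates CRLF to LF before the data ever reaches hashlib. Separately, a CRLF copy and an LF copy of the same text are different files with different digests. A small Python 3 sketch with a made-up CRLF sample file:

```python
import hashlib
import os
import tempfile

# Hypothetical sample file containing Windows-style CRLF line endings.
raw = b'first line\r\nsecond line\r\n'
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(raw)
    path = tmp.name

# Binary mode: Python sees exactly the bytes on disk, as md5 does.
with open(path, 'rb') as f:
    binary_digest = hashlib.md5(f.read()).hexdigest()

# Text mode with universal newlines: '\r\n' is translated to '\n' on read,
# so the bytes being hashed are no longer the bytes on disk.
with open(path, 'r', encoding='ascii', newline=None) as f:
    text_digest = hashlib.md5(f.read().encode('ascii')).hexdigest()

# An LF-only copy of the same text is a different file with a different hash.
lf_digest = hashlib.md5(raw.replace(b'\r\n', b'\n')).hexdigest()

os.remove(path)
```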

2 Answers


The following works correctly for me:

In [1]: with open("play.py", "rb") as f:
   ...:     data = f.read()
   ...:     from hashlib import md5
   ...:     print(md5(data).hexdigest())
   ...: 
07030b37de71f3ad9ef2398b4f0c3a3e

In [2]: 
bensonk@angua ~ $ md5 play.py
MD5 (play.py) = 07030b37de71f3ad9ef2398b4f0c3a3e
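The transcript shows the BSD/macOS output format, MD5 (play.py) = <digest>. A sketch for automating the same comparison, assuming one line in that format is available as a string (for example, captured from the tool); the helper names parse_bsd_md5 and matches_tool_output are illustrative:

```python
import hashlib
import re

# Matches the BSD/macOS md5 output format shown above: "MD5 (name) = hexdigest".
BSD_MD5_LINE = re.compile(r'^MD5 \((?P<name>.*)\) = (?P<digest>[0-9a-f]{32})$')


def parse_bsd_md5(line):
    """Return (filename, digest) from one line of `md5 filename` output."""
    match = BSD_MD5_LINE.match(line.strip())
    if match is None:
        raise ValueError('not BSD md5 output: %r' % line)
    return match.group('name'), match.group('digest')


def matches_tool_output(data, tool_line):
    """Check whether Python's MD5 of `data` equals the digest the tool reported."""
    _, reported = parse_bsd_md5(tool_line)
    return hashlib.md5(data).hexdigest() == reported
```

Raising ValueError on an unexpected line, rather than returning False, makes it obvious when the command being run does not produce the format the real tool uses.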

Please try my code and see if it works for you. If it doesn't, could you upload a gist of your Python script and a sample file for me to try?


1 Comment

Thanks Benson -- that was pretty much my original implementation.

Oops, this was a case of user error. I had overridden the md5 command in the shell to return the hash of a string rather than of a file.
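The mismatch is easy to reproduce: an md5 override that hashes its argument computes the digest of the filename string, not of the file's contents. A Python 3 sketch contrasting the two on a hypothetical temporary file. In bash or zsh, running type md5 shows whether the name currently refers to a function, an alias or the actual binary.

```python
import hashlib
import os
import tempfile

# Hypothetical file whose name and contents differ.
contents = b'print("hello")\n'
with tempfile.NamedTemporaryFile(suffix='.py', delete=False) as tmp:
    tmp.write(contents)
    path = tmp.name

# What a real `md5 filename` computes: the digest of the file's bytes.
with open(path, 'rb') as f:
    file_digest = hashlib.md5(f.read()).hexdigest()

# What an override that hashes its argument string computes instead:
# the digest of the filename itself, unrelated to the contents.
name_digest = hashlib.md5(path.encode('utf-8')).hexdigest()

os.remove(path)
```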
