3

When writing binary files in Python I seem to be missing some bytes. I've tried this with the "write" function and with the "array.tofile" function. Here is some example code:

import zlib, sys, os, array
from struct import unpack
from array import array


inputFile = 'strings.exe'

print "Reading data from: ", inputFile

print 'Input File Size:', os.path.getsize(inputFile)

f = open(inputFile, 'rb')
#compressedDocument = 

document = f.read()
documentArray = array('c', document)
print 'Document Size:', len(documentArray)

copyFile = open( 'Copy of ' + inputFile, 'wb')
documentArray.tofile(copyFile)
#copyFile.write(document)
copyFile.close


print 'Output File Size:', os.path.getsize('Copy of ' + inputFile)

print 'Missing Bytes:', os.path.getsize(inputFile) - os.path.getsize('Copy of ' + inputFile)
f.close()

Gives the following output:

Reading data from:  strings.exe
Input File Size: 136592
Document Size: 136592
Output File Size: 135168
Missing Bytes: 1424

I don't understand why those bytes aren't being written. I've tried this on multiple files with a varying number of missing bytes.

1
  • Could you share the contents of the file 'strings.exe'? I can't reproduce the problem with my files. Commented Jul 7, 2011 at 10:54

2 Answers

5

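You are not closing your output file before you call os.path.getsize on it. copyFile.close without parentheses only references the method and never calls it, so the last 1424 bytes are still sitting in the write buffer when the size is measured. The 135168 bytes that did reach the disk are exactly 33 blocks of 4096 bytes. Use copyFile.close() instead of copyFile.close.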
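A minimal sketch of the fix, assuming Python 3 syntax (the question uses Python 2). The helper name copy_bytes and the temporary files standing in for strings.exe are hypothetical. A with block closes the file on exit, so the buffer is flushed before the size is measured:

```python
import os
import tempfile


def copy_bytes(src_path, dst_path):
    """Copy src_path to dst_path and return the size of the copy in bytes."""
    with open(src_path, 'rb') as src:
        data = src.read()
    with open(dst_path, 'wb') as dst:
        dst.write(data)
    # Leaving the with block closed dst, which flushed any buffered bytes,
    # so the size reported here reflects everything that was written.
    return os.path.getsize(dst_path)


# Hypothetical stand-in for strings.exe: random bytes of the same length.
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, 'input.bin')
dst_path = os.path.join(workdir, 'copy.bin')
payload = os.urandom(136592)
with open(src_path, 'wb') as f:
    f.write(payload)

copied_size = copy_bytes(src_path, dst_path)
print('Missing Bytes:', os.path.getsize(src_path) - copied_size)
```

If the original open/close style is kept instead, the size check has to come after copyFile.close() has actually been called.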

3 Comments

Will +1 this tomorrow (ran out of available votes today). I referenced this in my own answer.
@mac: Your answer has gained you 55 undeserved points -- why don't you delete it instead of copying my answer into it?
The question was about bytes not being written, and I correctly pointed out that the two files were indeed identical. What I got wrong was the reason why the size returned by Python was wrong (and it was in fact a hypothetical sentence, not a definitive "it's because of this"), and I changed it by pasting in the right explanation, quoting you and +1'ing your answer. If you are jealous about it I can of course delete that bit of the answer. I simply pasted it in thinking it was more useful for future visitors. Let me know if you want me to remove that bit! :)
4

If you actually compare the two binary files (on Unix you can use the cmp command), you will see that they are identical.
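The same check can be done from Python with the standard library's filecmp module. This is a small sketch in Python 3 syntax; files_identical and the temporary file names are hypothetical, and shallow=False asks for a byte-by-byte comparison instead of relying on os.stat() signatures:

```python
import filecmp
import os
import tempfile


def files_identical(path_a, path_b):
    """Return True if both files have exactly the same contents."""
    # shallow=False compares file contents, not just size and mtime.
    return filecmp.cmp(path_a, path_b, shallow=False)


# Hypothetical demo files: two equal copies and one that differs.
workdir = tempfile.mkdtemp()
original = os.path.join(workdir, 'original.bin')
same_copy = os.path.join(workdir, 'same_copy.bin')
short_copy = os.path.join(workdir, 'short_copy.bin')

data = os.urandom(8192)
for path, content in ((original, data), (same_copy, data), (short_copy, data[:4096])):
    with open(path, 'wb') as f:
        f.write(content)

same_result = files_identical(original, same_copy)
short_result = files_identical(original, short_copy)
print(same_result, short_result)
```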

EDIT: As correctly pointed out by John in his answer, the difference in byte size is due to not closing the file before measuring its length. The correct line in the code should be copyFile.close() [invoking the method] instead of copyFile.close [which is the method object].
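To illustrate the difference between the method object and the method call, here is a short sketch in Python 3 syntax that uses an in-memory io.BytesIO buffer as a stand-in for the output file (the variable names are hypothetical):

```python
import io

buf = io.BytesIO()
buf.write(b'some bytes')

# Evaluating the attribute only looks up the bound method; nothing is called.
buf.close
closed_after_reference = buf.closed

# Adding parentheses invokes the method and actually closes the buffer.
buf.close()
closed_after_call = buf.closed

print(closed_after_reference, closed_after_call)
```

Python raises no error for the bare buf.close expression, which is why the mistake in the question went unnoticed.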

2 Comments

Yep they are identical. Used cmp and md5sum and they appear the same. Should've really checked with that first. Yes I believe it'll be file metadata related. Thank you
Metadata?? "difference in the descriptors"??
