How do I open a binary data file in Python and read the values back one long at a time, into a struct? I have something like this at the moment, but I think it will keep overwriting idList. I want to append to it, so that I end up with a tuple of all the long values in the file:

file = open(filename, "rb")
try:
    bytes_read = file.read(struct.calcsize("=l"))
    while bytes_read:
        # Read 4 bytes (long integer)
        idList = struct.unpack("=l", bytes_read)
        bytes_read = file.read(struct.calcsize("=l"))
finally:
    file.close()

2 Answers

6

Simplest (Python 2.6 or later):

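import array

# 'l' is the platform's C long: 4 bytes on some systems, 8 on others.
# Check idlist.itemsize; 'i' is 4 bytes on most platforms.
idlist = array.array('l')
with open(filename, "rb") as f:
    while True:
        try:
            # Read up to 2000 items; items read before EOF are still kept.
            idlist.fromfile(f, 2000)
        except EOFError:
            break
idtuple = tuple(idlist)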

Tuples are immutable, so they can't be built incrementally: you have to build a different (mutable) sequence, then call tuple on it at the end. If you don't specifically need a tuple, of course, you can skip that last, costly step and keep the array or list or whatever. Avoiding trampling over built-in names like file is advisable anyway ;-).

If you have to use the struct module for a job that's best handled by the array module (e.g., because of a bet),

import struct

idlist = []
with open(filename, "rb") as f:
    while True:
        bytes_read = f.read(struct.calcsize("=l"))
        # Stop at end of file (an empty read)
        if not bytes_read:
            break
        oneid = struct.unpack("=l", bytes_read)[0]
        idlist.append(oneid)

The with statement (also available in 2.5 via a __future__ import) is clearer and more concise than the old try/finally.
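If the file comfortably fits in memory, the struct route can also unpack everything in one call by putting a repeat count in the format string, which yields a tuple directly. The following is only a sketch: read_longs is a hypothetical helper name, the temporary file stands in for your data file, and the choice to drop a trailing partial record is an assumption about how you want truncated files handled.

```python
import os
import struct
import tempfile

ITEM = struct.calcsize("=l")  # 4 bytes: "=" selects standard sizes


def read_longs(path):
    """Return every 4-byte signed integer in the file as one tuple."""
    with open(path, "rb") as f:
        data = f.read()
    count = len(data) // ITEM
    # Assumption: ignore a trailing partial record instead of raising struct.error.
    return struct.unpack("=%dl" % count, data[:count * ITEM])


# Round-trip demonstration using a temporary file as stand-in data.
fd, path = tempfile.mkstemp()
try:
    with os.fdopen(fd, "wb") as out:
        out.write(struct.pack("=3l", 10, -20, 30))
    ids = read_longs(path)
finally:
    os.remove(path)
```

This avoids the per-item loop entirely, at the cost of holding the whole file in memory at once.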

4 Comments

Thanks. Unfortunately we're limited to using Python 2.5 at the moment, how would this differ in that?
@Adam, just add from __future__ import with_statement at the start of the module.
In the array example you call fromfile with a value of 2000, should that not be 4, for the four byte integer? Or am I misunderstanding this function?
@Adam, .fromfile(f, N) reads up to N items (raising EOFError if it has read fewer than N due to end-of-file, but you just need to catch that; the items that were read are still appended). The array instance already knows how many bytes each item takes because it knows it's an array of ls, i.e., signed C longs: 4 bytes on many platforms, though 8 on some 64-bit ones (check idlist.itemsize). Reading a few thousand items at a time (the exact number doesn't matter; 2000 rather than 1000 or 3000 is just because you do have to pick an exact number ;-) is more efficient than reading one at a time (not a huge difference in performance, but, "waste not, want not" ;-).
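To make the fromfile behaviour discussed in these comments concrete, here is a small sketch that asks for 2000 items from a file holding only 5. It uses type code 'i' (a C int, 4 bytes on most platforms) rather than 'l' as an assumption to keep the item size predictable, and a temporary file as stand-in data.

```python
import array
import os
import tempfile

ids = array.array('i')  # 'i' is a C int; check ids.itemsize on your platform
fd, path = tempfile.mkstemp()
try:
    # Write five integers in the machine's native layout.
    with os.fdopen(fd, "wb") as out:
        array.array('i', [1, 2, 3, 4, 5]).tofile(out)
    with open(path, "rb") as f:
        try:
            ids.fromfile(f, 2000)  # request 2000 items; only 5 exist
        except EOFError:
            pass  # the items that were available are already in the array
finally:
    os.remove(path)
```

The count is in items, not bytes, so the array multiplies by its own itemsize when reading.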
0

Initialize idList = [] before the loop, then change

idList = struct.unpack("=l", bytes_read)

to

idList.append(struct.unpack("=l", bytes_read)[0])
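Put together, the corrected loop might look roughly like the sketch below. The function name read_ids, the temporary demo file, and the decision to stop quietly on a short final read are all assumptions for illustration, not part of the original code.

```python
import os
import struct
import tempfile

RECORD_SIZE = struct.calcsize("=l")  # 4 bytes with the "=" prefix


def read_ids(filename):
    id_list = []  # must exist before the loop so append has a target
    with open(filename, "rb") as f:
        bytes_read = f.read(RECORD_SIZE)
        # Stop on an empty or short read instead of letting unpack raise.
        while len(bytes_read) == RECORD_SIZE:
            id_list.append(struct.unpack("=l", bytes_read)[0])
            bytes_read = f.read(RECORD_SIZE)
    return tuple(id_list)


# Demonstration with a temporary file that ends in one stray byte.
fd, path = tempfile.mkstemp()
try:
    with os.fdopen(fd, "wb") as out:
        out.write(struct.pack("=2l", 7, -8) + b"\x01")
    result = read_ids(path)
finally:
    os.remove(path)
```

The [0] matters because struct.unpack always returns a tuple, even for a single value.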
