2

I'm trying to use Python to loop over a long binary file filled with 8-byte records.

Each record has the format [ uint16 | uint16 | uint32 ]
(which is "HHI" in struct-formatting)

Apparently each 8-byte block is treated as an int instead of an array of 8 bytes, which causes the struct.unpack call to fail:

with open(fname, "rb") as f:
    sz=struct.calcsize("HHI")
    print(sz)                # This shows 8, as expected 
    for raw in f.read(sz):   # Expect this should read 8 bytes into raw
        print(type(raw))     # This says raw is an 'int', not a byte-array
        record=struct.unpack("HHI", raw ) # "TypeError: a bytes-like object is required, not 'int'"
        print(record)

How can I read my file as a series of structures, and print them each out?

3 Comments

  • I think f.read(len) is not iterable Commented Mar 4, 2019 at 18:31
  • Don't you just want raw = f.read(len)? This gives you all eight bytes at once, which seems to be what you want. Commented Mar 4, 2019 at 18:36
  • Mostly, yes: I want the first 8bytes, then iterate to get the next 8, and the following 8, etc, until the full file has been processed. Commented Mar 4, 2019 at 18:39

4 Answers

4

The iter builtin, if passed a callable and a sentinel value, calls the callable repeatedly until it returns the sentinel value.

So you can create a partial function with functools.partial (or use a lambda) and pass it to iter, like this:

import functools

with open('foo.bin', 'rb') as f:
    chunker = functools.partial(f.read, 8)
    for chunk in iter(chunker, b''):      # Read 8-byte chunks until f.read returns b'' at EOF
        print(chunk)                      # Do stuff with chunk
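To connect this back to the question, here is a minimal sketch that combines the iter/partial pattern with struct unpacking. The helper name read_records and the in-memory io.BytesIO sample are illustrative assumptions, not part of the original answer, and skipping a short trailing chunk is one possible policy rather than the only one.

```python
import functools
import io
import struct

RECORD = struct.Struct("HHI")  # format from the question; 8 bytes on typical platforms


def read_records(f):
    """Yield one (uint16, uint16, uint32) tuple per complete record in f."""
    for chunk in iter(functools.partial(f.read, RECORD.size), b""):
        if len(chunk) < RECORD.size:
            break  # short trailing chunk: skipped here (an assumption)
        yield RECORD.unpack(chunk)


# io.BytesIO stands in for a file opened with open(fname, "rb").
sample = io.BytesIO(RECORD.pack(1, 2, 3) + RECORD.pack(4, 5, 6))
records = list(read_records(sample))
print(records)  # [(1, 2, 3), (4, 5, 6)]
```

Precompiling the format with struct.Struct avoids re-parsing the format string on every record, which may matter for a long file.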

3

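f.read(len) returns a single bytes object. Iterating over a bytes object in Python 3 yields its bytes one at a time as ints, so raw is the integer value of one byte, not an 8-byte chunk.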
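The behavior behind the TypeError can be seen without a file at all. The short sketch below uses a made-up 8-byte value (the name data is an assumption for illustration) to show what a for loop over a bytes object produces.

```python
data = bytes([1, 0, 2, 0, 3, 0, 0, 0])  # one hypothetical 8-byte record

# Iterating over a bytes object yields each byte as an int in range(256).
items = [item for item in data]
print(items)            # [1, 0, 2, 0, 3, 0, 0, 0]
print(type(items[0]))   # <class 'int'>

# Slicing, by contrast, keeps the bytes type.
print(data[0:2])        # b'\x01\x00'
```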

The correct way of looping is:

import struct

with open(fname, 'rb') as f:
    while True:
        raw = f.read(8)
        if len(raw) != 8:
            break  # end of file; also ignores an incomplete trailing "record", if any
        record = struct.unpack("HHI", raw)
        print(record)

2 Comments

What if there are exactly 8 bytes in the final chunk?
@Rimer It will be processed as normal and break in the next iteration. This question specified that we are reading "a long binary file filled with 8-byte records..."
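The end-of-file behavior discussed in these comments can be checked with a small variant of the loop. This is only a sketch: the function name read_records_strict is hypothetical, and raising on a truncated final record (instead of silently ignoring it, as the answer does) is a design choice that may or may not suit a given file format.

```python
import io
import struct


def read_records_strict(f, fmt="HHI"):
    """Return all records in f; raise if the file ends mid-record."""
    size = struct.calcsize(fmt)
    records = []
    while True:
        raw = f.read(size)
        if not raw:
            break  # clean end of file, including after a full final record
        if len(raw) != size:
            raise ValueError(f"truncated record: got {len(raw)} of {size} bytes")
        records.append(struct.unpack(fmt, raw))
    return records


# Exactly two full records: both are processed, then read() returns b"".
exact = io.BytesIO(struct.pack("HHI", 1, 2, 3) + struct.pack("HHI", 4, 5, 6))
print(read_records_strict(exact))  # [(1, 2, 3), (4, 5, 6)]
```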
0

I've never used this before, but it looks like an initialization issue:

   with open(fname, "rb") as f:
        fmt = 'HHI'
        raw=struct.pack(fmt,1,2,3)
        len=struct.calcsize(fmt)
        print(len)               # This shows 8, as expected 
        for raw in f.read(len):  # Expect this should read 8 bytes into raw
            print(type(raw))     # This says raw is an 'int', not a byte-array
            record=struct.unpack(fmt, raw ) # "TypeError: a bytes-like object is required, not 'int'"
            print(record)

You may want to look at struct.iter_unpack() as an optimization if you have enough RAM to hold the whole file in memory at once.
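As a rough sketch of that suggestion: struct.iter_unpack(format, buffer) unpacks equally sized records from a buffer that is already in memory, and it raises struct.error unless the buffer length is a multiple of the record size. The variable names and the packed sample data below are assumptions standing in for the result of f.read().

```python
import struct

fmt = "HHI"
size = struct.calcsize(fmt)

# Stand-in for data = f.read(): two full records plus 3 stray trailing bytes.
data = struct.pack(fmt, 1, 2, 3) + struct.pack(fmt, 4, 5, 6) + b"\xff\xff\xff"

# Trim any incomplete trailing record so the length is a multiple of size.
usable = len(data) - (len(data) % size)
records = list(struct.iter_unpack(fmt, data[:usable]))
print(records)  # [(1, 2, 3), (4, 5, 6)]
```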

Note that in Python 3.7, the type of the Struct.format attribute changed from bytes to str. See near the end of the page: https://docs.python.org/3/library/struct.html#struct.pack


0

You can also do this using the walrus operator (:=), available in Python 3.8 and later, which I find more concise and readable:

fname = '/tmp/foobar.txt'
size = 2

with open(fname, 'rb') as fp:
    while chunk := fp.read(size):
        print(chunk)

Example run:

$ echo 'foobar' > /tmp/foobar.txt
$ python iter-chunks.py

b'fo'
b'ob'
b'ar'
b'\n'

This implements the solution the OP asked for:

I want the first 8bytes, then iterate to get the next 8, and the following 8, etc, until the full file has been processed
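Adapted to the 8-byte records from the question, the same pattern might look like the sketch below. It requires Python 3.8 or later; the io.BytesIO object stands in for the opened file, and stopping at the first short chunk is an assumption about how a partial trailing record should be handled.

```python
import io
import struct

fmt = "HHI"
size = struct.calcsize(fmt)

# io.BytesIO stands in for open(fname, "rb"); the values are made up.
fp = io.BytesIO(struct.pack(fmt, 10, 20, 30) + struct.pack(fmt, 40, 50, 60))

records = []
# Loop while a full record was read; stops at EOF or at a short final chunk.
while len(chunk := fp.read(size)) == size:
    records.append(struct.unpack(fmt, chunk))

print(records)  # [(10, 20, 30), (40, 50, 60)]
```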

