How to loop over a binary file in Python in chunks

Question

I'm trying to use Python to loop over a long binary file filled with 8-byte records.

Each record has the format [ uint16 | uint16 | uint32 ]
(which is "HHI" in struct-formatting)

Apparently each 8-byte block is being treated as an int instead of an array of 8 bytes, which causes the struct.unpack call to fail:

with open(fname, "rb") as f:
    sz=struct.calcsize("HHI")
    print(sz)                # This shows 8, as expected 
    for raw in f.read(sz):   # Expect this should read 8 bytes into raw
        print(type(raw))     # This says raw is an 'int', not a byte-array
        record=struct.unpack("HHI", raw ) # "TypeError: a bytes-like object is required, not 'int'"
        print(record)

How can I read my file as a series of structures, and print them each out?

Don't you just want raw = f.read(len)? This gives you all eight bytes at once, which seems to be what you want.
– John Gordon, Commented Mar 4, 2019 at 18:36
Mostly, yes: I want the first 8 bytes, then iterate to get the next 8, and the following 8, etc., until the full file has been processed.
– abelenky, Commented Mar 4, 2019 at 18:39

snakecharmerb · Accepted Answer · 2019-03-04 18:48:22Z

4

The iter builtin, if passed a callable and a sentinel value, will call the callable repeatedly until it returns the sentinel value.

So you can create a partial function with functools.partial (or use a lambda) and pass it to iter, like this:

import functools

with open('foo.bin', 'rb') as f:
    chunker = functools.partial(f.read, 8)
    for chunk in iter(chunker, b''):      # Read 8-byte chunks until read() returns b'' at EOF
        print(chunk)                      # Do stuff with chunk
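
To tie this back to the question's 8-byte "HHI" records, here is a minimal sketch that unpacks each chunk as it arrives. The helper name read_records and the io.BytesIO stand-in for the real file are assumptions for illustration, and the length check is an added guard against a truncated final chunk rather than part of the answer above.

```python
import functools
import io
import struct

RECORD = struct.Struct("HHI")  # uint16, uint16, uint32 in native byte order

def read_records(f):
    """Yield one unpacked tuple per complete record read from binary file f."""
    chunker = functools.partial(f.read, RECORD.size)
    for chunk in iter(chunker, b""):
        if len(chunk) < RECORD.size:
            break  # trailing partial record; ignore it
        yield RECORD.unpack(chunk)

# Stand-in for open(fname, "rb"): two packed records held in memory
data = RECORD.pack(1, 2, 3) + RECORD.pack(4, 5, 6)
records = list(read_records(io.BytesIO(data)))
print(records)
```

Wrapping the loop in a generator keeps the chunking separate from whatever is done with each record.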

answered Mar 4, 2019 at 18:48

snakecharmerb


gdlmx · Accepted Answer · 2019-03-04 18:48:07Z

3

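f.read(len) returns a single bytes object. Iterating over it with a for loop yields one byte at a time, and in Python 3 each of those is an int, which is why raw is an int.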
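
A small sketch, using a made-up record instead of a file, of what iterating over the result of read() yields in Python 3:

```python
import struct

raw = struct.pack("HHI", 1, 2, 3)   # an 8-byte bytes object, like f.read(8) would return

# Iterating over a bytes object yields its individual bytes as ints
items = [type(b).__name__ for b in raw]
print(len(raw), items)

# struct.unpack needs the whole bytes object, not one element of it
record = struct.unpack("HHI", raw)
print(record)
```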

The correct way of looping is:

import struct

with open(fname, 'rb') as f:
    while True:
        raw = f.read(8)
        if len(raw) != 8:
            break  # end of file, or an incomplete trailing "record" that we ignore
        record = struct.unpack("HHI", raw)
        print(record)

edited Mar 4, 2019 at 18:48

answered Mar 4, 2019 at 18:34

gdlmx


2 Comments

Rimer Over a year ago

What if there are exactly 8 bytes in the final chunk?

gdlmx Over a year ago

@Rimer It will be processed as normal and break in the next iteration. This question specified that we are reading "a long binary file filled with 8-byte records..."
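
The end-of-file behavior discussed in these comments can be checked with in-memory data. This is a sketch only: read_all is a hypothetical wrapper around the loop from the answer, and io.BytesIO stands in for the real file.

```python
import io
import struct

def read_all(f, fmt="HHI"):
    """Collect every complete record from binary file f."""
    size = struct.calcsize(fmt)
    records = []
    while True:
        raw = f.read(size)
        if len(raw) != size:
            break  # empty read at EOF, or an incomplete trailing record
        records.append(struct.unpack(fmt, raw))
    return records

two = struct.pack("HHI", 1, 2, 3) + struct.pack("HHI", 4, 5, 6)

exact = read_all(io.BytesIO(two))                     # 16 bytes: ends on a record boundary
truncated = read_all(io.BytesIO(two + b"\x00" * 4))   # 20 bytes: 4 stray bytes at the end
print(exact)
print(truncated)
```

In both cases the two complete records are returned. With the exact-length input the loop stops on the empty read that follows the last record, and with the longer input it stops on the 4-byte leftover.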

Harvey · Accepted Answer · 2019-03-04 18:45:06Z

I've never used this before, but pre-initializing raw does not help, because the for loop rebinds raw to each int in the bytes object that f.read() returns. Reading one record per pass of a while loop works:

import struct

fmt = 'HHI'
size = struct.calcsize(fmt)      # 8; named size rather than len to avoid shadowing the builtin
with open(fname, "rb") as f:
    while True:
        raw = f.read(size)       # a bytes object holding up to 8 bytes
        if len(raw) < size:      # end of file, or an incomplete trailing record
            break
        record = struct.unpack(fmt, raw)
        print(record)

You may want to look at struct.iter_unpack() as an optimization if the whole file fits in RAM. It takes a single buffer whose length must be a multiple of the record size.
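
A minimal sketch of the iter_unpack() route, assuming the file contents have already been read into memory. The data variable here is a made-up stand-in for f.read(), with one stray byte appended to show the trimming step.

```python
import struct

fmt = "HHI"
size = struct.calcsize(fmt)

# Stand-in for f.read(): two records plus one stray trailing byte
data = struct.pack(fmt, 1, 2, 3) + struct.pack(fmt, 4, 5, 6) + b"\x00"

# iter_unpack raises struct.error unless the buffer length is a multiple of size,
# so drop any incomplete trailing record first
usable = len(data) - len(data) % size
records = list(struct.iter_unpack(fmt, data[:usable]))
print(records)
```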

Note that in 3.7, the type of a Struct object's format attribute changed from bytes to str. See near the end of the page: https://docs.python.org/3/library/struct.html#struct.pack

deeenes · Accepted Answer · 2024-10-20 19:08:04Z

0

You can also do this using the walrus operator (:=), available since Python 3.8, which I find more concise and readable:

fname = '/tmp/foobar.txt'
size = 2

with open(fname, 'rb') as fp:
    while chunk := fp.read(size):
        print(chunk)

echo 'foobar' > /tmp/foobar.txt

python iter-chunks.py

b'fo'
b'ob'
b'ar'
b'\n'

This implements the solution the OP asked for:

I want the first 8 bytes, then iterate to get the next 8, and the following 8, etc., until the full file has been processed
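
Applied to the question's 8-byte "HHI" records, the same pattern might look like the sketch below. It needs Python 3.8 or later. The io.BytesIO object is an assumed stand-in for the opened file, and comparing the chunk length to the record size is an added guard so that a short final chunk is not passed to unpack.

```python
import io
import struct

record = struct.Struct("HHI")

# Stand-in for open(fname, 'rb'): two packed records held in memory
fp = io.BytesIO(record.pack(1, 2, 3) + record.pack(4, 5, 6))

records = []
# Keep reading while a full record comes back; stop at EOF or on a partial record
while len(chunk := fp.read(record.size)) == record.size:
    records.append(record.unpack(chunk))
print(records)
```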

answered Oct 20, 2024 at 19:08

deeenes
