
I am trying to unpack a file containing over 1 billion bytes that encode integers, 4 bytes each, so every 4 bytes is a different integer. A file this big obviously needs to be read in chunks. I currently have the following:

import os
import struct

z = os.path.getsize(x)
with open(x, "rb") as f:
    while True:
        this_chunk = min(50000000, z)
        data = f.read(this_chunk)
        ints1 = struct.unpack("I" * (this_chunk // 4), data)
        if not data:
            break
    print(ints1)

I get an error which reads:

struct.error: unpack requires a bytes object of length 50000000

Could you please help me understand this error and how to fix it? Thank you!
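For context, the error is easy to reproduce in isolation: `struct.unpack` requires the buffer length to match the format string exactly, so a buffer that is even a few bytes short raises `struct.error`. A minimal reproduction (the byte counts here are illustrative, not from the original file):

```python
import struct

# "I" * 3 expects exactly 3 * 4 = 12 bytes.
data = bytes(10)  # only 10 bytes available, like a short final read
try:
    struct.unpack("I" * 3, data)
except struct.error as e:
    print(e)  # message says a 12-byte buffer is required
```

This is exactly what happens on the last `f.read()`: fewer than 50000000 bytes come back, but the format string still demands the full amount.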


1 Answer


You need to keep track of how much you have read. I'd also recommend using expressive variable names instead of x and z. The main problem is your last read, where you want to read only the sizeremaining bytes, not a full chunk. Try this (untested):

import os
import struct

filesize = os.path.getsize(filename)
chunksize = 50000000
sizeremaining = filesize

with open(filename, "rb") as f:
    while sizeremaining > 0:
        # Never request more than is left, so the final read
        # matches the unpack format exactly.
        this_chunk = min(chunksize, sizeremaining)
        data = f.read(this_chunk)
        ints1 = struct.unpack("I" * (this_chunk // 4), data)
        sizeremaining -= this_chunk
    print(ints1)
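An alternative sketch that avoids the bookkeeping entirely: size the format string from the bytes actually read, so a short final chunk can never mismatch. The function name and chunk size below are illustrative, and this assumes chunksize is a multiple of 4 so no integer straddles two chunks:

```python
import struct

def read_ints(filename, chunksize=50_000_000):
    """Yield unsigned 32-bit ints from a binary file, chunk by chunk."""
    with open(filename, "rb") as f:
        while True:
            data = f.read(chunksize)
            if not data:
                break
            # Build the format from what was actually read, so a short
            # final chunk cannot raise struct.error.
            yield from struct.unpack("%dI" % (len(data) // 4), data)
```

Either approach works; this one trades the explicit sizeremaining counter for `len(data)`, which the read already gives you.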

4 Comments

Did that fix the issue?
Your min function is not correct. You should be comparing 50000000 with the number of bytes remaining. Let me modify my answer.
Ya, it should. What's it doing?
I just made an indentation error. Thank you for your help!
