I have some sets of binary files (some are potentially large, around 100MB) that contain 4-byte integers.
Can anyone supply a code snippet showing how to extract each 4-byte integer until the end of the file is reached? I'm using Python 2.7.
Thanks
You could use struct.unpack():
    import struct

    with open(filename, 'rb') as fileobj:
        # read(4) returns an empty byte string at end of file, which stops the loop
        for chunk in iter(lambda: fileobj.read(4), b''):
            integer_value = struct.unpack('<I', chunk)[0]
This uses <I to interpret the bytes as little-endian unsigned integers. Adjust the format as needed: > for big-endian, i for signed integers. If the file size is not a multiple of 4, the final short chunk will raise struct.error.
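Reading 4 bytes per call can be slow for files around 100MB. A minimal sketch of the same struct approach that reads in larger blocks is below. The function name iter_ints and its parameters are made up for illustration, and it assumes the file holds nothing but 4-byte integers; it should run on both Python 2.7 and Python 3.

```python
import struct


def iter_ints(path, byteorder='<', code='I', ints_per_chunk=16384):
    """Yield 4-byte integers from a binary file, reading in large blocks.

    byteorder is '<' (little-endian) or '>' (big-endian);
    code is 'I' (unsigned) or 'i' (signed). Hypothetical helper.
    """
    item_size = struct.calcsize(byteorder + code)  # 4 for 'I' and 'i'
    with open(path, 'rb') as fileobj:
        while True:
            chunk = fileobj.read(item_size * ints_per_chunk)
            if not chunk:
                break  # end of file
            count, leftover = divmod(len(chunk), item_size)
            if leftover:
                raise ValueError('file size is not a multiple of %d bytes'
                                 % item_size)
            # Unpack the whole block at once, e.g. with format '<16384I'
            fmt = '%s%d%s' % (byteorder, count, code)
            for value in struct.unpack(fmt, chunk):
                yield value
```

Usage would look like `for value in iter_ints(filename): ...`, with `byteorder='>'` or `code='i'` for big-endian or signed data.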
If you need to read a lot of integer values in one go and know how many you need to read, take a look at the array module as well:
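    from array import array

    # 'I' is 4 bytes on common platforms; 'L' is 8 bytes on many 64-bit systems
    arr = array('I')
    assert arr.itemsize == 4
    with open(filename, 'rb') as fileobj:
        arr.fromfile(fileobj, number_of_integers_to_read)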
where you'd need to use array.byteswap() if the endianness of the file and your system didn't match (here the file is assumed to be little-endian):

    import sys

    if sys.byteorder != 'little':
        arr.byteswap()
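If the number of integers isn't known in advance, one option is to call fromfile repeatedly and stop when it raises EOFError; the items that were available are still appended before the exception is raised. A rough sketch, assuming the file contains only unsigned 4-byte integers (read_uint32_array and its parameters are hypothetical names):

```python
import sys
from array import array


def read_uint32_array(path, items_per_read=65536, file_byteorder='little'):
    """Read every unsigned 4-byte integer in a file into an array."""
    arr = array('I')
    if arr.itemsize != 4:
        # 'I' is 4 bytes on common platforms, but this is not guaranteed
        raise RuntimeError("array('I') is not 4 bytes on this platform")
    with open(path, 'rb') as fileobj:
        while True:
            try:
                arr.fromfile(fileobj, items_per_read)
            except EOFError:
                # Fewer items than requested were left; those that were
                # read have already been appended to arr.
                break
    if sys.byteorder != file_byteorder:
        arr.byteswap()  # convert from the file's byte order to the host's
    return arr
```

Reading in blocks keeps each read request bounded, although the resulting array still holds the whole file in memory.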
You could put array.fromfile in a while True: loop inside try: ... except EOFError: pass to avoid having to know number_of_integers_to_read beforehand.
You can also avoid needing number_of_integers_to_read by reading the file into a bytes object with data = fileobj.read(), followed by a call to arr.frombytes(data) (arr.fromstring(data) in Python 2.7).
Another suggestion: call arr.fromfile(fileobj, sys.maxsize) and just catch the EOFError exception.

Check out the NumPy fromfile function. You provide a simple type annotation about the data to be read, and the function efficiently reads it into a NumPy ndarray object.
    import numpy as np

    data = np.fromfile(file_name, dtype='<i4')
You can change dtype to reflect the size and byte order as well, for example '>u4' for big-endian unsigned 4-byte integers. The NumPy documentation on data types has more examples.
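To illustrate how the dtype string can vary, here is a small sketch that builds it from the byte order and signedness, with an optional memory-mapped read for large files. The function name load_int32 and its parameters are invented for this example; np.fromfile and np.memmap are the actual NumPy calls.

```python
import numpy as np


def load_int32(path, byteorder='<', signed=True, use_memmap=False):
    """Load a file of 4-byte integers as a NumPy array (hypothetical helper).

    byteorder is '<' (little-endian) or '>' (big-endian).
    """
    dtype = np.dtype(byteorder + ('i4' if signed else 'u4'))
    if use_memmap:
        # Maps the file read-only instead of loading it all into memory
        return np.memmap(path, dtype=dtype, mode='r')
    return np.fromfile(path, dtype=dtype)
```

With `use_memmap=True` the data is paged in on demand, which may help when the files are large and only part of each one is needed.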