Reading binary file. Translate matlab to python

Question

I want to translate working MATLAB code that reads a binary file into Python. Is there an equivalent for the following?

% open the file for reading
fid=fopen (filename,'rb','ieee-le');
% first read the signature
tmp=fread(fid,2,'char');
% read sizes
rows=fread(fid,1,'ushort');
cols=fread(fid,1,'ushort');

The closest equivalent is numpy.fromfile: numpy.org/doc/stable/reference/generated/numpy.fromfile.html You just have to pick the corresponding dtypes: numpy.org/doc/stable/reference/arrays.dtypes.html
– max9111, Commented Oct 18, 2022 at 8:21
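
As a rough sketch of the dtype approach suggested in this comment, the header from the MATLAB snippet could be described as one structured dtype. The field names and the helper `read_header` are hypothetical, and treating the two 'char' values as unsigned bytes is an assumption. The sample uses an in-memory buffer with `np.frombuffer`; with a real file object, `np.fromfile(f, dtype=header_dtype, count=1)` should behave the same way.

```python
import io
import numpy as np

# Assumed header layout, taken from the MATLAB snippet:
# two one-byte signature values, then rows and cols as little-endian uint16.
header_dtype = np.dtype([
    ("signature", "u1", (2,)),
    ("rows", "<u2"),
    ("cols", "<u2"),
])

def read_header(f):
    """Read the 6-byte header from an open binary file object (hypothetical helper)."""
    raw = f.read(header_dtype.itemsize)
    header = np.frombuffer(raw, dtype=header_dtype, count=1)[0]
    return header["signature"].tolist(), int(header["rows"]), int(header["cols"])

# In-memory stand-in for a real file: signature (1, 0), rows=3, cols=4.
sample = io.BytesIO(bytes([1, 0, 3, 0, 4, 0]))
signature, rows, cols = read_header(sample)
print(signature, rows, cols)
```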

Ahmed AEK · Accepted Answer · 2022-10-17 13:32:25Z


The struct module can do this, specifically its unpack function, which accepts a bytes-like buffer. You have to read the required number of bytes from the file yourself; struct.calcsize tells you how many.

import struct
endian = "<"  # little endian
with open(filename,'rb') as f:
    tmp = struct.unpack(f"{endian}cc",f.read(struct.calcsize("cc")))
    tmp_int = [int.from_bytes(x,byteorder="little") for x in tmp]
    rows = struct.unpack(f"{endian}H",f.read(struct.calcsize("H")))[0]
    cols = struct.unpack(f"{endian}H",f.read(struct.calcsize("H")))[0]
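
If the header layout is fixed, the separate reads above could also be folded into a single unpack call. This is only a sketch: the name `HEADER`, the helper `read_header`, and the choice of unsigned bytes (`B`) for the two signature values are assumptions, and the sample bytes are made up for illustration.

```python
import io
import struct

# "<" selects little-endian with standard sizes and no padding;
# 2B = two unsigned bytes (signature), 2H = two unsigned shorts (rows, cols).
HEADER = struct.Struct("<2B2H")

def read_header(f):
    """Read and decode the 6-byte header in one call (hypothetical helper)."""
    raw = f.read(HEADER.size)
    if len(raw) != HEADER.size:
        raise ValueError("file too short for header")
    sig0, sig1, rows, cols = HEADER.unpack(raw)
    return [sig0, sig1], rows, cols

# In-memory stand-in for a real file: signature (1, 0), rows=3, cols=4.
sample = io.BytesIO(bytes([1, 0, 3, 0, 4, 0]))
print(read_header(sample))  # ([1, 0], 3, 4)
```

Unpacking with `B` returns integers directly, so the `int.from_bytes` step is not needed in this variant.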

You might want to use the struct.Struct class to read the rest of the data in chunks (still inside the with block, while the file is open), since that is faster than decoding numbers one at a time, e.g.:

data = []
reader = struct.Struct(endian + "i"*cols)  # i for integer
row_size = reader.size
for row_number in range(rows):
    row = reader.unpack(f.read(row_size))
    data.append(row)
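
A self-contained version of the same row-by-row idea, runnable against an in-memory buffer, might look like the following. The function name `read_rows` and its parameters are hypothetical, and the payload is assumed to be 32-bit signed integers stored row by row, as in the snippet above.

```python
import io
import struct

def read_rows(f, rows, cols, fmt="i", endian="<"):
    """Read `rows` records of `cols` values each from an open binary file."""
    reader = struct.Struct(f"{endian}{cols}{fmt}")
    data = []
    for _ in range(rows):
        chunk = f.read(reader.size)
        if len(chunk) != reader.size:
            raise ValueError("unexpected end of file")
        data.append(reader.unpack(chunk))
    return data

# Made-up payload: six little-endian int32 values, read back as 2 rows of 3.
buf = io.BytesIO(struct.pack("<6i", 1, 2, 3, 4, 5, 6))
print(read_rows(buf, rows=2, cols=3))  # [(1, 2, 3), (4, 5, 6)]
```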

Edit: corrected the answer and added an example for larger chunks.

Edit2: one more improvement. If we are reading, say, a 1 GB file of shorts, storing every value as a Python int makes no sense and will most likely cause an out-of-memory error (or freeze the system). The proper way to do it is with numpy:

import numpy as np
data = np.fromfile(f,dtype=endian+'H').reshape(cols,rows)  # ushorts

This way the array takes the same space in memory as it did on disk.
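
Which shape to pass to reshape depends on how the matrix was written. MATLAB's fwrite stores a matrix column by column, while a file written row by row needs the default C order. The sketch below uses a made-up six-value payload and hypothetical variable names to show both interpretations of the same bytes; which one applies to a given file is an assumption you have to check against the writer.

```python
import numpy as np

rows, cols = 2, 3

# Made-up payload: six little-endian uint16 values as they would sit on disk.
raw = np.arange(1, 7, dtype="<u2").tobytes()
flat = np.frombuffer(raw, dtype="<u2")

# File written row by row (C order).
row_major = flat.reshape(rows, cols)
# File written column by column (Fortran order, as MATLAB's fwrite does for a matrix).
col_major = flat.reshape((rows, cols), order="F")

print(row_major.tolist())  # [[1, 2, 3], [4, 5, 6]]
print(col_major.tolist())  # [[1, 3, 5], [2, 4, 6]]
print(flat.nbytes)         # 12, i.e. 2 bytes per value, same as on disk
```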

edited Oct 17, 2022 at 13:32

answered Oct 17, 2022 at 8:06


8 Comments

SerTet Over a year ago

It shows me an error: tmp = unpack_from(f"{endian}c",f)[0] TypeError: a bytes-like object is required, not '_io.BufferedReader'

Ahmed AEK Over a year ago

@SerTet, I corrected the answer, sorry for that.

SerTet Over a year ago

E.g. in MATLAB tmp = [1;0], but your Python code returns tmp = b'\x01'

Ahmed AEK Over a year ago

@SerTet I didn't see the 2 in the MATLAB code; I just added it. Do you understand the code above, or are you just copy-pasting? It is only an example, and you should read the documentation if you intend to get much further from there.

Ahmed AEK Over a year ago

'ieee-le' refers to the byte order of the data in the file. Since a char is a single byte, endianness is not really important for it, but for a short it determines the order of the bytes: in little endian the first byte is the least significant, while in big endian the first byte is the most significant. Hardware manufacturers couldn't agree on which should go first, so just read the data the same way it was written; 'le' stands for little endian.
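
A small illustration of the point about byte order, using two made-up bytes: the same pair decodes to different unsigned shorts depending on the endianness prefix, while a single byte is unaffected.

```python
import struct

raw = bytes([0x01, 0x02])

# Little endian: first byte is the least significant -> 0x0201.
little = struct.unpack("<H", raw)[0]
# Big endian: first byte is the most significant -> 0x0102.
big = struct.unpack(">H", raw)[0]

print(little, big)  # 513 258

# A single byte decodes the same way under either byte order.
print(struct.unpack("<B", raw[:1])[0], struct.unpack(">B", raw[:1])[0])  # 1 1
```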
