
I'm translating working MATLAB code that reads a binary file into Python. Is there a Python equivalent for the following?

% open the file for reading
fid=fopen (filename,'rb','ieee-le');
% first read the signature
tmp=fread(fid,2,'char');
% read sizes
rows=fread(fid,1,'ushort');
cols=fread(fid,1,'ushort');

1 Answer

The struct module does this, specifically its unpack function. unpack takes a bytes buffer rather than a file object, so read exactly as many bytes as the format needs (struct.calcsize gives that number) and pass them in:

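import struct
endian = "<"  # little-endian, matches 'ieee-le' in the MATLAB fopen call
with open(filename, "rb") as f:
    # signature: two single bytes (MATLAB: fread(fid,2,'char'))
    tmp = struct.unpack(f"{endian}cc", f.read(struct.calcsize(f"{endian}cc")))
    tmp_int = [int.from_bytes(x, byteorder="little") for x in tmp]  # bytes -> ints
    # sizes: one unsigned 16-bit integer each (MATLAB: 'ushort')
    rows = struct.unpack(f"{endian}H", f.read(struct.calcsize(f"{endian}H")))[0]
    cols = struct.unpack(f"{endian}H", f.read(struct.calcsize(f"{endian}H")))[0]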
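
If it helps, the same header can also be read with a single call. This is only a minimal sketch: it assumes the layout from the MATLAB snippet (2 signature bytes, then two unsigned shorts), read_header and HEADER are hypothetical names, and io.BytesIO stands in for the real file.

```python
import io
import struct

# Assumed layout: 2 signature bytes followed by two little-endian unsigned shorts.
HEADER = struct.Struct("<2BHH")

def read_header(stream):
    """Return (signature, rows, cols) from a binary stream positioned at the header."""
    raw = stream.read(HEADER.size)
    if len(raw) != HEADER.size:
        raise ValueError("file is too short to contain the header")
    sig0, sig1, rows, cols = HEADER.unpack(raw)
    return [sig0, sig1], rows, cols

# In-memory stand-in for a file whose header holds signature [1, 0], rows=3, cols=4.
sample = io.BytesIO(struct.pack("<2BHH", 1, 0, 3, 4))
signature, rows, cols = read_header(sample)
```

Using "B" instead of "c" returns the signature bytes as ints directly, so no int.from_bytes step is needed.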

You might want to use the struct.Struct class to read the rest of the data in chunks, since that should be faster than decoding numbers one at a time, e.g.:

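# continues inside the with block above, while f is still open
data = []
reader = struct.Struct(endian + "i" * cols)  # "i" = 4-byte signed integer
row_size = reader.size
for row_number in range(rows):
    row = reader.unpack(f.read(row_size))  # decode one whole row per call
    data.append(row)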
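
A self-contained version of the row-by-row loop might look like the sketch below. read_rows is a hypothetical helper, the payload is assumed to be 4-byte signed integers stored one row after another, and io.BytesIO stands in for the open file. It also checks for a truncated file, because struct raises a less helpful error on a short read.

```python
import io
import struct

def read_rows(stream, rows, cols, fmt="i", endian="<"):
    """Read `rows` records of `cols` values each from a binary stream."""
    reader = struct.Struct(f"{endian}{cols}{fmt}")  # e.g. "<3i" for 3 ints per row
    data = []
    for row_number in range(rows):
        chunk = stream.read(reader.size)
        if len(chunk) != reader.size:
            raise ValueError(f"row {row_number} is truncated")
        data.append(reader.unpack(chunk))
    return data

# Six little-endian ints, read back as 2 rows of 3 values.
payload = struct.pack("<6i", 1, 2, 3, 4, 5, 6)
table = read_rows(io.BytesIO(payload), rows=2, cols=3)
```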

Edit: corrected the answer and added an example for larger chunks.

Edit 2: one more improvement. If the file is large, say 1 GB of shorts, storing every value as a Python int makes no sense and will most likely cause an out-of-memory error (or freeze the system). The proper way to read it is with numpy:

import numpy as np
# f must still be open; this reads from the current file position to the end of the file
data = np.fromfile(f, dtype=endian + 'H').reshape(cols, rows)  # ushorts

This way the array takes up the same amount of memory as the data did on disk (2 bytes per value).
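
Putting the header and the numpy read together, an end-to-end sketch could look like this. Several things here are assumptions: read_matrix is a hypothetical helper, the payload is assumed to be unsigned shorts that directly follow the 6-byte header, and the column_major flag assumes the file was written by MATLAB, which stores matrices column by column. If your file is stored row by row, pass column_major=False.

```python
import os
import struct
import tempfile

import numpy as np

def read_matrix(path, column_major=True):
    """Read the header, then the ushort payload, into a (rows, cols) array."""
    with open(path, "rb") as f:
        _sig0, _sig1, rows, cols = struct.unpack("<2BHH", f.read(6))
        # count limits the read to the payload size given by the header
        flat = np.fromfile(f, dtype="<u2", count=rows * cols)
    if flat.size != rows * cols:
        raise ValueError("file ends before the payload is complete")
    return flat.reshape((rows, cols), order="F" if column_major else "C")

# Build a small test file: signature [1, 0], rows=2, cols=3, values 0..5.
with tempfile.NamedTemporaryFile(delete=False, suffix=".bin") as tmp_file:
    tmp_file.write(struct.pack("<2BHH", 1, 0, 2, 3))
    tmp_file.write(np.arange(6, dtype="<u2").tobytes())
    path = tmp_file.name

matrix = read_matrix(path)
os.remove(path)
```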


8 Comments

I get an error: tmp = unpack_from(f"{endian}c",f)[0] TypeError: a bytes-like object is required, not '_io.BufferedReader'
@SerTet, I corrected the answer, sorry about that.
E.g. in MATLAB tmp = [1;0], but your Python code returns tmp = b'\x01'
@SerTet I didn't see the 2 in the MATLAB code; I've just added it. Do you understand the code above, or are you just copy-pasting? It is only an example, and you should read the documentation if you intend to get much further from there.
They refer to endianness (byte order). Since you are reading a char, which is a single byte, endianness is not really important, but for a short it determines the order of the two bytes: in little-endian the first byte is the least significant, while in big-endian the first byte is the most significant. Hardware manufacturers couldn't agree on which should go first, so just read the file the same way it was written; "le" stands for little-endian.
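
To see what the byte order changes in practice, here is a small sketch that decodes the same two bytes both ways. The byte values are made up for illustration.

```python
import struct

raw = bytes([0x01, 0x02])  # two bytes as they might appear in the file

# Little-endian: the first byte is the least significant, so the value is 0x0201.
little = struct.unpack("<H", raw)[0]

# Big-endian: the first byte is the most significant, so the value is 0x0102.
big = struct.unpack(">H", raw)[0]

# int.from_bytes follows the same rule as the "<" prefix.
same = int.from_bytes(raw, byteorder="little")
```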