Numpy can be used to read/write binary data. You just need to define a custom np.dtype instance that defines the memory layout of your c-struct.
For example, here is some C++ code defining a struct (should work just as well for C structs, though I'm not a C expert):
struct MyStruct {
uint16_t FieldA;
uint16_t pad16[3];
uint32_t FieldB;
uint32_t pad32[2];
char FieldC[4];
uint64_t FieldD;
uint64_t FieldE;
};
void write_struct(const std::string& fname, MyStruct h) {
// This function serializes a MyStruct instance and
// writes the binary data to disk.
std::ofstream ofp(fname, std::ios::out | std::ios::binary);
ofp.write(reinterpret_cast<const char*>(&h), sizeof(h));
}
Based on the advice I found at stackoverflow.com/a/5397638, I've included some padding in the struct (the pad16 and pad32 fields) so that serialization will happen in a more predictable way. I think that this is a C++ thing; it might not be necessary when using plain ol' C structs.
Now, in python, we create a numpy.dtype object describing the memory-layout of MyStruct:
import numpy as np
my_struct_dtype = np.dtype([
("FieldA" , np.uint16 , ),
("pad16" , np.uint16 , (3,) ),
("FieldB" , np.uint32 ),
("pad32" , np.uint32 , (2,) ),
("FieldC" , np.byte , (4,) ),
("FieldD" , np.uint64 ),
("FieldE" , np.uint64 ),
])
Then use numpy's fromfile to read the binary file where you've saved your c-struct:
# read data
struct_data = np.fromfile(fpath, dtype=my_struct_dtype, count=1)[0]
FieldA = struct_data["FieldA"]
FieldB = struct_data["FieldB"]
FieldC = struct_data["FieldC"]
FieldD = struct_data["FieldD"]
FieldE = struct_data["FieldE"]
if FieldA != expected_value_A:
raise ValueError("Bad FieldA, got %d" % FieldA)
if FieldB != expected_value_B:
raise ValueError("Bad FieldB, got %d" % FieldB)
if FieldC.tobytes() != b"expc":
raise ValueError("Bad FieldC, got %s" % FieldC.tobytes().decode())
...
The count=1 argument in the above call np.fromfile(..., count=1) is so that the returned array will have only one element; this means "read the first struct instance from the file". Note that I am indexing [0] to get that element out of the array.
If you have appended the data from many c-structs to the same file, you can use fromfile(..., count=n) to read n struct instances into a numpy array of shape (n,). Setting count=-1, which is the default for the np.fromfile and np.frombuffer functions, means "read all of the data", resulting in a 1-dimensional array of shape (number_of_struct_instances,).
You can also use the offset keyword argument to np.fromfile to control where in the file the data read will begin.
To conclude, here are some numpy functions that will be useful once your custom dtype has been defined:
- Reading binary data as a numpy array:
np.frombuffer(bytes_data, dtype=...):
Interpret the given binary data (e.g. a python bytes instance)
as a numpy array of the given dtype. You can define a custom
dtype that describes the memory layout of your c struct.
np.fromfile(filename, dtype=...):
Read binary data from filename. Should be the same result as
np.frombuffer(open(filename, "rb").read(), dtype=...).
- Writing a numpy array as binary data:
ndarray.tobytes():
Construct a python bytes instance containing
raw data from the given numpy array. If the array's data has dtype
corresponding to a c-struct, then the bytes coming from
ndarray.tobytes can be deserialized
by c/c++ and interpreted as an (array of) instances of that c-struct.
ndarray.tofile(filename):
Binary data from the array is written to filename.
This data could then be deserialized by c/c++.
Equivalent to open("filename", "wb").write(a.tobytes()).