I am working on parsing Matlab structured arrays in Python. For simplicity, the data structure ultimately consists of 3 fields, say header, body, trailer. Creating some data in Matlab for example:
header_data = {100, 100, 100};
body_data = {1234, 100, 4321};
trailer_data = {1001, 1001, 1001};
data = struct('header', header_data, 'body', body_data, 'trailer', trailer_data);
yields a 1x3 struct array.
This data is then read in Python as follows:
import scipy.io as sio
import numpy as np
matlab_data = sio.loadmat('data.mat', squeeze_me=True)
data = matlab_data['data']
This makes data a 1-dimensional numpy.ndarray of size 3 with dtype=dtype([('header', 'O'), ('body', 'O'), ('trailer', 'O')]), which I can happily iterate through using numpy.nditer and extract and parse the data from each struct.
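To make the per-record parsing concrete, here is a minimal sketch. It builds a stand-in for the loaded array by hand rather than reading data.mat, so it runs without the file, and parse_record is a hypothetical placeholder for the real parsing function. It uses plain iteration instead of numpy.nditer to keep the example short.

```python
import numpy as np

# Stand-in for matlab_data['data']: same dtype as described above,
# built by hand so the example does not depend on data.mat.
record_dtype = np.dtype([('header', 'O'), ('body', 'O'), ('trailer', 'O')])
data = np.empty(3, dtype=record_dtype)
data['header'] = [100, 100, 100]
data['body'] = [1234, 100, 4321]
data['trailer'] = [1001, 1001, 1001]


def parse_record(rec):
    """Hypothetical parser: turn one struct element into a plain dict."""
    return {name: rec[name] for name in rec.dtype.names}


# Iterating a 1-D structured array yields one numpy.void per element.
parsed = [parse_record(rec) for rec in data]
print(parsed[0])
```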
The problem I'm trying to overcome is that, unfortunately (and out of my control), in some of the files I need to parse, the elements of the struct array defined above are themselves wrapped in another struct array with a single field msg. Continuing with my example in Matlab:
messages = struct('msg', {data(1), data(2), data(3)});
When this is loaded with scipy.io.loadmat in Python, it results in a 1-dimensional numpy.ndarray of size 3 with dtype=dtype([('msg', 'O')]). In order to reuse the same function for parsing the data fields, I'd need logic to detect the msg field, if it exists, and then extract each numpy.void from it before calling the function that parses the individual header, body and trailer fields.
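The detection logic described above could look roughly like the sketch below. The helper name unwrap_msg is hypothetical, the arrays are hand-built stand-ins for what loadmat returns, and the sketch assumes a non-empty 1-D array in which every msg entry holds a record of the same dtype.

```python
import numpy as np

# Hand-built stand-ins for the two layouts described in the question.
record_dtype = np.dtype([('header', 'O'), ('body', 'O'), ('trailer', 'O')])
data = np.empty(3, dtype=record_dtype)
data['header'] = [100, 100, 100]
data['body'] = [1234, 100, 4321]
data['trailer'] = [1001, 1001, 1001]

messages = np.empty(3, dtype=[('msg', 'O')])
for i in range(len(data)):
    messages['msg'][i] = data[i]  # each msg entry holds one record


def unwrap_msg(arr, field='msg'):
    """Return the inner struct array if `arr` only wraps it in `field`."""
    if arr.dtype.names != (field,):
        return arr  # already the header/body/trailer layout
    wrapped = arr[field]
    inner_dtype = wrapped[0].dtype  # assumes all entries share one dtype
    out = np.empty(wrapped.shape, dtype=inner_dtype)
    for i, rec in enumerate(wrapped):
        out[i] = rec  # copy the record field by field
    return out


records = unwrap_msg(messages)
print(records.dtype.names)
```

With this in place the common parsing function can be called on unwrap_msg(arr) for either layout, but it still loops over every element, which is the manual extraction I was hoping to avoid.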
In Matlab, this is easily overcome because the original 1x3 struct array with three fields can be extracted from the 1x3 struct array with the single msg field by doing: [messages.msg], which yields a 1x3 struct array with the header, body and trailer fields. If I try to translate this to numpy by accessing the msg field by name, I get a view of the original numpy.ndarray that is not a structure (dtype=dtype('O')).
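The behaviour can be reproduced without the .mat file. The sketch below uses hand-built stand-ins for the loaded arrays (an assumption, not the actual loadmat output) and shows that selecting the field by name gives an object array whose elements are the individual records.

```python
import numpy as np

# Hand-built stand-ins for the arrays described in the question.
record_dtype = np.dtype([('header', 'O'), ('body', 'O'), ('trailer', 'O')])
data = np.empty(3, dtype=record_dtype)
data['header'] = [100, 100, 100]
data['body'] = [1234, 100, 4321]
data['trailer'] = [1001, 1001, 1001]

messages = np.empty(3, dtype=[('msg', 'O')])
for i in range(len(data)):
    messages['msg'][i] = data[i]

# Field access by name: the result has a plain object dtype, so the
# header/body/trailer field names are not visible at the array level.
msg_view = messages['msg']
print(msg_view.dtype.names)     # None
print(msg_view[0].dtype.names)  # the names only exist on each element
```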
I'm trying to figure out whether there is an analogous way in numpy to recover the struct array with three fields from the one with the single msg field, as I can in Matlab, or whether I truly need to iterate over each value and manually extract it from the msg field before using a common parsing function. Again, the format of the Matlab input files is out of my control and I cannot change it, and my example here is trivial compared to the number of nested fields I need to extract from the real Matlab data.