
I am trying to convert a data file (here, a string representing a file with three lines) into a structure array like this:

cel = textscan('1 1.1 2 2.2 3 3.3', '%u %f');
str = cell2struct(cel, {'f1', 'f2'}, 2);

However, this gives me a 1x1 struct array, where I can only access whole columns through the struct's fields, but not whole rows (like 'str(2)' for the second row).

What I need is an array of structs (or whatever it is called) like this:

str = struct('f1', {1, 2, 3}, 'f2', {1.1, 2.2, 3.3});

because now I can (for instance) filter it like this:

subStr = str(find([str.f1] > 1))

which I could not do in the first case. Any idea how to get there? In the end I was able to do it with:

cel = textscan('1 1.1 2 2.2 3 3.3', '%u %f');
[f1, f2] = cel{:};
str = struct('f1', num2cell(f1'), 'f2', num2cell(f2'));

But it does not feel right, and I am afraid it will be expensive (the files are quite large).

EDIT:

My solution is indeed too memory demanding and therefore not usable. Typical files have a header, a footer, and c. 5e6 lines of data in six columns.

Thanks

2
  • Why not use a table for this? It seems like structs would needlessly complicate your life... Commented Jan 23, 2019 at 14:11
  • @Dev-iL I don't mind using a table. I know structs from numpy, where I find them quite useful. However, as I pointed out in a comment on your answer, they seem to be memory demanding (and probably overkill for my use case). Commented Jan 23, 2019 at 17:01

3 Answers

1

It's easier if you're actually working with a file that contains lines. For example, if data.txt contains:

1 1.1
2 2.2
3 3.3

then you can simply load it using:

tbl = readtable('data.txt');
tbl.Properties.VariableNames = {'f1', 'f2'};

Which results in much nicer (imho) filtering syntax:

subTbl = tbl(tbl.f1 > 1, :);

I suggest you read a bit about tables in MATLAB, to learn about their (many) capabilities.


Finally, if you insist on working with struct arrays, you can do:

str = table2struct(tbl);

which returns:

3×1 struct array with fields:
    f1
    f2
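If the file has to be parsed with textscan anyway (for example because of a header and footer), a table can also be built directly from the textscan output, without going through readtable. The following is a minimal sketch under that assumption; it reuses the string from the question as a stand-in for the real file, and the names tbl and subTbl are only illustrative:

```matlab
% Parse the data with textscan, as in the question
cel = textscan('1 1.1 2 2.2 3 3.3', '%u %f');

% Each cell of cel holds one column vector; expand them into table variables
tbl = table(cel{:}, 'VariableNames', {'f1', 'f2'});

% Row-wise filtering with a logical mask
subTbl = tbl(tbl.f1 > 1, :);

% A single row is available as a one-row table
row2 = tbl(2, :);
```

Each table variable is stored as one contiguous column, so this avoids creating one struct element per row.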

5 Comments

Thank you for pointing out tables. However, those files have c. 5e6 lines, plus a header and footer. Using textscan I can efficiently process the header, footer, and data continuously (it reads from the current position in the file). With readtable this will probably not be that easy (I am also not sure how to handle the footer). But mainly, the script soon gets terminated due to lack of memory. So I would rather stay with textscan...
Well, you didn't specify the size of the input file. In that case, see the article on creating tall arrays, and specifically tall tables.
It's not that the array would not fit into memory. As a matter of fact, it occupies only c. 100 MB. What consumes the memory is the processing/rearranging (or whatever). I was actually considering datastores (or something similar from that area) before, but simple textscan turned out to be more straightforward considering the structure of the file. Together with the subsequent cell2struct it is reasonably fast and memory efficient. And I chose structs because they work just fine for me (if only they were arranged the way I like). But I will give it a second look...
@rad follow-up question: are you working with many such files, or is it just the one and you're loading it from text each time? What I mean to say is that maybe it would make sense to preprocess the file once (even by loading it into Excel and saving as XLSX, since that would already be simpler for MATLAB to handle, i.e. numbers saved as numbers and not as text), or of course as a .mat file. It's important to understand which step of the processing you want to focus on making efficient. It might be worth it to perform a costly conversion (once) to gain powerful tools to analyze your data.
You are probably right about the conversion. The current file structure does not allow for easy parsing into more 'sophisticated' MATLAB data structures. So first I will probably focus on converting the data file into a more easily manageable format and then choose a proper way to handle it for the actual processing. And you were right, it probably won't be a struct array...
0

Each element of cel is an array. Using cellfun and num2cell they can be converted to cell arrays:

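names = {'f1', 'f2'};
cel = textscan('1 1.1 2 2.2 3 3.3', '%u %f');
% Convert each numeric column into a cell array with one value per cell
cel2 = cellfun(@num2cell, cel, 'UniformOutput', false);

% Interleave names and values: 'f1', values, 'f2', values
prep = [names;cel2];
% struct() creates one element per cell; transpose to get a 1x3 struct array
str = struct(prep{:}).';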

2 Comments

This does indeed work for small files/cells, but for large ones it consumes too much memory (it gets terminated in my case). The cell2struct in my case (which, however, leads to a non-optimally arranged struct array) takes (almost) no time and no memory (since it probably only remaps the data in memory). So I was hoping there would be a way to do it in a similar manner...
You may need to design a class that overloads the subsref and subsasgn operators. Its underlying storage can be the cell array cel or a struct.
0

I wish I had read these more carefully sooner, but according to this and this, storing large datasets the way I was trying to is discouraged, because

Structures with many fields and small contents have a large overhead and should be avoided. A large array of structures with numeric scalar fields requires much more memory than a structure with fields containing large numeric arrays.

and

For structures and cell arrays, MATLAB creates a header not only for each array, but also for each field of the structure and for each cell of a cell array. Because of this, the amount of memory required to store a structure or cell array depends not only on how much data it holds, but also on how it is constructed.

Therefore the struct array str(1:N).f requires (for large N) much more memory than a single struct with an array field, str.f(1:N).
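The difference can be checked with whos, and the compact layout can still be filtered row-wise by applying one logical mask to every field. The following is a minimal sketch, assuming all fields are column vectors of equal length; the names soa, aos, mask, and sub are illustrative and do not come from the answers above:

```matlab
N = 1e5;
f1 = uint32((1:N).');
f2 = rand(N, 1);

% Struct of arrays: a 1x1 struct whose fields hold whole columns
soa = struct('f1', f1, 'f2', f2);

% Array of structs: an N-by-1 struct array with scalar fields
aos = struct('f1', num2cell(f1), 'f2', num2cell(f2));

% Compare the memory reported for both layouts
infoSoa = whos('soa');
infoAos = whos('aos');
fprintf('struct of arrays: %d bytes\n', infoSoa.bytes);
fprintf('array of structs: %d bytes\n', infoAos.bytes);

% Row-wise filter on the compact layout: apply the same mask to each field
mask = soa.f1 > 1;
sub = structfun(@(v) v(mask), soa, 'UniformOutput', false);
```

Here sub is again a 1x1 struct with the same field names, holding only the rows selected by mask, so the filtering from the question remains possible without building one struct per row.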

1 Comment

I'm happy that you reached a useful conclusion; however, I feel that this answer is incomplete. You should update it with the solution you end up using, since at the moment it doesn't really answer the question (the future reader will know what they shouldn't do, but not what they should). Your answer will be very useful should you document your comparison of different storage mechanisms (which I suspect you'll investigate anyway). Please take this into consideration.
