I have a very large array of structs (more than 100k elements) that has to be saved to a file. Later, the structs have to be loaded and processed one at a time. The current approach is a plain call to save. This takes ~8s to save and ~100s to load.

I've tried a couple of ways to speed this up:

  1. Using the -v6 flag with save. This sped things up, but not significantly.

  2. Serializing and deserializing using getByteStreamFromArray() and getArrayFromByteStream() respectively. This had no effect. Specifically, serializing and deserializing took just as long as simply saving and loading.

  3. (still working on this) Serializing the array, saving it, loading it, then only deserializing each structure as it is processed (rather than the whole array)

Does anyone have any recommendations to improve performance in this situation? It seems like it would be a common problem.

  • It may not be an option, but you could consider playing with how you store data in the structs. For example, arrays of structs with many fields will be less efficient than a single struct with arrays for fields. In general, anything that groups the data into contiguous blocks in memory should be faster. Also, if you only need to load/update a subset of your variables, just load/save what you need instead of everything. Commented Jun 11, 2016 at 22:11
  • @user20160 That last option is something I'm looking into. The entire array needs to be saved, and loading part of an array the recommended way (matfile) is taking just as long (just to call matfile). That's why I'm looking into serializing it, then deserializing only the needed parts (loading in the entire array as a vector of uint8s is almost instantaneous) Commented Jun 11, 2016 at 22:17
  • If this is an option for your case, you may want to convert your arrays of struct into simple arrays and back before the load/save operations. Commented Jun 12, 2016 at 16:55
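
The layout change suggested in the comments above (one struct with array fields instead of an array of structs) could look roughly like the sketch below. It assumes every field holds a numeric scalar; the field names id and value and the temporary file are made up for illustration, and whether this actually loads faster will depend on the real data.

```matlab
% Hypothetical data: a 1xN struct array whose fields are numeric scalars
N = 1000;
s = struct('id', num2cell(1:N), 'value', num2cell(rand(1, N)));

% Pack into a single struct whose fields are 1xN arrays
names = fieldnames(s);
packed = struct();
for k = 1:numel(names)
    packed.(names{k}) = [s.(names{k})];   % concatenates the scalar from every element
end

% Save each field of 'packed' as its own variable in the MAT-file
fname = [tempname '.mat'];                % hypothetical temporary file
save(fname, '-struct', 'packed');

% Load the plain arrays and rebuild the struct array if it is needed
loaded = load(fname);
values = cellfun(@(n) num2cell(loaded.(n)), names, 'UniformOutput', false);
args = [names.'; values.'];               % interleaves name, values, name, values, ...
rebuilt = struct(args{:});
delete(fname);
```

If the elements are only processed one at a time, the rebuild step can be skipped and each record read directly as loaded.id(i) and loaded.value(i).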

1 Answer

I believe that getByteStreamFromArray() and getArrayFromByteStream() are used by save() and load() under the hood, so your results are not very surprising to me. You might get better performance using hand-crafted serialization functions that crawl down your structs and save only what's really needed. Further savings may be possible by compressing the saved data. You can read some implementation ideas here: http://undocumentedmatlab.com/blog/improving-save-performance
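
As a minimal sketch of what a hand-crafted format might look like, assume each struct has a fixed numeric layout: a scalar id and a 1x3 pos, both doubles (these names, the layout, and the file name are hypothetical, not taken from the question). The records can then be written as raw doubles with fwrite and read back one at a time with fread:

```matlab
% Hypothetical fixed layout: scalar double 'id' and 1x3 double 'pos' per element
N = 1000;
s = struct('id', num2cell(1:N), 'pos', repmat({[1 2 3]}, 1, N));

% Write: one column of 4 doubles per record (fwrite stores column-major)
fname = [tempname '.bin'];                 % hypothetical temporary file
raw = [[s.id]; reshape([s.pos], 3, N)];    % 4xN matrix, column i is record i
fid = fopen(fname, 'w');
fwrite(fid, raw, 'double');
fclose(fid);

% Read: pull one record at a time and rebuild only that struct
fid = fopen(fname, 'r');
for i = 1:N
    rec = fread(fid, 4, 'double');         % 4x1 column: id, then pos(1:3)
    item = struct('id', rec(1), 'pos', rec(2:4).');
    % ... process item here ...
end
fclose(fid);
delete(fname);
```

This only works when every element has the same numeric shape; variable-length or nested fields would need a length prefix or an index, which is where the crawling logic mentioned above comes in.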

Note #1 - YMMV based on Matlab release, data, and platform

Note #2 - for readers who are not aware of this, getByteStreamFromArray() and getArrayFromByteStream() are both undocumented Matlab functions. The only [unofficial] explanation of their behavior, AFAIK, is provided here: http://undocumentedmatlab.com/blog/serializing-deserializing-matlab-data
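
For the per-element idea the question describes as still in progress, one possible sketch is to serialize each struct separately, concatenate the byte streams into a single uint8 vector, and keep an offsets table so a single element can be deserialized on demand. The example data, the variable names bytes and offsets, and the file name are assumptions for illustration; the two serialization functions are undocumented and may behave differently across releases.

```matlab
% Hypothetical data: 1xN struct array with mixed field types
N = 100;
s = struct('id', num2cell(1:N), 'name', repmat({'abc'}, 1, N));

% Serialize each element on its own (undocumented function)
chunks = cell(1, N);
for i = 1:N
    chunks{i} = reshape(getByteStreamFromArray(s(i)), 1, []);   % force a uint8 row
end

% Element i occupies bytes offsets(i)+1 : offsets(i+1)
offsets = [0, cumsum(cellfun(@numel, chunks))];
bytes = [chunks{:}];

% -v6 is uncompressed; note that it limits the size of a single variable
fname = [tempname '.mat'];                 % hypothetical temporary file
save(fname, 'bytes', 'offsets', '-v6');

% Later: load the flat byte vector and deserialize one element at a time
data = load(fname);
for i = 1:N
    item = getArrayFromByteStream(data.bytes(data.offsets(i)+1 : data.offsets(i+1)));
    % ... process item here ...
end
delete(fname);
```

The total deserialization work is probably similar to a full load, so the expected benefit is mainly lower peak memory and the ability to start processing right away, not necessarily a shorter total run time.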
