
I am trying to compress a huge Python object (~15 GB) and save it to disk. Due to requirement constraints I need to compress this file as much as possible, so I am presently using zlib.compress at level 9. My main concern is that the memory used during compression exceeds what I have available on the system (32 GB), and going forward the size of the object is expected to increase. Is there a more memory-efficient way to achieve this? Thanks.
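Roughly, what I am doing now looks like this (a sketch; the pickle step and function name are illustrative, not my exact code):

    import pickle
    import zlib

    # Everything happens in memory: the serialized bytes and the
    # compressed bytes both exist alongside the original object,
    # so the peak footprint is several large buffers at once.
    def save_compressed(obj, path):
        data = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
        blob = zlib.compress(data, 9)  # level 9 = maximum compression
        with open(path, "wb") as f:
            f.write(blob)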

Update: The object I want to save is a sparse numpy matrix, and I serialize it before compressing, which further increases memory consumption. Since I do not need the Python object once it has been serialized, would gc.collect() help?
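This is the pattern I have in mind (the helper name is hypothetical; whether gc.collect() actually helps here is exactly what I am asking):

    import gc
    import pickle
    import zlib

    # Serialize, then drop the reference to the original object before
    # compressing. CPython frees it by reference counting as soon as the
    # last reference is gone; gc.collect() only adds value if reference
    # cycles keep the object alive.
    def serialize_and_free(obj):
        data = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
        del obj        # drops this function's reference only;
        gc.collect()   # the caller must drop its reference too
        return zlib.compress(data, 9)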


2 Answers


Incremental (de)compression should be done with zlib.compressobj() and zlib.decompressobj() so that memory consumption stays small and bounded. Additionally, bz2 attains higher compression ratios than zlib for most data.
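A minimal sketch of that streaming pattern, assuming the serialized data can be read or produced in chunks (the file paths and chunk size are illustrative):

    import zlib

    CHUNK = 16 * 1024 * 1024  # 16 MiB per read; size is illustrative

    def compress_file(src_path, dst_path, level=9):
        comp = zlib.compressobj(level)
        with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
            while True:
                chunk = src.read(CHUNK)
                if not chunk:
                    break
                dst.write(comp.compress(chunk))
            dst.write(comp.flush())  # emit any data still buffered internally

bz2.BZ2Compressor(compresslevel) exposes the same compress()/flush() interface, so switching algorithms is a one-line change in this sketch.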



The memLevel parameter of deflateInit2() specifies how much memory is allocated for the internal compression state. The default is 8; the maximum is 9 and the minimum is 1 (see the zlib manual). If you have already tried that, or it does not help enough, it might be necessary to look at another compression algorithm or library instead.
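In Python, the same knob is exposed through zlib.compressobj(); a sketch, assuming Python 3.3+ where these keyword arguments are accepted:

    import zlib

    # memLevel=1 shrinks the deflate state's internal memory to the
    # minimum, at some cost in speed and compression ratio
    # (default: 8, maximum: 9).
    comp = zlib.compressobj(level=9, method=zlib.DEFLATED,
                            wbits=zlib.MAX_WBITS, memLevel=1)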
