Compression ratio for numpy array

Question

Given a NumPy array, is there a script, written purely in Python, that returns its compression ratio?

This is a very simple, specific problem that I can't seem to come up with a good solution for without manually making use of the file system.

Note that making use of the compressed file itself is irrelevant to this problem. The answer need only pertain to the compression ratio value.

Do you mean a NumPy array or a NumPy file? Are we talking about file size or memory size?
– willeM_ Van Onsem, Commented Jun 16, 2017 at 12:17
The question seems too vague. For example, suppose a = np.array([1, 2, 0, 3, 4, 0, 5]). Does it even make sense to ask for the "compression ratio" of a? If you have something more specific in mind, please update the question.
– Warren Weckesser, Commented Jun 16, 2017 at 12:41
@WarrenWeckesser I'm using numpy arrays of shape (28 x 28 x 3); they represent images. I don't think that matters though: np.array([1,1,1,1,1,1,1]) is more compressible than np.array([1, 2, 0, 3, 4, 0, 5]), for example.
– Shoogiebaba, Commented Jun 16, 2017 at 16:42
@WillemVanOnsem I suppose I'm referring to a .npy file, unless there is a way to create compressed Python objects in memory and directly gauge how much memory they take up compared to the original array.
– Shoogiebaba, Commented Jun 16, 2017 at 16:44
There are many compression algorithms; the compression ratio will depend on the algorithm. It sounds like you want to know how much the array will be compressed when it is written to a compressed npz file, without actually creating the file.
– Warren Weckesser, Commented Jun 16, 2017 at 17:25
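
The dependence on the algorithm, and the idea of measuring compressed size entirely in memory, can be illustrated with a minimal sketch. It assumes that compressing the array's raw bytes is an acceptable proxy for the on-disk size (the small header of a .npy or .npz file is ignored) and that Python 3 is available. The helper name ratio_by_codec and the two test images are hypothetical; zlib, bz2 and lzma are standard-library modules.

```python
import bz2
import lzma
import zlib

import numpy as np


def ratio_by_codec(arr):
    """Return {codec name: uncompressed bytes / compressed bytes}.

    Hypothetical helper: it compresses the raw array buffer in memory,
    so no file is created. The .npy/.npz header overhead is ignored.
    """
    raw = np.ascontiguousarray(arr).tobytes()
    codecs = {"zlib": zlib.compress, "bz2": bz2.compress, "lzma": lzma.compress}
    return {name: len(raw) / len(fn(raw)) for name, fn in codecs.items()}


# Two 28 x 28 x 3 "images": one constant, one filled with random bytes.
rng = np.random.default_rng(0)
flat = np.zeros((28, 28, 3), dtype=np.uint8)
noisy = rng.integers(0, 256, size=(28, 28, 3), dtype=np.uint8)

flat_ratios = ratio_by_codec(flat)
noisy_ratios = ratio_by_codec(noisy)
print("constant image:", flat_ratios)
print("random image:  ", noisy_ratios)
```

Each codec reports a different ratio for the same array, so any "compression ratio" quoted for an array is only meaningful together with the algorithm that produced it.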

Ed Smith · Accepted Answer · 2017-06-16 14:29:52Z

As NumPy is implemented in C, I don't think a pure Python solution is possible, but you can avoid the file system by using StringIO. Using NumPy's built-in np.savez_compressed, we can compare the resulting size with the size produced by np.savez:

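import StringIO  # Python 2 module; on Python 3, io.BytesIO plays this role

import numpy as np

def get_compression_ratio(a):
    # Write both archives to in-memory buffers instead of to disk.
    uncompressed = StringIO.StringIO()
    compressed = StringIO.StringIO()
    np.savez_compressed(compressed, a)
    np.savez(uncompressed, a)

    # Ratio of uncompressed to compressed archive size, in bytes.
    return uncompressed.len/float(compressed.len)

a = np.zeros([1000,1000])
a[23,60] = 1.
b = np.random.random([1000,1000])

print("one number = ", get_compression_ratio(a),
      "random = ", get_compression_ratio(b))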

with result,

('one number = ', 1001.0255255255255, 'random = ', 1.0604228730260878)

Since random numbers are essentially incompressible, a ratio close to 1 makes sense for the random array, but I would have expected the array with a single non-zero value to compress even better. The result depends on how efficient the algorithm behind np.savez_compressed is.
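
For Python 3, where the StringIO module no longer exists, the same idea can be sketched with io.BytesIO. This is an adaptation under that assumption, not the original code; the function name get_compression_ratio_py3 and the variable names are hypothetical.

```python
import io

import numpy as np


def get_compression_ratio_py3(a):
    """Size of the np.savez output divided by the np.savez_compressed output.

    Hypothetical Python 3 adaptation: both archives are written to
    in-memory byte buffers, so the file system is never touched.
    """
    uncompressed = io.BytesIO()
    compressed = io.BytesIO()
    np.savez(uncompressed, a)
    np.savez_compressed(compressed, a)
    # getbuffer().nbytes is the number of bytes held by each buffer.
    return uncompressed.getbuffer().nbytes / compressed.getbuffer().nbytes


sparse = np.zeros([1000, 1000])
sparse[23, 60] = 1.0
noise = np.random.random([1000, 1000])

sparse_ratio = get_compression_ratio_py3(sparse)
noise_ratio = get_compression_ratio_py3(noise)
print("one number =", sparse_ratio)
print("random =", noise_ratio)
```

The exact values vary with the random data and the NumPy version, but the nearly empty array should compress far better than the random one.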
