Given a numpy array, is there a script that can be written, purely in python, that returns its compression ratio?

This is a very simple, specific problem that I can't seem to come up with a good solution for without manually making use of the file system.

Note that making use of the compressed file itself is irrelevant to this problem. The answer need only pertain to the compression ratio value.
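One way to get such a number without touching the file system is to compress the array's raw bytes entirely in memory. This is a minimal sketch under two assumptions: "compression ratio" means raw size divided by compressed size, and zlib is the compressor. zlib is the same DEFLATE family that `np.savez_compressed` uses through `zipfile`. The helper name `zlib_compression_ratio` is hypothetical.

```python
import zlib

import numpy as np


def zlib_compression_ratio(a: np.ndarray, level: int = 6) -> float:
    """Raw size of `a` divided by its zlib-compressed size (hypothetical helper)."""
    raw = a.tobytes()  # the array's data buffer as bytes, in C order
    compressed = zlib.compress(raw, level)  # compressed entirely in memory, no files
    return len(raw) / len(compressed)


rng = np.random.default_rng(0)
uniform = np.ones((28, 28, 3), dtype=np.uint8)  # every byte identical
noisy = rng.integers(0, 256, size=(28, 28, 3), dtype=np.uint8)  # random bytes
print(zlib_compression_ratio(uniform), zlib_compression_ratio(noisy))
```

Any other in-memory compressor could be swapped in. The resulting ratio only has meaning relative to the compressor chosen.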

  • You mean a numpy array? Or a numpy file? Are we talking about file size? or memory size? Commented Jun 16, 2017 at 12:17
  • The question seems too vague. For example, suppose a = np.array([1, 2, 0, 3, 4, 0, 5]). Does it even make sense to ask for the "compression ratio" of a? If you have something more specific in mind, please update the question. Commented Jun 16, 2017 at 12:41
  • @WarrenWeckesser I'm using numpy arrays of shape (28 x 28 x 3); they represent images. I don't think that matters, though: np.array([1,1,1,1,1,1,1]) is more compressible than np.array([1, 2, 0, 3, 4, 0, 5]), for example. Commented Jun 16, 2017 at 16:42
  • @WillemVanOnsem I suppose I'm referring to a .npy file, unless there is a way to create compressed Python objects in memory and directly gauge how much memory they take up compared to the original array. Commented Jun 16, 2017 at 16:44
  • There are many compression algorithms; the compression ratio will depend on the algorithm. It sounds like you want to know how much the array will be compressed when it is written to a compressed npz file, without actually creating the file. Commented Jun 16, 2017 at 17:25
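The point that the ratio depends on the algorithm can be illustrated by compressing the same bytes with three compressors from Python's standard library (zlib, bz2, lzma). This is only a sketch. The 28 x 28 x 3 uint8 test image is synthetic (a gray background with one bright square), and `ratios_by_algorithm` is a hypothetical helper. For the same input the three ratios will generally differ.

```python
import bz2
import lzma
import zlib

import numpy as np


def ratios_by_algorithm(a: np.ndarray) -> dict:
    """Raw size / compressed size of a's bytes under several stdlib compressors."""
    raw = a.tobytes()
    compressors = {
        "zlib": zlib.compress,  # DEFLATE, as used inside .npz archives
        "bz2": bz2.compress,    # Burrows-Wheeler based
        "lzma": lzma.compress,  # LZMA in the .xz container
    }
    return {name: len(raw) / len(fn(raw)) for name, fn in compressors.items()}


# Synthetic "image" with the shape mentioned in the comments
img = np.full((28, 28, 3), 128, dtype=np.uint8)
img[10:18, 10:18, :] = 255
print(ratios_by_algorithm(img))
```

For inputs this small, each format's fixed header and container overhead is a noticeable part of the compressed size, which is one more reason the numbers disagree.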

1 Answer

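NumPy is written largely in C, so I don't think a strictly pure-Python solution is possible, but you can avoid the file system by writing to an in-memory buffer (io.BytesIO in Python 3, StringIO.StringIO in Python 2). Using NumPy's built-in np.savez_compressed, we can then compare the resulting size to the one produced by np.savez: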

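import io

import numpy as np

def get_compression_ratio(a):
    # Write both archives to in-memory buffers instead of files
    uncompressed = io.BytesIO()
    compressed = io.BytesIO()
    np.savez_compressed(compressed, a)  # .npz (zip) archive, DEFLATE-compressed
    np.savez(uncompressed, a)           # .npz (zip) archive, stored uncompressed

    # Ratio of the two archive sizes in bytes
    return uncompressed.getbuffer().nbytes / compressed.getbuffer().nbytes

a = np.zeros([1000, 1000])
a[23, 60] = 1.
b = np.random.random([1000, 1000])

print("one number = ", get_compression_ratio(a),
      "random = ", get_compression_ratio(b))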

which produced the following result (printed as a tuple because the original run used Python 2):

('one number = ', 1001.0255255255255, 'random = ', 1.0604228730260878)

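Random floating-point values are essentially incompressible, so a ratio close to 1 makes sense. One might expect the array with a single non-zero value to do much better than about 1000:1. However, the result is bounded by the algorithm behind savez_compressed. It writes a zip archive compressed with DEFLATE, whose ratio cannot exceed roughly 1032:1, so a value near 1000 is already close to that ceiling.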
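To check the ceiling argument, you can read the sizes that the zip archive itself records for its member, using the standard-library `zipfile` module (`ZipInfo.file_size` and `ZipInfo.compress_size`). This excludes the zip container overhead that the whole-buffer comparison includes, although `file_size` still counts the small .npy header. This is a sketch: `npz_member_ratio` and `DEFLATE_MAX_RATIO` are hypothetical names, and the bound follows from DEFLATE's 258-byte maximum match length.

```python
import io
import zipfile

import numpy as np


def npz_member_ratio(a: np.ndarray) -> float:
    """Uncompressed / compressed size of the .npy member in a savez_compressed archive."""
    buf = io.BytesIO()
    np.savez_compressed(buf, a)  # zip archive with a single member, arr_0.npy
    buf.seek(0)
    with zipfile.ZipFile(buf) as zf:
        info = zf.infolist()[0]  # ZipInfo for arr_0.npy
        return info.file_size / info.compress_size


# A DEFLATE match covers at most 258 bytes and costs at least 2 bits
# (one length code plus one distance code), so 258 * 8 / 2 = 1032 is an upper bound.
DEFLATE_MAX_RATIO = 258 * 8 / 2

a = np.zeros([1000, 1000])
a[23, 60] = 1.0
ratio = npz_member_ratio(a)
print(ratio, DEFLATE_MAX_RATIO)
```

However the data is arranged, the printed ratio stays below the bound, because it is a property of the format rather than of this particular array.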
