1

I'm working through the Python tutorial on python.org at the moment. I'm on section 10.9 and I'm trying to use the zlib library to compress a string. However, the len(compressedString) isn't always less than the len(originalString). My interpreter session is below:

>>> import zlib
>>> s = 'the quick brown fox jumps over the lazy dog'
>>> len(s)
43
>>> t = zlib.compress(s)
>>> len(t)
50
>>> t
'x\x9c+\xc9HU(,\xcdL\xceVH*\xca/\xcfSH\xcb\xafP\xc8*\xcd-(V\xc8/K-R(\x01J\xe7$VU*\xa4\xe4\xa7\x03\x00a<\x0f\xfa'
>>> len(zlib.decompress(t))
43
>>> s2 = "something else i'm compressing"
>>> len(s2)
30
>>> t2 = zlib.compress(s2)
>>> len(t2)
37
>>> s3 = "witch which has which witches wrist watch"
>>> len(s3)
41
>>> t3 = zlib.compress(s3)
>>> len(t3)
37

Does anyone know why this is happening?

2 Answers

11

The zlib compression algorithm is not always efficient:

>>> len(zlib.compress('ab'))
10

This is because the output has to carry metadata (headers, symbol tables, backreferences) that can amount to more data than the input you tried to compress. Use it on longer, less random data and it will compress things just fine:

>>> lorem = 'Neque porro quisquam est qui dolorem ipsum quia dolor sit amet, consectetur, adipisci velit'
>>> len(lorem) * 100
9100
>>> len(zlib.compress(lorem * 100))
123
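
The same comparison can be sketched as a small script. This assumes Python 3, where zlib.compress() takes bytes rather than str; the helper name compression_report is made up for illustration and is not part of zlib. Exact compressed sizes can vary with the zlib build, so the script only reports whether the output got smaller or larger.

```python
import zlib


def compression_report(data):
    """Return (original_length, compressed_length) for a bytes object.

    Hypothetical helper for illustration; not part of the zlib module.
    """
    compressed = zlib.compress(data)
    # Round-trip check: decompressing must give back the exact input.
    assert zlib.decompress(compressed) == data
    return len(data), len(compressed)


short = b"ab"
lorem = (b"Neque porro quisquam est qui dolorem ipsum quia dolor sit amet, "
         b"consectetur, adipisci velit")

for label, data in [("short", short), ("repeated", lorem * 100)]:
    original, packed = compression_report(data)
    verdict = "smaller" if packed < original else "larger"
    print(label, original, packed, verdict)
```

The two-byte input should come out larger than it went in, while the repeated text should shrink to a small fraction of its 9100 bytes.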

2

However, the len(compressedString) isn't always less than the len(originalString).

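That would, of course, be impossible, at least if you expect to always be able to losslessly retrieve the original string. There are fewer distinct strings shorter than a given length than there are strings of that length, so no lossless compressor can shrink every input.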

The deflate algorithm will, however, never expand the input by more than a small percentage, plus six bytes for the zlib header and trailer. The two-byte zlib header identifies the data as a zlib stream, and the four-byte trailer (an Adler-32 checksum) provides an integrity check on the data.
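
A minimal sketch of where those six bytes go, assuming Python 3 and default compression settings. It compresses the same input twice: once with the usual zlib wrapper, and once as a raw deflate stream by passing a negative wbits value to zlib.compressobj(), which suppresses the header and trailer. The variable names are illustrative only.

```python
import zlib

data = b"the quick brown fox jumps over the lazy dog"

# Default zlib container: 2-byte header + deflate stream + 4-byte Adler-32.
wrapped = zlib.compress(data)

# Raw deflate stream: a negative wbits value omits the header and trailer.
raw_compressor = zlib.compressobj(-1, zlib.DEFLATED, -15)
raw = raw_compressor.compress(data) + raw_compressor.flush()

# Bytes added by the zlib wrapper on top of the raw deflate stream.
overhead = len(wrapped) - len(raw)

# The last four bytes of the wrapped stream are the big-endian Adler-32
# checksum of the uncompressed data.
trailer_matches = wrapped[-4:] == zlib.adler32(data).to_bytes(4, "big")

print("wrapped:", len(wrapped), "raw:", len(raw), "overhead:", overhead)
print("trailer is Adler-32 of input:", trailer_matches)
```

With the same compression parameters on both sides, the overhead should be the six bytes described above, and the deflate stream itself carries the remaining expansion for short inputs like this one.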
