Zlib: correct checksum but error decompressing

Andre · Aug 26, 2009

I have been trying to solve this issue for a while now. I receive
compressed data over a TCP connection. I know the correct checksum
for the data, and both the client and server generate the same
checksum. However, when I try to decompress the data in Python I
get the exception: "Error -5 while decompressing data"! I would assume
that if the string in Python produces the correct checksum,
then the decompress function should also work on the same string, but
that's clearly not the case.

from array import array
import zlib

# convert data to a byte array
data = array('b', raw_data)
# print checksum for visual inspection
print zlib.crc32(data.tostring())
# try to decompress, but fails!
text = zlib.decompress(data.tostring())

Does anyone know what's going on?

InvisibleRoads Patrol · Aug 26, 2009

Hi Andre,

Hmm. Can you decompress the string on the server before it is sent?
Maybe the zipfile or gzip module will work.
Reference:
http://bytes.com/topic/python/answers/42131-zlib-decompress-cannot-gunzip-can
from cStringIO import StringIO
from gzip import GzipFile
body = GzipFile('', 'r', 0, StringIO(raw_data)).read()

You might want to try experimenting with the wbits parameter of
zlib.decompress(). A negative value tells zlib to expect a raw
deflate stream with no zlib header.
Reference:
http://mail.python.org/pipermail/python-list/2008-December/691694.html
zlib.decompress(data, -15)
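
To see which wbits setting fits a given payload, something like the following might help. It is only a sketch: try_wbits and the three sample streams are made up for illustration, and the byte literals are used so that it also runs on Python 3. The wbits values are the ones described in the zlib module docs (15 for a zlib stream, -15 for raw deflate, 31 for gzip, 47 to auto-detect zlib or gzip).

```python
import gzip
import io
import zlib

payload = b"example string " * 10

# Three encodings of the same payload, built here only for the demo.
zlib_stream = zlib.compress(payload)      # 2-byte header + deflate + Adler-32
raw_deflate = zlib_stream[2:-4]           # header and trailer stripped
buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode="wb") as f:
    f.write(payload)
gzip_stream = buf.getvalue()              # gzip header + deflate + CRC-32/size

def try_wbits(data):
    """Return the names of the wbits settings that decode data without error.

    Hypothetical helper, not part of the zlib module.
    """
    candidates = [
        ("zlib", 15),
        ("raw deflate", -15),
        ("gzip", 31),
        ("auto zlib/gzip", 47),
    ]
    accepted = []
    for name, wbits in candidates:
        try:
            zlib.decompress(data, wbits)
        except zlib.error:
            continue
        accepted.append(name)
    return accepted

print(try_wbits(zlib_stream))
print(try_wbits(raw_deflate))
print(try_wbits(gzip_stream))
```

Running try_wbits(incoming_data) on the bytes received from the socket should show which framing, if any, the sender used.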

The zlib module seems to work fine with both strings and byte arrays.
import array, zlib
dataAsString = zlib.compress('example string')
dataAsArray = array.array('b', dataAsString)
zlib.decompress(dataAsString) == zlib.decompress(dataAsArray)
zlib.decompress(dataAsString) == zlib.decompress(dataAsArray.tostring())

Paul Rubin · Aug 26, 2009

Andre said:
I have been trying to solve this issue for a while now. I receive data
from a TCP connection which is compressed.

Are you sure it is compressed with zlib? If yes, does it include the
standard zlib header? Some applications save a few bytes by stripping
the header. See the zlib doc page for how to deal with that: the
wbits argument causes the header check to be skipped on decompression
if you pass a negative number. That's the first thing I would try.

John Machin · Aug 27, 2009

Paul Rubin said:
Are you sure it is compressed with zlib? If yes, does it include the
standard zlib header? Some applications save a few bytes by stripping
the header. See the zlib doc page for how to deal with that: the
wbits argument causes the header check to be skipped on decompression
if you pass a negative number. That's the first thing I would try.

Short answer:

Try this:
zlib.decompress(incoming_data, -15)
If that doesn't work:
print repr(incoming_data[:30])
# post the results here

Longer answer:

A zlib stream consists of a deflate stream preceded by
a 2-byte header and followed by a 4-byte Adler32
checksum of the original data.
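
The layout described above can be checked by hand. This is a rough sketch with a made-up payload, written with byte literals so it also runs on Python 3; it assumes a stream produced by zlib.compress() with no preset dictionary, so the header is exactly 2 bytes.

```python
import struct
import zlib

original = b"example string " * 10
stream = zlib.compress(original)

# Split the stream into the three parts: header, deflate body, trailer.
header, body, trailer = stream[:2], stream[2:-4], stream[-4:]

# The body on its own is a raw deflate stream. A negative wbits tells
# zlib not to expect the 2-byte header or the trailing checksum.
raw_result = zlib.decompress(body, -15)

# The trailer is the Adler-32 of the uncompressed data, stored big-endian.
(stored_adler,) = struct.unpack(">I", trailer)
computed_adler = zlib.adler32(original) & 0xFFFFFFFF

print(raw_result == original)
print(stored_adler == computed_adler)
```

Note that the checksum in the trailer covers the uncompressed data, so a matching checksum over the compressed bytes on the wire says nothing about whether the header is present.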

The problem occurs not out of a desire to save 6 bytes
but through compounding of 2 mistakes:

Mistake (1) is in the HTTP protocol.
See http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html
The "deflate" content coding should have been called "zlib".
Read this and weep:
"""deflate The "zlib" format defined in RFC 1950 [31] in
combination with the "deflate" compression mechanism
described in RFC 1951 [29]."""

Mistake (2) happens when software implementers read only
the first word of the above quote and provide only a
deflate stream.

A reader can handle both possibilities by checking for a
(usual, default) zlib header:

data[0] == '\x78' and (ord(data[1]) + 0x7800) % 31 == 0
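
Wrapped up as a function, the check might look like the sketch below. The names looks_like_zlib and inflate are invented for this example, and bytearray() is used so that the same code works on both Python 2 strings and Python 3 bytes. Like the one-liner above, it only recognises the usual 0x78 first byte.

```python
import zlib

def looks_like_zlib(data):
    """True if data starts with a usual (0x78) zlib header.

    Hypothetical helper: the two header bytes, read as a big-endian
    16-bit number, must be a multiple of 31.
    """
    if len(data) < 2:
        return False
    cmf, flg = bytearray(data[:2])
    return cmf == 0x78 and (cmf * 256 + flg) % 31 == 0

def inflate(data):
    """Decompress either a zlib stream or a bare deflate stream."""
    if looks_like_zlib(data):
        wbits = zlib.MAX_WBITS       # expect header and Adler-32 trailer
    else:
        wbits = -zlib.MAX_WBITS      # raw deflate, no header or trailer
    return zlib.decompress(data, wbits)

# Demo data, made up for illustration.
payload = b"example string " * 10
zlib_stream = zlib.compress(payload)
raw_deflate = zlib_stream[2:-4]

print(inflate(zlib_stream) == payload)
print(inflate(raw_deflate) == payload)
```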

HTH,
John
