Gzip each chunk separately


Lior Knaany

Hi all,

I need some help understanding chunked and gzipped data in the HTTP/1.1
protocol, and how headers like "Content-Encoding" and "Transfer-Encoding"
relate to it.
(I am doing this in order to develop a web server filter.)

I noticed that when the server sends gzipped content in chunks, the
response headers look like this:

"Content-Encoding: gzip
Transfer-Encoding: chunked"

The browser waits for all the chunks, concatenates them together and runs
gunzip on them to get the content.
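
For illustration, a response of that kind looks roughly like this on the
wire (the chunk sizes and byte counts here are made up):

HTTP/1.1 200 OK
Content-Type: text/html
Content-Encoding: gzip
Transfer-Encoding: chunked

1a3f
...0x1a3f bytes of the gzip stream...
4c2
...0x4c2 more bytes of the same gzip stream...
0
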


But why gzip the entire data before sending? Is there a way for the
server to gzip a chunk and then send it (doing the same for all the
chunks)?
Meaning the gzip would not be applied to the entire content all at once,
but to each chunk separately.
This way the browser could read one chunk, gunzip it, display the
result, and continue to the next chunk.

If there is a way, what should the response headers look like?
Maybe something like "Transfer-Encoding: gzip, chunked" with no
Content-Encoding header?

I have searched RFC 2616 (Hypertext Transfer Protocol -- HTTP/1.1) but
could not find any meaningful information on this question.

Please help,

Thanks in advance,
Lior.
 

Barry Margolin

Lior Knaany said:
But why gzip the entire data before sending? Is there a way for the
server to gzip a chunk and then send it (doing the same for all the
chunks)?
Meaning the gzip would not be applied to the entire content all at once,
but to each chunk separately.
This way the browser could read one chunk, gunzip it, display the
result, and continue to the next chunk.

Unless the chunks are really big, you're not going to get very good
compression that way. Gzip uses an adaptive compression algorithm, so
it gets better as the amount of data increases.

But since gzip is also a stream compression algorithm, it can be done on
the fly as each chunk is sent and received.
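
A quick way to see the effect of the adaptive dictionary (a rough sketch;
the sample data and the 1 KB chunk size are invented, and this uses
Python's zlib rather than anything HTTP-specific):

import gzip, zlib

data = b"some fairly repetitive sample text\n" * 2000       # ~70 KB of made-up content
chunks = [data[i:i + 1024] for i in range(0, len(data), 1024)]

# gzip each 1 KB chunk on its own -- the dictionary restarts every chunk
per_chunk = sum(len(gzip.compress(c)) for c in chunks)

# one streaming compressor fed the same chunks -- the dictionary keeps adapting
comp = zlib.compressobj(wbits=31)                            # wbits=31 -> gzip container
streamed = sum(len(comp.compress(c)) for c in chunks) + len(comp.flush())

print(per_chunk, streamed)                                   # the streamed total is much smaller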
 

Lior Knaany

Thanks Barry,

I know that gzip will work poorly on smaller content, but can it be
done (gzip each chunk separately)?
And if so, what should the headers look like?
 

Chris Smith

Lior Knaany said:
I know that gzip will work poorly on smaller content, but can it be
done (gzip each chunk separately)?
And if so, what should the headers look like?

No, it can't be done. (Or rather, if you do it then general-purpose
browsers won't understand.)

--
www.designacourse.com
The Easiest Way To Train Anyone... Anywhere.

Chris Smith - Lead Software Developer/Technical Trainer
MindIQ Corporation
 

Lior Knaany

Thanks Chris,

That is exactly what I am experiencing when producing such a page;
I just thought maybe I was doing something wrong with the headers.

Well, thanks again for the info, Chris.
 

Michael Wojcik

No, it can't be done. (Or rather, if you do it then general-purpose
browsers won't understand.)

Though as Barry pointed out, you can achieve essentially the same
effect; neither the sender nor the receiver need buffer all the data
and compress or decompress it at once, since gzip is a streaming
compressor.

There's nothing to stop the server from reading N bytes of the file
it's sending, initializing the compressor, compressing those N bytes
to M bytes, sending an M-byte chunk, reading the next N bytes,
compressing those without reinitializing the compressor, and so
forth. The receiver can treat that just as it would a content-body
that was compressed in its entirety before chunking. The only
difference, as far as the receiver can tell, is that the chunks will
probably vary in size if the sender compresses each chunk in turn.

By the same token, the receiver can initialize the decompressor
before processing the first chunk, then pass it each chunk as it's
received. It needn't buffer the entire compressed content-body.
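
A minimal sketch of that flow, using Python's zlib as the streaming
compressor (the chunk size and the send()/chunks() callbacks are
placeholders, not any particular server's or browser's API):

import zlib

CHUNK = 8192                                     # arbitrary read size (the "N bytes")

def send_compressed_chunks(path, send):
    """Read N bytes at a time, compress, and emit each result as one chunk."""
    comp = zlib.compressobj(wbits=31)            # gzip-format stream
    with open(path, "rb") as f:
        while True:
            block = f.read(CHUNK)
            if not block:
                break
            out = comp.compress(block)           # N bytes in, M bytes out
            if out:                              # may be empty while zlib buffers
                send(out)
    send(comp.flush())                           # final chunk carries the gzip trailer

def receive_compressed_chunks(chunks, write):
    """Feed each chunk to one decompressor as it arrives; nothing is buffered whole."""
    decomp = zlib.decompressobj(wbits=31)
    for chunk in chunks:
        write(decomp.decompress(chunk))
    write(decomp.flush())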
 

Rogan Dawes

Chris said:
No, it can't be done. (Or rather, if you do it then general-purpose
browsers won't understand.)

In fact, the gzip algorithm allows independently gzipped content to be
concatenated, and it will still unzip just fine.


$ echo file 1 > file1
$ echo file 2 > file2
$ gzip file1 file2
$ cat file1.gz file2.gz > file3.gz
$ gunzip file3.gz
$ cat file3
file 1
file 2
$

So, if you created a gzipped stream by concatenating gzipped output, the
browser SHOULD read it as the concatenation of the uncompressed files.

Regards,

Rogan
 

Chris Smith

Rogan Dawes said:
$ echo file 1 > file1
$ echo file 2 > file2
$ gzip file1 file2
$ cat file1.gz file2.gz > file3.gz
$ gunzip file3.gz
$ cat file3
file 1
file 2
$

Interesting...

--
www.designacourse.com
The Easiest Way To Train Anyone... Anywhere.

Chris Smith - Lead Software Developer/Technical Trainer
MindIQ Corporation
 

Chris Uppal

[irrelevant and/or non-existent x-postings trimmed]

Rogan said:
In fact, the gzip algorithm allows independently gzipped content to be
concatenated, and it will still unzip just fine.

More accurately, the gzip /program/ will act as you describe. The compressed
format itself, the GZIP format as specified in RFC 1952, does naturally
concatenate, but only in the sense that a file in that format consists of a
number of elements, each of which is an independently compressed "file" (the
format even includes an embedded file name!).

It's difficult to state how a browser should interpret a gzip-format stream
which consists of several compressed elements. If the browser's decompression
is based on the zlib library, then that library does not automatically hide the
boundaries between the separate "files" in the stream (and nor should it), so
it is quite possible -- even probable -- that the browser would stop
decompressing at the end of the first compressed "file" in the stream.
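
A small zlib illustration of that boundary, and of the loop a client would
need in order to get the gunzip-program behaviour (a sketch in Python, not
browser code):

import gzip, zlib

blob = gzip.compress(b"file 1\n") + gzip.compress(b"file 2\n")

d = zlib.decompressobj(wbits=31)
print(d.decompress(blob))        # b'file 1\n' -- stops at the end of the first member
print(len(d.unused_data))        # the second member is left untouched

# To read everything, restart a decompressor on the leftover bytes each time:
out, rest = b"", blob
while rest:
    d = zlib.decompressobj(wbits=31)
    out += d.decompress(rest)
    rest = d.unused_data
print(out)                       # b'file 1\nfile 2\n'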

OTOH (reverting to the original poster's question), I don't see any reason why
the server cannot send chunked and compressed data, nor any reason (except,
perhaps, convenience) why the browser should not decompress such data
incrementally. The underlying compression format (shared by "GZIP" and
"DEFLATE") is capable of being flushed and/or reset in mid-stream, so the
server could flush the compression algorithm at the end of each chunk, and that
would be transparent to the browser as it was decompressing it (assuming the
use of a library at least as well-designed as zlib).

In point of fact, however, I'm not sure I see any real reason why the server
should even bother to flush the compression algorithm -- it could just
accumulate compressed data until it had enough for one chunk (possibly leaving
some data in the compression code's buffers). Send that as one chunk. The
client would decompress in the same incremental way.
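
A sketch of the flush-at-each-chunk variant, again with Python's zlib (the
page fragments are invented); note that a single decompressor on the
receiving side handles it without doing anything special:

import zlib

pieces = [b"<p>first part</p>", b"<p>second part</p>", b"<p>last part</p>"]

comp = zlib.compressobj(wbits=31)              # gzip-format stream
chunks = []
for i, piece in enumerate(pieces):
    data = comp.compress(piece)
    if i < len(pieces) - 1:
        data += comp.flush(zlib.Z_SYNC_FLUSH)  # flush: this chunk now decodes on its own
    else:
        data += comp.flush()                   # finish the stream on the last chunk
    chunks.append(data)

decomp = zlib.decompressobj(wbits=31)
for c in chunks:
    print(decomp.decompress(c))                # each chunk yields its piece right away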

-- chris
 
