zipped socket

John

Is there any way to open a socket so that every send/listen/recv
goes through a zipping/unzipping process automatically?

Thanks,
--j
 
jepler

As far as I know, there is no prefabbed solution for this problem. One
issue you must solve is buffering (when must data you've written to the
compressor actually go out to the other side?), and another is what to do
when a read() or recv() consumes compressed bytes but doesn't yet produce
any uncompressed bytes; this is a problem because normally a read() that
returns '' indicates end-of-file.

If you only work with whole files at a time, then one easy thing to do is
use the 'zlib' encoding, e.g. 'abc'.encode('zlib'), but because zlib isn't
self-delimiting, this won't work if you want to write() multiple times, or
if you want to read() less than the full file.
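
For the whole-file case, a minimal sketch might look like the following
(it assumes the sender simply closes the connection to mark end-of-file;
the function names are made up):

    import socket
    import zlib

    def send_whole_file(host, port, payload):
        # Compress the complete payload and send it, then close the
        # connection so the receiver knows it has everything.
        with socket.create_connection((host, port)) as sock:
            sock.sendall(zlib.compress(payload))

    def recv_whole_file(conn):
        # Read until the peer closes, then decompress the whole blob.
        chunks = []
        while True:
            chunk = conn.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
        return zlib.decompress(b"".join(chunks))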

Jeff

 
Peter Hansen

John said:
> Is there any way to open a socket so that every send/listen/recv
> goes through a zipping/unzipping process automatically?

You ought to be able to do this easily by wrapping a bz2 compressor
around the socket (maybe using socket.makefile() to return a file object
first) and probably using a generator as well:

http://effbot.org/librarybook/bz2.htm includes relevant examples (not
specifically with sockets though).

Googling for "python incremental compression" ought to turn up any other
alternatives.
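
If incremental bz2 turns out to be awkward (its compressor can't be
flushed mid-stream), one workable sketch is to compress each message
separately and length-prefix it, so the reader knows exactly how many
compressed bytes to pull off the socket. The 4-byte framing below is my
own assumption, not something from the effbot examples:

    import bz2
    import struct

    def send_message(sock, payload):
        # Compress this message on its own and prefix it with a
        # 4-byte big-endian length so the receiver can frame it.
        blob = bz2.compress(payload)
        sock.sendall(struct.pack("!I", len(blob)) + blob)

    def _recv_exact(sock, n):
        # Keep calling recv() until exactly n bytes have arrived.
        data = b""
        while len(data) < n:
            chunk = sock.recv(n - len(data))
            if not chunk:
                raise ConnectionError("socket closed mid-message")
            data += chunk
        return data

    def recv_message(sock):
        (length,) = struct.unpack("!I", _recv_exact(sock, 4))
        return bz2.decompress(_recv_exact(sock, length))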

-Peter
 
Bryan Olson

> As far as I know, there is no prefabbed solution for this problem. One
> issue you must solve is buffering (when must data you've written to the
> compressor actually go out to the other side?), and another is what to do
> when a read() or recv() consumes compressed bytes but doesn't yet produce
> any uncompressed bytes; this is a problem because normally a read() that
> returns '' indicates end-of-file.
>
> If you only work with whole files at a time, then one easy thing to do is
> use the 'zlib' encoding, e.g. 'abc'.encode('zlib'), but because zlib isn't
> self-delimiting, this won't work if you want to write() multiple times, or
> if you want to read() less than the full file.

That's basically a solved problem; zlib does have a kind of
self-delimiting. The key is the 'flush' method of the
compression object:

some_send_function(compressor.compress(data) + compressor.flush(zlib.Z_SYNC_FLUSH))

The Python module doc is unclear/wrong on this, but zlib.h
explains:

If the parameter flush is set to Z_SYNC_FLUSH, all pending
output is flushed to the output buffer and the output is
aligned on a byte boundary, so that the decompressor can get
all input data available so far.


There's also Z_FULL_FLUSH, which also resets the compression
dictionary. For a stream socket, we'd usually want to keep the
dictionary, since that's what gives us the compression. The
Python doc states:

Z_SYNC_FLUSH and Z_FULL_FLUSH allow compressing further
strings of data and are used to allow partial error recovery
on decompression

That's not correct. Z_FULL_FLUSH allows recovery after errors,
but Z_SYNC_FLUSH is just to allow pushing all the compressor's
input to the decompressor's output.
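
A sketch of how that looks in practice, with one long-lived
compressor/decompressor per direction so the dictionary survives across
sends (the class and method names here are mine, not a standard API):

    import zlib

    class ZippedSocket:
        # Wraps a connected socket; every send is sync-flushed so the
        # peer can decompress everything sent so far without waiting
        # for the stream to be closed.
        def __init__(self, sock):
            self.sock = sock
            self.comp = zlib.compressobj()
            self.decomp = zlib.decompressobj()

        def send(self, payload):
            data = self.comp.compress(payload) + self.comp.flush(zlib.Z_SYNC_FLUSH)
            self.sock.sendall(data)

        def recv(self, bufsize=4096):
            chunk = self.sock.recv(bufsize)
            if not chunk:
                return b""  # peer closed the connection
            # Note: this can also return b"" while the connection is
            # still open, if the compressed bytes received so far don't
            # complete any output (which is Jeff's earlier point).
            return self.decomp.decompress(chunk)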
 
