tarfile woes

  • Thread starter Hans-Joachim Widmaier

Hans-Joachim Widmaier

Although I've done a bit of ranting before on this, no one seems to have
noticed. I'll try again, hopefully more to the point.

One of the additions in the standard library I liked most is the tarfile
module. This module came in very handy for one of my programs.
Alas, I had to discover that:

- bzip2 compressed files cannot be read from a "fake" (StringIO) file
object, only from real files. This is (imho) unbelievably ugly, as
I have the file already in a string. I really do not want to read it
a second time. Or a third time, when the user finally decides that
she wants the archive actually unpacked (second was TOC listing).

- It does not handle compressed (.Z) archives. Of course there's
no one to blame. The gzip utility (which is used by GNU tar) handles
this ancient algorithm, but apparently, zlib does not. :-(

- TarInfo and ZipInfo (zipfile) objects differ without need.

TarInfo attribute    ZipInfo attribute

name                 filename
size                 file_size
mtime (int)          date_time (6-tuple)

I can see 2 reasons for that: 1. The library is written by a bunch of
different guys at different times. Everyone's got her own style, and it
shows. 2. The underlying internals all get exposed to some degree.
I'm not sure this is good. Yes, I didn't care much about this only months
ago. But when I tried to write that class that read just about anything,
I suddenly found myself writing the same code with a few attribute names
and format strings changed. (btw, I even resent names like 'file_size'.
Why then not 'file_name'?)
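
The duplication described above could be folded into a thin adapter. What follows is only a minimal sketch, assuming a present-day Python 3 (which this thread predates). The names `Entry`, `list_tar`, `list_zip` and `list_archive` are hypothetical and not part of the standard library; only the `TarInfo` and `ZipInfo` attributes from the table are real.

```python
import io
import tarfile
import zipfile
from datetime import datetime
from typing import Iterator, List, NamedTuple


class Entry(NamedTuple):
    """Hypothetical unified member record (not a stdlib class)."""
    name: str
    size: int
    mtime: datetime


def list_tar(data: bytes) -> Iterator[Entry]:
    # TarInfo.mtime is a number of seconds since the epoch.
    with tarfile.open(fileobj=io.BytesIO(data), mode="r") as tar:
        for info in tar:
            yield Entry(info.name, info.size, datetime.fromtimestamp(info.mtime))


def list_zip(data: bytes) -> Iterator[Entry]:
    # ZipInfo.date_time is a (year, month, day, hour, minute, second) tuple.
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for info in zf.infolist():
            yield Entry(info.filename, info.file_size, datetime(*info.date_time))


def list_archive(data: bytes) -> List[Entry]:
    """Choose the reader by looking at the content, not the file name."""
    if zipfile.is_zipfile(io.BytesIO(data)):
        return list(list_zip(data))
    return list(list_tar(data))
```

Both timestamps come out as naive local-time `datetime` values here; that is a choice made for the sketch, and a real unified interface would have to decide how to treat time zones.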

Somewhere in the not-so-near future lies the ominous Python 3.0, said to
be incompatible with the current language (to some degree). Does that hold
for the library, too? If yes, wouldn't that be a good time to unify
classes like those *Info? With changes that big, the timestamps could also
be made datetime objects ...

Ok, enough ranting for a day's worth. The remaining big question, of
course, is: "Who's going to do all that?" I'd offer some help if I felt
up to the task (somehow I have great difficulties understanding newer
modules with all those clever tricks. Guess I'm not clever enough. :-().

hjw
 

Christos TZOTZIOY Georgiou

> - It does not handle compressed (.Z) archives. Of course there's
> no one to blame. The gzip utility (which is used by GNU tar) handles
> this ancient algorithm, but apparently, zlib does not. :-(

You know there's a gzip module, don't you?
 

Gustavo Niemeyer

> - bzip2 compressed files cannot be read from a "fake" (StringIO) file
> object, only from real files. This is (imho) unbelievably ugly, as
> I have the file already in a string. I really do not want to read it
> a second time. Or a third time, when the user finally decides that
> she wants the archive actually unpacked (second was TOC listing).

> - It does not handle compressed (.Z) archives. Of course there's
> no one to blame. The gzip utility (which is used by GNU tar) handles
> this ancient algorithm, but apparently, zlib does not. :-(

decompress(data) -> decompressed data

Decompress data in one shot. If you want to decompress data
sequentially, use an instance of BZ2Decompressor instead.

Functions that read and write gzipped files.

The user of the file doesn't have to worry about the compression,
but random access is not allowed.

Open a tar archive for reading, writing or appending. Return
an appropriate TarFile class.

mode:
'r'          open for reading with transparent compression
'r:'         open for reading exclusively uncompressed
'r:gz'       open for reading with gzip compression
'r:bz2'      open for reading with bzip2 compression
'a' or 'a:'  open for appending
'w' or 'w:'  open for writing without compression
'w:gz'       open for writing with gzip compression
'w:bz2'      open for writing with bzip2 compression
'r|'         open an uncompressed stream of tar blocks for reading
'r|gz'       open a gzip compressed stream of tar blocks
'r|bz2'      open a bzip2 compressed stream of tar blocks
'w|'         open an uncompressed stream for writing
'w|gz'       open a gzip compressed stream for writing
'w|bz2'      open a bzip2 compressed stream for writing
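
Read together, the docstrings above suggest two ways to work on a bzip2-compressed archive that is already held in memory: the `'r|bz2'` stream mode, or a one-shot `bz2.decompress()` followed by a plain `'r:'` open. The sketch below assumes a current Python 3, where `tarfile.open()` accepts an in-memory `fileobj` for these modes; it says nothing about the versions discussed in this thread. `make_sample`, `list_names` and `read_member` are hypothetical helper names.

```python
import bz2
import io
import tarfile


def make_sample() -> bytes:
    """Build a small .tar.bz2 in memory (stand-in for data already read)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:bz2") as tar:
        payload = b"hello\n"
        info = tarfile.TarInfo("hello.txt")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def list_names(data: bytes) -> list:
    # 'r|bz2' treats the data as a forward-only stream of tar blocks.
    with tarfile.open(fileobj=io.BytesIO(data), mode="r|bz2") as tar:
        return [member.name for member in tar]


def read_member(data: bytes, name: str) -> bytes:
    # Decompress in one shot, then hand tarfile a seekable in-memory file.
    raw = bz2.decompress(data)
    with tarfile.open(fileobj=io.BytesIO(raw), mode="r:") as tar:
        return tar.extractfile(name).read()
```

The same `data` bytes serve both the table-of-contents listing and the later extraction, so nothing is read from disk a second time. The one-shot variant keeps the whole uncompressed archive in memory, which may not suit very large archives.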
 

Christos TZOTZIOY Georgiou

> decompress(data) -> decompressed data

[snip]

I fell into the same trap; Hans talks about files compressed with
"compress", the old unix compression mechanism (not as old as 'pack',
though), that had an extension of '.Z'.
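
As far as I know the standard library has no decoder for the `compress` format, so the most a generic archive reader can do is recognise such a file and report it cleanly instead of failing halfway. A minimal sketch that tells the three formats in this thread apart by their leading bytes; `sniff_compression` and `MAGIC` are hypothetical names, while the byte signatures themselves are the documented ones.

```python
# Leading "magic" bytes of the compression formats discussed in this thread.
MAGIC = {
    b"\x1f\x8b": "gzip",
    b"\x1f\x9d": "compress",  # old Unix compress (.Z)
    b"BZh": "bzip2",
}


def sniff_compression(data: bytes) -> str:
    """Return 'gzip', 'compress', 'bzip2' or 'none' from the first bytes."""
    for magic, name in MAGIC.items():
        if data.startswith(magic):
            return name
    return "none"
```

A caller could check for `'compress'` up front and raise a clear "unsupported .Z archive" error rather than letting tarfile stumble over it.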
 

Lars Gustaebel

> - bzip2 compressed files cannot be read from a "fake" (StringIO) file
> object, only from real files.

Much to my regret, this is a limitation of the bz2 module which is used.

> - It does not handle compressed (.Z) archives.

This is sad, too, but IMO tolerable for most people.

> - TarInfo and ZipInfo (zipfile) objects differ without need.
>
> TarInfo attribute    ZipInfo attribute
>
> name                 filename
> size                 file_size
> mtime (int)          date_time (6-tuple)

As you observed yourself, ZipInfo's naming scheme is inconsistent, and I
didn't find it a good idea to adopt it mindlessly. Also, I found an
integer value for mtime much more versatile than a tuple.

> I can see 2 reasons for that: 1. The library is written by a bunch of
> different guys at different times. Everyone's got her own style, and it
> shows. 2. The underlying internals all get exposed to some degree.

Good observation.

I didn't have the standard library in mind when I started writing tarfile.
I realized the zipfile interface wouldn't suit my ideas, so I decided to
design one I could be happy with. In the months before tarfile's addition
to the stdlib I perceived some people's concerns about this lack of
uniformity, so I added the TarFileCompat class to calm them down. Perhaps
it could serve you too.

Your criticism is justified. And I'm sure that in the future a
standardized and extensible interface to archive manipulation could emerge
*if* people just want it badly enough (and someone does all the work :).

Let's wait and see.
 

Hans-Joachim Widmaier

On Fri, 22 Aug 2003 11:40:13 +0200, Lars Gustaebel wrote:

> Much to my regret, this is a limitation of the bz2 module which is used.

Ooh, I hadn't noticed that myself. That's really sad.

> This is sad, too, but IMO tolerable for most people.

Yes, certainly. Those shouldn't exist anymore, anyhow. It's just
that I made a test run of my class on a directory full of source archives
and got an exception there.

> As you observed yourself, ZipInfo's naming scheme is inconsistent, and I
> didn't find it a good idea to adopt it mindlessly. Also, I found an
> integer value for mtime much more versatile than a tuple.

I guess it's stored in the zipfile that way, but this is one of the
internals I'm really not interested in, as an integer value loses
absolutely nothing. Or, even better, have all dates and times uniformly
represented by the new datetime class.

> Your criticism is justified. And I'm sure that in the future a
> standardized and extensible interface to archive manipulation could emerge
> *if* people just want it badly enough (and someone does all the work :).

I'm afraid the ones who want it ought to be the ones who do the work.
Who else would have any interest in doing it? This is maybe the downside
of free software - nobody gets paid for writing the pieces nobody wants
to write.

> Let's wait and see.

I'd rather *do* something. If anything, I would really love to write new
modules/classes instead of ranting. But, as I already tried to explain,
I'm simply not good enough for the job. Which leaves me, of course, with
just one option: take what you get and be grateful.

Despite my current problems with tarfile (btw, there are other
modules/classes that insist on a _filename_ [gdk.pixbuf, e.g.]), it
enabled me to write a little backup tool just when I needed it in almost
no time. So I have reason to be grateful. :)

Waiting-and-seeing'ly yours
hjw
 
