odigity
I'm writing a script that needs to run in as fast a time as possible.
Every minute counts. The script crawls a tree of gzipped files
totalling about 30GB. Originally I was calling open() with "gzip
$file |", but I hate making external calls - it requires a fork, and
you have very limited communication with the process for catching
errors and such. I always like using perl functions and modules when
possible over external calls. However, I wanted to make sure I
wouldn't take a performance hit before switching to Compress::Zlib.
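(To illustrate what I mean about limited error communication: with a piped open, gzip's own diagnostics go straight to STDERR, and about the only thing the script sees is the exit status in $? after close() - and open() itself succeeds as long as the fork does, even if gzip can't be exec'd. A hypothetical sketch of the old approach:)

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical sketch of the piped-open approach: the only error
# signal Perl gets is the exit status in $? once the pipe is closed.
# Note that open() on a pipe returns true as long as the fork
# succeeds, even if gzip itself cannot be exec'd in the child.
sub count_lines_ext {
    my ($file) = @_;
    open( my $fh, "gzip -cd $file |" )
        or die( "could not fork gzip: $!" );
    my $lines = 0;
    $lines++ while <$fh>;
    close( $fh )
        or die( $! ? "error closing gzip pipe: $!"
                   : "gzip exited with status " . ( $? >> 8 ) );
    return $lines;
}
```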
I picked one of the bigger files (75MB) and ran some benchmarking on
it, comparing Compress::Zlib to an external call to the gzip utility.
Here's the code:
#!/usr/bin/perl -w
use strict;
use Benchmark qw( cmpthese );
use Compress::Zlib;
use IO::File;
my $file = 'sample.gz';
print "warming up the file...\n";
system( "zcat $file > /dev/null" );
print "starting comparison...\n";
cmpthese( 3, {
    'ext_gzip'      => \&ext_gzip,
    'compress_zlib' => \&compress_zlib,
});
sub ext_gzip
{
    my $fh = IO::File->new( "gzip -cd $file |" )
        or die( "could not gzip -cd '$file' for reading: $!" );
    my $lines = 0;
    while ( defined( my $line = $fh->getline() ) ) {
        $lines++;
    }
    $fh->close();
    print "ext_gzip: $lines lines\n";
}
sub compress_zlib
{
    my $gz = gzopen( $file, 'rb' )
        or die( "could not gzopen '$file' for reading: $gzerrno" );
    my $line;
    my $lines = 0;
    my $bytes;
    while ( ( $bytes = $gz->gzreadline( $line ) ) > 0 ) {
        $lines++;
    }
    # gzreadline returns -1 on error and 0 at EOF; the > 0 loop
    # condition ends the loop on either, so check for errors here
    # rather than inside the loop, where the check could never fire.
    die( $gz->gzerror ) if ( $bytes < 0 );
    $gz->gzclose();
    print "compress_zlib: $lines lines\n";
}
Here's the output:
warming up the file...
starting comparison...
compress_zlib: 15185003 lines
compress_zlib: 15185003 lines
compress_zlib: 15185003 lines
(warning: too few iterations for a reliable count)
ext_gzip: 15185003 lines
ext_gzip: 15185003 lines
ext_gzip: 15185003 lines
(warning: too few iterations for a reliable count)
               s/iter compress_zlib ext_gzip
compress_zlib    68.6            --     -23%
ext_gzip         52.8           30%       --
Now, this wasn't the best possible benchmarking test, but I still
think I am justified in being concerned.
Any help with a) interpreting these results, b) suggesting better
benchmarking methods, c) explaining why Compress::Zlib is slower than
gzip, or, most importantly, d) improving performance, would be
appreciated.
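One thing I've been meaning to try on point (d), in case the per-line overhead of gzreadline is the bottleneck: reading fixed-size blocks with gzread and counting newlines with tr///. A sketch - untested against the benchmark above, and the 64KB block size is just a guess:

```perl
#!/usr/bin/perl -w
use strict;
use Compress::Zlib;

# Sketch: read the decompressed stream in 64KB blocks and count
# newlines with tr///, avoiding one method call per line.
sub count_lines_blockwise {
    my ($file) = @_;
    my $gz = gzopen( $file, 'rb' )
        or die( "could not gzopen '$file' for reading: $gzerrno" );
    my ( $buf, $bytes );
    my $lines = 0;
    while ( ( $bytes = $gz->gzread( $buf, 65536 ) ) > 0 ) {
        $lines += ( $buf =~ tr/\n// );
    }
    # gzread returns -1 on error and 0 at EOF
    die( "gzread failed: " . $gz->gzerror ) if $bytes < 0;
    $gz->gzclose();
    return $lines;
}
```

Of course this only helps if the real per-line work can be restructured to run per-block instead.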
-ofer