Strange speed-increase by separating "if"s

Wolfram Humann · Jul 21, 2005

I have a script for processing certain eps-files. What it basically does
is go through the file looking for "setgray"-lines. If it finds one,
it checks whether it is followed by lines matching the patterns in @head and
then @dot (plus some coordinate checks). If everything matches, the @head
lines remain in the file while the @dot lines are discarded. If the match
fails, the file remains unchanged. Here is the script:

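#!/usr/local/bin/perl -w
use strict;

# Patterns for the lines expected right after a "setgray" line
my @head = (
    '^N(\s+\d+)(\s+\d+)(\s+\d+) 0 360 arc sf N$',   # full circle: x, y, r
    '^\d+\s+slw$',
);
# Patterns for the dot: one M line followed by 72 D lines
my @dot = (
    '^(\d+)\s+(\d+)\s+M$',
    ('^(\d+)\s+(\d+)\s+D$') x 72,
);

my @c_head = map qr/$_/, @head;
my @c_dot  = map qr/$_/, @dot;
my ($x, $y, $r);

while (<>)
{
    print;
    if (/^[0-9.]+\s+setgray\s+$/)
    {
        my ($h, $d) = (0, 0);
        my $l = "";    # buffered @dot lines, printed only if the match fails
        while (<>)
        {
            if ($head[$h])
            {
                # @head lines are always kept
                print;
                last unless $_ =~ $c_head[$h];
                ($x, $y, $r) = ($1, $2, $3) if defined $3;
                $h++;
            }
            elsif ($dot[$d])
            {
                $l .= $_;
                # Restore the buffered lines and stop if this line does not
                # match or its point lies outside the circle's bounding box
                if ($_ !~ /$c_dot[$d]/ or
                    $1 < $x - $r - 2 or
                    $1 > $x + $r + 2 or
                    $2 < $y - $r - 2 or
                    $2 > $y + $r + 2 )
                {
                    print $l;
                    last;
                }
                $d++;
            }
            else
            {
                # All of @dot matched; if yet another D line follows, this
                # was not the expected dot, so restore the buffered lines
                print $l if /^\d+\s+\d+\s+D$/;
                print;
                last;
            }
        }
    }
}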

I was surprised how long it took, so I profiled it with Devel::SmallProf.
This showed that most time is spent in "if ($_ !~ /$c_dot[$d]/ or ...".
To see whether the pattern match or the comparisons took so long, I split
the "if" like this:

if ($_ !~ /$c_dot[$d]/)
{
    if ($1 < $x - $r - 2 or
        $1 > $x + $r + 2 or
        $2 < $y - $r - 2 or
        $2 > $y + $r + 2 )
    {
        print $l;
        last;
    }
}

To my surprise a test run (without the profiler) ran *at least* twice as
fast as the original version. Also, the profiler says that "no time" is
spent on the comparisons and the time for the pattern match dropped to
one third of what it was before.

Any explanation?
Thanks,
Wolfram

Anno Siegel · Jul 21, 2005

Wolfram Humann said:
I was surprised how long it took and profiled with Devel::SmallProf.
This showed that most time is spent in "if (($_ !~ /$c_dot[$d]/) or...".
To see if the pattern match or the comparisons took so long, I split the
"if" like this:

if ($_ !~ /$c_dot[$d]/)
{
    if ($1 < $x - $r - 2 or
        $1 > $x + $r + 2 or
        $2 < $y - $r - 2 or
        $2 > $y + $r + 2 )
    {
        print $l;
        last;
    }
}

The logic of the modified part is different from the original, and
makes no sense. You are making sure the regex *doesn't* match, and
then go on to use $1 and $2. In the original, you use them when the
regex *does* match. If you had warnings switched on, you'd have
noticed.
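The reason the split version reads meaningless values: Perl does not reset the capture variables when a match fails (perlre documents this), so inside "if ($_ !~ ...)" $1 and $2 still hold whatever the last *successful* match captured. A minimal sketch, with made-up sample strings:

```perl
use strict;
use warnings;

my $pattern = qr/^(\d+)\s+(\d+)\s+D$/;
my $good    = "10 20 D";
my $bad     = "bogus line";

$good =~ $pattern;                          # succeeds: $1 = "10", $2 = "20"
my $matched = $bad =~ $pattern ? 1 : 0;     # fails, so $matched = 0

# The failed match did not touch $1/$2: they still hold the captures
# from $good, not anything taken from $bad.
my ($cap1, $cap2) = ($1, $2);
print "matched=$matched \$1=$cap1 \$2=$cap2\n";
```

Running it prints "matched=0 $1=10 $2=20": the values come from "10 20 D", not from the line that was just tested.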

To my surprise a test run (without the profiler) ran *at least* twice as
fast as the original version.

Different control flow, different runtimes, so no surprise here.

Make your program strict- and warnings-safe and profile again, preferably
with programs that do the same thing. When benchmarking and profiling,
an important step is to monitor your variants to make sure that you
haven't built in a bug, as has happened to you.
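One way to do that check is to run both variants of the test on the same lines and compare their decisions. The sketch below is hypothetical (the sub names, circle values and sample lines are made up); each sub returns 1 when the script would do "print $l; last;". Both sample lines match the dot pattern, which mirrors a file where the inner "if" of the split version is never reached:

```perl
use strict;
use warnings;

# Made-up circle (as captured from the @head line) and dot pattern.
my ($x, $y, $r) = (100, 200, 5);
my $re = qr/^(\d+)\s+(\d+)\s+D$/;

# Returns 1 if the original combined "if" would stop the @dot scan.
sub stops_original {
    my ($line) = @_;
    my $stop = ($line !~ $re
                or $1 < $x - $r - 2 or $1 > $x + $r + 2
                or $2 < $y - $r - 2 or $2 > $y + $r + 2);
    return $stop ? 1 : 0;
}

# Returns 1 if the first split version would stop the scan.
sub stops_first_split {
    my ($line) = @_;
    if ($line !~ $re) {
        # Only reached when the match FAILED, so $1/$2 are not from $line.
        return 1 if $1 < $x - $r - 2 or $1 > $x + $r + 2
                 or $2 < $y - $r - 2 or $2 > $y + $r + 2;
    }
    return 0;
}

# Both sample lines match the pattern; the second one is out of range.
my @lines = ("100 200 D", "150 200 D");
my @diff  = grep { stops_original($_) != stops_first_split($_) } @lines;
print "variants disagree on: @diff\n" if @diff;
```

It prints "variants disagree on: 150 200 D": a point well outside the circle stops the original loop but is silently accepted by the split version.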

Anno

Wolfram Humann · Jul 21, 2005

-----Original Message-----
From: Anno Siegel
Sent: 21.07.2005 12:22

The logic of the modified part is different from the original, and
makes no sense. You are making sure the regex *doesn't* match, and
then go on to use $1 and $2. In the original, you use them when the
regex *does* match. If you had warnings switched on, you'd have
noticed.

Of course you're right. I had thought that splitting an "if" into two
parts like that was obvious (when in fact it was just stupid). I do have
warnings and strict on, but with the file I use, the inner "if" is never
hit...
If my brain isn't totally drained already, the correctly split version
should be:

if ($_ =~ /$c_dot[$d]/)
{
    if ($1 < $x - $r - 2 or
        $1 > $x + $r + 2 or
        $2 < $y - $r - 2 or
        $2 > $y + $r + 2)
    {
        print $l;
        last;
    }
}
else
{
    print $l;
    last;
}

Naturally, with this the timing is more or less back to what it was.
According to profiling, the four comparisons take about as much time as
the pattern match -- hence the speed increase when they were never executed.
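To reproduce the match-versus-comparison timing outside the profiler, the core Benchmark module can compare the two in isolation. This is only a sketch with made-up sample values; the absolute and relative numbers will depend on the perl version and the machine:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Made-up sample: one dot line and a circle it lies inside.
my $line = "1234 5678 D\n";
my $re   = qr/^(\d+)\s+(\d+)\s+D$/;
my ($x, $y, $r) = (1230, 5680, 10);

# Run each variant for at least 1 CPU second and print a comparison chart.
my $rows = cmpthese(-1, {
    # The pattern match alone
    match_only => sub {
        my $ok = $line =~ $re;
        return $ok;
    },
    # The pattern match followed by the four range comparisons
    match_and_compare => sub {
        return 0 unless $line =~ $re;
        my $outside = ($1 < $x - $r - 2 or $1 > $x + $r + 2
                       or $2 < $y - $r - 2 or $2 > $y + $r + 2);
        return $outside;
    },
});
```

If the comparisons really cost about as much as the match, match_and_compare should run at roughly half the rate of match_only.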

Thanks for the help!

Big and Blue · Jul 21, 2005

Wolfram Humann wrote:
t...

If my brain isn't totally drained already the correctly split version
should be:

if ($_ =~ /$c_dot[$d]/)
{
    if ($1 < $x - $r - 2 or
        $1 > $x + $r + 2 or
        $2 < $y - $r - 2 or
        $2 > $y + $r + 2)
    {

Naturally, with this the timing is more or less back to what it was.
According to profiling, the four comparisons take about as much time as
the pattern match -- hence the speed increase when they were never executed.

You are (probably) doing 4 calculations to work out constants.

Might be quicker if you did:

$x_less_r2 = $x - $r - 2;
$x_plus_r2 = $x + $r + 2;
$y_less_r2 = $y - $r - 2;
$y_plus_r2 = $y + $r + 2;

(having declared them in a suitable scope) as soon as you get x and y,
then change your test to use the calculated versions. Of course, it
depends on how many times the test is run for each head branch taken
(and whether Perl optimizes this anyway).
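A sketch of that suggestion, using the variable names above. The circle values and sample lines are made up; in the real script the four limits would be computed right after the @head capture sets ($x, $y, $r), and the loop body would be the @dot branch:

```perl
use strict;
use warnings;

# Made-up circle values; in the script they come from the @head capture.
my ($x, $y, $r) = (100, 200, 5);

# Compute the four limits once per circle instead of once per dot line.
my $x_less_r2 = $x - $r - 2;
my $x_plus_r2 = $x + $r + 2;
my $y_less_r2 = $y - $r - 2;
my $y_plus_r2 = $y + $r + 2;

my $dot_re = qr/^(\d+)\s+(\d+)\s+D$/;
my @lines  = ("100 200 D", "107 193 D", "108 200 D", "no match");

my @inside;
for my $line (@lines) {
    next unless $line =~ $dot_re;
    # Same range test as before, but against the precomputed limits
    next if $1 < $x_less_r2 or $1 > $x_plus_r2
         or $2 < $y_less_r2 or $2 > $y_plus_r2;
    push @inside, $line;
}
print "$_\n" for @inside;
```

Only "100 200 D" and "107 193 D" survive; "108 200 D" lies one unit outside the box. Whether this is measurably faster depends on how often the test runs per circle.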

Wolfram Humann · Jul 22, 2005

-----Original Message-----
From: Big and Blue
Sent: 22.07.2005 00:22

You are (probably) doing 4 calculations to work out constants.

Might be quicker if you did:

$x_less_r2 = $x - $r - 2;
$x_plus_r2 = $x + $r + 2;
$y_less_r2 = $y - $r - 2;
$y_plus_r2 = $y + $r + 2;

(having declared them in a suitable scope) as soon as you get x and y
then change your test to use the calculated version. Of course, it
depends on how many times the test is run for each head branch taken
(and whether Perl optimizes this anyway).

Might be worth a try. The comparisons are done up to 73 times with the
same x / y / r. Thanks for the suggestion!
