how to speed up a string-substitution loop?

U

Uri Guttman

AK> OK, if you insist, we do have a disagreement. I will simply note that
AK> the benchmark module is of little use when the code you're fixing is
AK> so slow as to never finish.

no, it is useful when you think using $_ will save you massive amounts
of time when it doesn't. so that will lead you to better coding style
and to focus on your real slowdown. also profiling modules will help
(even in code that never finishes) to find out where the time is spent.

uri
 
S

sln

Hi,

Looking for suggestions on how to speed up the function below. It's
intended to "re-macroize" the output of make; in other words given a
sequence of command lines generated by make, and a set of make macros,
I need to substitute in the make variables such that "gcc -c -g -O2 -
Wall -DNDEBUG" might become (say) "$(CC) -c $(CFLAGS) $(DFLAGS)", as
much as possible as it was in the original Makefile. The %Variables
hash maps strings to macro names; thus with the above example we would
have

$Variables{'gcc'} = '$(CC)';
$Variables{'-g -O2 -Wall'} = '$(CFLAGS)';
$Variables{'-DNDEBUG'} = '$(DFLAGS)';

Anyway, the function below seems to work but scales very badly and
becomes unusable when enough variables are in use. Any ideas on how to
write it better?

sub varify {
my $word = shift;
$word =~ s%\$%\$\$%g;
for my $substr (keys %Variables) {
while ((my $start = index($word, $substr)) >= 0) {
substr($word, $start, length($substr)) = $Variables{$substr};
}
}
return $word;
}

Thanks,
AK

Just a comment that no where is it written that you can reconstruct
variables from the output of variable substitution.

$(a) = a
$(b) = -b$(a)

cc a.obj $(b) ab.obj ->
cc a.obj -ba ab.obj
----
cc a.obj -ba ab.obj ->
cc $(a).obj -b$(a) $(a)b.obj ->
cc $(a).obj $(b) $(a)b.obj != cc a.obj $(b) ab.obj

Even if you could be guaranteed distinction in the final
output, the order in which you do the reverse substitution
has to start from the first variable defined and progress
to the last defined, ie: FIFO.

This means you can't use a hash, which is random and can't
be fifo. Instead you have to store the pseudo variables and
thier data in an array:

@Variables = (
'$(a)' , 'a',
'$(b)' , '-b$(a)',
);

then read each pair as you progress down the list.

Then, to be complete, you have to repeat the substitution
reconstruction process as many times as the deepest nesting
of the $Variables. This could be acomplished by repeating
until there is no difference between the old and new strings.

But, if you can overcome these hurdles, you might get a
broken up static snapshot of makefile state, which can
dynamically generate multiple states.

In this case it would be:

$(a) = a
$(b) = -b$(a)
cc $(a).obj $(b) $(a)b.obj
(but this was designed to fail)

-sln
 
S

sln

Just a comment that no where is it written that you can reconstruct
variables from the output of variable substitution.

$(a) = a
$(b) = -b$(a)

When checking if the value is in the target,
there must be a check that the value itself
is not a variable name.

'this is the t$(a)rget '
^
Found value 'a' here.
Aviod substitution if its part of a variable name,
enclosed with $().

$target =~ s/ \$\(.*?\) \K | ($value) / defined $1 ? $varname : '' /xeg;

Probably in the simplese cases, this will never happen, but its possible.

-sln
 
S

sln

When checking if the value is in the target,
there must be a check that the value itself
is not a variable name.

'this is the t$(a)rget '
^
Found value 'a' here.
Aviod substitution if its part of a variable name,
enclosed with $().

$target =~ s/ \$\(.*?\) \K | ($value) / defined $1 ? $varname : '' /xeg;
This could be done using regexp, something like this (untested):

use strict;
use warnings;

my ($target,@macros);

##
$target = 'cc a.obj -ba ab.obj';
@macros = (
'$(a)' , 'a',
'$(b)' , '-b$(a)',
'$(c)' , '-c$(a)',
);
print "\n",$target,"\n";
for (my $ndx = 0; $ndx <= $#macros; $ndx+=2) {
print +($macros[$ndx] =~ /\$\((.*?)\)/), " = $macros[$ndx+1]\n";
}
print varify($target, \@macros),"\n";

##
$target = 'gcc -c -g -O2 -Wall -DNDEBUG';
@macros = (
'$(CC)' , 'gcc',
'$(CFLAGS)', '-g -O2 -Wall',
'$(DFLAGS)', '-DNDEBUG',
);
print "\n",$target,"\n";
for (my $ndx = 0; $ndx <= $#macros; $ndx+=2) {
print +($macros[$ndx] =~ /\$\((.*?)\)/), " = $macros[$ndx+1]\n";
}
print varify($target, \@macros),"\n";

##
sub varify
{
my ($matched, $newtarget, $macref) = (1, @_);
while ($matched) {
$matched = 0;
for my $ndx (0 .. $#{$macref}/2) {
$ndx *= 2;
my ($varname, $value) = @$macref[$ndx, $ndx+1];
$newtarget =~ s/ \$\(.*?\)\K | (\Q$value\E) /
if (defined $1) {
$matched = 1;
$varname
}
else {''}
/xeg;
}
}
return $newtarget;
}
__END__

-sln
 
A

Adam Kellas

Just a comment that no where is it written that you can reconstruct
variables from the output of  variable substitution.

Yes, this is analogous to decompilation; it's not possible to reliably
put it back the way it was, but it may be possible to generate
something which is both correct and fairly readable. There will always
be macroization opportunities missed and bad choices. For a simple
instance:

RM = rm
RM_FLAGS = -f
MV = mv
MV_FLAGS = -f

Given this, varify("rm -f foo") might turn into "$(RM) $(MV_FLAGS)
foo" and I don't think there's any way to prevent it without getting
into some serious AI stuff. But in this application that's ok as long
as the result works. The user can be expected to do a little manual
cleanup in an editor if need be.

Thanks for the sample code.

AK
 
P

Peter J. Holzer

OK, if you insist, we do have a disagreement. I will simply note that
the benchmark module is of little use when the code you're fixing is
so slow as to never finish.

If the code you're fixing is so slow as to never finish, blindly
replacing named variables with $_ is even less use. At best that brings
you a small constant speedup. 95% of infinity is still infinity.

If your code is so slow that you can't even measure it you first should
make sure that it isn't stuck in an infinite loop. Then prepare a test
case where it does finish in a practicable time (maybe run with a
reduced data set, or simply exit after a fixed number of iterations),
and look where it spends the time (Devel::NYTProf is a good profiler but
be warned that it can add a lot of overhead (I've had code which ran 5
to 10 times slower with profiling than without[1])). Then try to make
that code faster - and that will almost always mean a change to the
algorithm, not a micro-optimization.

hp

[1] All the profilers for perl I know hook into the debugger, so they
execute extra code for each line/block/sub that is executed. Using
a SIGPROF handler (like the traditional Unix profiler does) should
have much less overhead - does anyone know a perl profiler which
does this?
 
R

RedGrittyBrick

On Wed, 3 Mar 2010 06:29:38 -0800 (PST) in comp.lang.perl.misc, Adam


% is a poor choice because it is dense, busy, and wide. It puts more
black on the display, so that is about the same density of many other
characters. It has more squiggles than / making it indeed harder to
read. / is lighter that average, and narrow so it leaves minute white
space between it and adjacent characters, for better visual separation.

/ loses these advantages when adjacent characters resemble it, as in the
infamous leaning toothpicks syndrome s/\/\//\\\\/; etc. but there are
other solutions for that.

And, of course, % is contrary to convention.

perldoc perlop gives as an example $path =~ s|/usr/bin|/usr/local/bin|;
and Conway's recommendation of {} seems a useful one to follow.

I'm sure I've read a recommendation for "tall thin characters" as
delimiters but can't find the source now.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,982
Messages
2,570,186
Members
46,740
Latest member
JudsonFrie

Latest Threads

Top