how to speed up a string-substitution loop?

Uri Guttman · Mar 3, 2010

AK> OK, if you insist, we do have a disagreement. I will simply note that
AK> the benchmark module is of little use when the code you're fixing is
AK> so slow as to never finish.

no, it is useful when you think using $_ will save you massive amounts
of time when it doesn't. so that will lead you to better coding style
and to focus on your real slowdown. also profiling modules will help
(even in code that never finishes) to find out where the time is spent.

uri

sln · Mar 3, 2010

Hi,

Looking for suggestions on how to speed up the function below. It's
intended to "re-macroize" the output of make; in other words given a
sequence of command lines generated by make, and a set of make macros,
I need to substitute in the make variables such that "gcc -c -g -O2 -
Wall -DNDEBUG" might become (say) "$(CC) -c $(CFLAGS) $(DFLAGS)", as
much as possible as it was in the original Makefile. The %Variables
hash maps strings to macro names; thus with the above example we would
have

$Variables{'gcc'} = '$(CC)';
$Variables{'-g -O2 -Wall'} = '$(CFLAGS)';
$Variables{'-DNDEBUG'} = '$(DFLAGS)';

Anyway, the function below seems to work but scales very badly and
becomes unusable when enough variables are in use. Any ideas on how to
write it better?

sub varify {
my $word = shift;
$word =~ s%\$%\$\$%g;
for my $substr (keys %Variables) {
while ((my $start = index($word, $substr)) >= 0) {
substr($word, $start, length($substr)) = $Variables{$substr};
}
}
return $word;
}

Thanks,
AK

Just a comment that no where is it written that you can reconstruct
variables from the output of variable substitution.

$(a) = a
$(b) = -b$(a)

cc a.obj $(b) ab.obj ->
cc a.obj -ba ab.obj
----
cc a.obj -ba ab.obj ->
cc $(a).obj -b$(a) $(a)b.obj ->
cc $(a).obj $(b) $(a)b.obj != cc a.obj $(b) ab.obj

Even if you could be guaranteed distinction in the final
output, the order in which you do the reverse substitution
has to start from the first variable defined and progress
to the last defined, ie: FIFO.

This means you can't use a hash, which is random and can't
be fifo. Instead you have to store the pseudo variables and
thier data in an array:

@Variables = (
'$(a)' , 'a',
'$(b)' , '-b$(a)',
);

then read each pair as you progress down the list.

Then, to be complete, you have to repeat the substitution
reconstruction process as many times as the deepest nesting
of the $Variables. This could be acomplished by repeating
until there is no difference between the old and new strings.

But, if you can overcome these hurdles, you might get a
broken up static snapshot of makefile state, which can
dynamically generate multiple states.

In this case it would be:

$(a) = a
$(b) = -b$(a)
cc $(a).obj $(b) $(a)b.obj
(but this was designed to fail)

-sln

sln · Mar 3, 2010

Just a comment that no where is it written that you can reconstruct
variables from the output of variable substitution.

$(a) = a
$(b) = -b$(a)

When checking if the value is in the target,
there must be a check that the value itself
is not a variable name.

'this is the t$(a)rget '
^
Found value 'a' here.
Aviod substitution if its part of a variable name,
enclosed with $().

$target =~ s/ \$$.*?$ \K | ($value) / defined $1 ? $varname : '' /xeg;

Probably in the simplese cases, this will never happen, but its possible.

-sln

sln · Mar 4, 2010

When checking if the value is in the target,
there must be a check that the value itself
is not a variable name.

'this is the t$(a)rget '
^
Found value 'a' here.
Aviod substitution if its part of a variable name,
enclosed with $().

$target =~ s/ \$$.*?$ \K | ($value) / defined $1 ? $varname : '' /xeg;

This could be done using regexp, something like this (untested):

use strict;
use warnings;

my ($target,@macros);

##
$target = 'cc a.obj -ba ab.obj';
@macros = (
'$(a)' , 'a',
'$(b)' , '-b$(a)',
'$(c)' , '-c$(a)',
);
print "\n",$target,"\n";
for (my $ndx = 0; $ndx <= $#macros; $ndx+=2) {
print +($macros[$ndx] =~ /\$$(.*?)$/), " = $macros[$ndx+1]\n";
}
print varify($target, \@macros),"\n";

##
$target = 'gcc -c -g -O2 -Wall -DNDEBUG';
@macros = (
'$(CC)' , 'gcc',
'$(CFLAGS)', '-g -O2 -Wall',
'$(DFLAGS)', '-DNDEBUG',
);
print "\n",$target,"\n";
for (my $ndx = 0; $ndx <= $#macros; $ndx+=2) {
print +($macros[$ndx] =~ /\$$(.*?)$/), " = $macros[$ndx+1]\n";
}
print varify($target, \@macros),"\n";

##
sub varify
{
my ($matched, $newtarget, $macref) = (1, @_);
while ($matched) {
$matched = 0;
for my $ndx (0 .. $#{$macref}/2) {
$ndx *= 2;
my ($varname, $value) = @$macref[$ndx, $ndx+1];
$newtarget =~ s/ \$$.*?$\K | (\Q$value\E) /
if (defined $1) {
$matched = 1;
$varname
}
else {''}
/xeg;
}
}
return $newtarget;
}
__END__

-sln

Adam Kellas · Mar 4, 2010

Just a comment that no where is it written that you can reconstruct
variables from the output of variable substitution.

Yes, this is analogous to decompilation; it's not possible to reliably
put it back the way it was, but it may be possible to generate
something which is both correct and fairly readable. There will always
be macroization opportunities missed and bad choices. For a simple
instance:

RM = rm
RM_FLAGS = -f
MV = mv
MV_FLAGS = -f

Given this, varify("rm -f foo") might turn into "$(RM) $(MV_FLAGS)
foo" and I don't think there's any way to prevent it without getting
into some serious AI stuff. But in this application that's ok as long
as the result works. The user can be expected to do a little manual
cleanup in an editor if need be.

Thanks for the sample code.

AK

Peter J. Holzer · Mar 4, 2010

OK, if you insist, we do have a disagreement. I will simply note that
the benchmark module is of little use when the code you're fixing is
so slow as to never finish.

If the code you're fixing is so slow as to never finish, blindly
replacing named variables with $_ is even less use. At best that brings
you a small constant speedup. 95% of infinity is still infinity.

If your code is so slow that you can't even measure it you first should
make sure that it isn't stuck in an infinite loop. Then prepare a test
case where it does finish in a practicable time (maybe run with a
reduced data set, or simply exit after a fixed number of iterations),
and look where it spends the time (Devel::NYTProf is a good profiler but
be warned that it can add a lot of overhead (I've had code which ran 5
to 10 times slower with profiling than without[1])). Then try to make
that code faster - and that will almost always mean a change to the
algorithm, not a micro-optimization.

hp

[1] All the profilers for perl I know hook into the debugger, so they
execute extra code for each line/block/sub that is executed. Using
a SIGPROF handler (like the traditional Unix profiler does) should
have much less overhead - does anyone know a perl profiler which
does this?

RedGrittyBrick · Mar 5, 2010

On Wed, 3 Mar 2010 06:29:38 -0800 (PST) in comp.lang.perl.misc, Adam

% is a poor choice because it is dense, busy, and wide. It puts more
black on the display, so that is about the same density of many other
characters. It has more squiggles than / making it indeed harder to
read. / is lighter that average, and narrow so it leaves minute white
space between it and adjacent characters, for better visual separation.

/ loses these advantages when adjacent characters resemble it, as in the
infamous leaning toothpicks syndrome s/\/\//\\\\/; etc. but there are
other solutions for that.

And, of course, % is contrary to convention.

perldoc perlop gives as an example $path =~ s|/usr/bin|/usr/local/bin|;
and Conway's recommendation of {} seems a useful one to follow.

I'm sure I've read a recommendation for "tall thin characters" as
delimiters but can't find the source now.

Speed up Javascript Loop ??	9	Aug 14, 2010
How to speed up XML reading	11	Sep 11, 2012
Performance of list.index - how to speed up a silly algorithm?	1	Apr 29, 2010
No rule to make target `Parser/printgrammar.o'	0	Nov 24, 2010
10 Easy Steps to Speed Up Your Computer - Without Upgrading	0	Jun 8, 2009
(silly?) speed comparisons	1	Jul 8, 2008
porting Ruby to an embedded target	4	May 27, 2011
How can I speed this function up?	15	Nov 18, 2006

how to speed up a string-substitution loop?

Uri Guttman

sln

sln

sln

Adam Kellas

Peter J. Holzer

RedGrittyBrick

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads