Jochen Lehmeier
Hello,
perl -V|head
Summary of my perl5 (revision 5 version 8 subversion 8) configuration:
  Platform:
    osname=linux, osvers=2.6.22-3-k7, archname=i486-linux-gnu-thread-multi
    uname='linux k 2.6.22-3-k7 #1 smp mon oct 22 22:51:54 utc 2007 i686 gnulinux '
cat test.pl
#!/usr/local/bin/perl

use strict;
use warnings;

my $a = "a".("x" x 1000);
my $b = "\x{1234}".("x" x 1000);

for (0..1000)
{
    $a =~ s/r/xxx/;
    $a =~ s/r/xxx/i;
    $b =~ s/r/xxx/;
    $b =~ s/r/xxx/i;
}
perl -d:SmallProf test.pl
  ================ SmallProf version 2.02 ================
  Profile of test.pl                               Page 94
  =================================================================
  count  wall tm  cpu time  line
      0  0.00000   0.00000     1:#!/usr/local/bin/perl
      0  0.00000   0.00000     2:
      0  0.00000   0.00000     3:use strict;
      0  0.00000   0.00000     4:use warnings;
      0  0.00000   0.00000     5:
      1  0.00005   0.00000     6:my $a = "a".("x" x 1000);
      1  0.00006   0.00000     7:my $b = "\x{1234}".("x" x 1000);
      0  0.00000   0.00000     8:
      1  0.00000   0.00000     9:for (0..1000)
      0  0.00000   0.00000    10:{
   1001  0.00596   0.07000    11:    $a =~ s/r/xxx/;
   1001  0.01276   0.03000    12:    $a =~ s/r/xxx/i;
   1001  0.04787   0.14000    13:    $b =~ s/r/xxx/;
   1004  2.05547   2.10000    14:    $b =~ s/r/xxx/i;
      0  0.00000   0.00000    15:}
I can live with line 13, but line 14 is not funny anymore: in wall time
it is 344 times slower than the plain latin1 regexp, and 161 times slower
than the case-insensitive latin1 one.
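For reference, this is how those factors fall out of the wall-time column of the profile. It is only a sketch of the arithmetic; the %wall hash and the slowdown() helper are names made up for this illustration, and the numbers are copied from the listing.

```perl
use strict;
use warnings;

# Wall times in seconds, copied from the SmallProf listing and keyed by
# the source line number of test.pl.
my %wall = (
    11 => 0.00596,    # latin1 string, case-sensitive
    12 => 0.01276,    # latin1 string, case-insensitive
    13 => 0.04787,    # utf8 string, case-sensitive
    14 => 2.05547,    # utf8 string, case-insensitive
);

# Hypothetical helper: how many times slower is line $slow than line $fast?
sub slowdown {
    my ($slow, $fast) = @_;
    return $wall{$slow} / $wall{$fast};
}

printf "line 14 vs line 11: %.1fx\n", slowdown(14, 11);
printf "line 14 vs line 12: %.1fx\n", slowdown(14, 12);
printf "line 14 vs line 13: %.1fx\n", slowdown(14, 13);
```

Truncating the first two ratios to whole numbers gives the 344 and 161 quoted here. The cpu time column is too coarse (0.01 s steps) to be useful for the fast lines, which is why the wall times are used.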
I understand that case calculations are much more complex in utf8 than
latin1. Is there anything that can be done, anyway?
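In case somebody wants to cross-check the numbers without Devel::SmallProf, here is a minimal timing sketch using only Time::HiRes. The time_subst() helper and the labels are made up for this example, and I make no claim about what it prints on other perl versions or platforms.

```perl
use strict;
use warnings;
use Time::HiRes ();

# Hypothetical helper: run the non-matching substitution $count times on
# a private copy of $str and return the elapsed wall time in seconds.
sub time_subst {
    my ($str, $ignore_case, $count) = @_;
    my $start = Time::HiRes::time();
    for (1 .. $count) {
        if ($ignore_case) {
            $str =~ s/r/xxx/i;
        }
        else {
            $str =~ s/r/xxx/;
        }
    }
    return Time::HiRes::time() - $start;
}

# Same two strings as in test.pl: one that stays in the native 8-bit
# encoding, and one that is forced to utf8 by the \x{1234} character.
my $latin1 = "a" . ("x" x 1000);
my $utf8   = "\x{1234}" . ("x" x 1000);

my @cases = (
    [ 'latin1',    $latin1, 0 ],
    [ 'latin1 /i', $latin1, 1 ],
    [ 'utf8',      $utf8,   0 ],
    [ 'utf8 /i',   $utf8,   1 ],
);

for my $case (@cases) {
    my ($label, $str, $ignore_case) = @$case;
    printf "%-10s %8.5f s\n", $label, time_subst($str, $ignore_case, 1001);
}
```

The loop count of 1001 matches the for (0..1000) loop in test.pl, so the printed times should be directly comparable to the wall tm column, minus the profiler overhead.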