Better benchmark code produces more satisfying benchmarks. Note the
addition of a method using "index", which is not too shabby.
I noticed that adding \Q to the patterns (so that I matched literal
strings rather than regexes) slowed them down, so I cached the compiled
patterns. They had been running about 4x as slow as cached tr///;
caching reduced the difference to less than a factor of 2.
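To make the caching idea concrete, here is a minimal sketch in isolation. It assumes a hash keyed by the literal string; the names %re_for and count_literal are made up for this example and do not appear in the script below.

```perl
use strict;
use warnings;

my %re_for;    # hypothetical cache: literal string => compiled pattern

sub count_literal {
    my ($needle, $haystack) = @_;
    # Compile once per distinct needle. \Q quotes regex metacharacters,
    # so the needle is matched as a plain string.
    my $re = $re_for{$needle} ||= qr/\Q$needle/;
    # Assigning the match list to () in scalar context yields its length.
    my $count = () = $haystack =~ /$re/g;
    return $count;
}

print count_literal('.',  'a.b.c'), "\n";            # the dot is literal here
print count_literal('ab', 'abccdeageabbcab'), "\n";
```

The second call with the same needle reuses the stored qr// object instead of quoting and compiling the pattern again.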
Results:
scc : uses s///g: 8.5 sec
mcc : uses m//g with scalar map to count results: 10.8 sec
mcc2: uses m//g with an explicit loop and counter: 11.9 sec
trcc: cached subs of eval tr///: 6.4 sec
tacc: same, but using arrays instead of hashes to cache: 6.0 sec
incc: uses index and explicit counter: 7.9 sec
Notes:
The spec is to count occurrences of single characters. Most of these
methods generalize:
    tacc will only do individual characters (its cache is indexed by ord)
    trcc will do character classes
    the others will do substrings (though the patterns could be
    rewritten to match classes instead)
Using index is surprisingly fast, and requires no caching.
scc returns an empty string instead of zero when nothing matches.
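A short sketch of what those notes mean in practice. The helper names count_class and count_substr are invented for the illustration, and the tr-based helper assumes its argument is trusted and contains no "/" or backslash, since it is spliced into a string eval.

```perl
use strict;
use warnings;

my $str = 'abccdeageabbcab';

# tr/// does not interpolate, so the class is spliced in via string eval.
# Assumption: $class is trusted and contains no '/' or backslash.
sub count_class {
    my ($class, $s) = @_;
    my $counter = eval "sub { \$_[0] =~ tr/$class// }" or die $@;
    return $counter->($s);
}

# index-based counting also works for substrings. Advancing by one
# position means overlapping occurrences are counted too.
sub count_substr {
    my ($needle, $s) = @_;
    my $count = 0;
    my $pos   = index($s, $needle);
    while ($pos >= 0) {
        ++$count;
        $pos = index($s, $needle, $pos + 1);
    }
    return $count;
}

print count_class('a-c', $str), "\n";    # every a, b, or c
print count_substr('ab', $str), "\n";    # a two-character substring

# s///g with no matches returns a false value that prints as nothing.
my $copy = $str;
my $n    = ($copy =~ s/f//g);
print "[$n]\n";
```

If a numeric zero is wanted from the s///g approach, adding 0 to the result is one way to get it.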
Code:
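#!perl
use strict;
use warnings;

my $str   = 'abccdeageabbcab';
my @chars = qw(a b c d e f g);
my $count;
my $sub   = \&mcc;    # method under test; change this to time another one

# The caches are shared by all methods, which is safe only because a
# single method runs per invocation of the script.
my %cache = ();
my @cache = ();

# Timing loop: 100,000 passes over every character.
for (1..100000) {
    $count = $sub->($_, $str) for (@chars);
}

# Report the counts once, as a sanity check.
for (@chars) {
    $count = $sub->($_, $str);
    print "There are $count ${_}s in $str\n";
}
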
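sub mcc {
    my ($c, $s) = @_;
    $cache{$c} ||= qr/\Q$c/;
    # map runs m//g in list context; in scalar context map returns
    # the number of matches.
    scalar map(/$cache{$c}/g, $s);
}

sub mcc2 {
    my ($c, $s) = @_;
    my $count = 0;
    $cache{$c} ||= qr/\Q$c/;
    # In scalar context m//g finds one match per iteration.
    ++$count while $s =~ /$cache{$c}/g;
    $count;
}
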
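sub trcc {
    my ($c, $s) = @_;
    # tr/// does not interpolate variables, so build the counting sub
    # with a string eval once per character and cache it.
    $cache{$c} ||= eval "sub { \$_[0] =~ tr/$c// }";
    $cache{$c}->($s);
}

sub tacc {
    my ($c, $s) = @_;
    # Same, but cached in an array indexed by character code.
    $cache[ord $c] ||= eval "sub { \$_[0] =~ tr/$c// }";
    $cache[ord $c]->($s);
}
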
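sub scc {
    my ($c, $s) = @_;
    $cache{$c} ||= qr/\Q$c/;
    # s///g returns the number of substitutions, or an empty string if
    # there were none. $s is a copy, so the caller's string is untouched.
    scalar $s =~ s/$cache{$c}//g;
}
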
sub incc {
    my ($c, $s) = @_;
    my $count = 0;
    my $pos = index($s, $c);
    while ($pos >= 0) {
        ++$count;
        # Resume the search just past the previous hit.
        $pos = index($s, $c, $pos + 1);
    }
    $count;
}
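
As posted, the script exercises one method per run, selected by $sub. For anyone who wants to compare methods side by side in a single run, here is a rough sketch using the core Benchmark module. It is not how the numbers above were produced, and the names by_tr, by_index and %tr_for are made up for the example. Each method gets its own cache here, so nothing is shared between them.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $str   = 'abccdeageabbcab';
my @chars = qw(a b c d e f g);

my %tr_for;    # hypothetical per-method cache of generated tr/// subs

sub by_tr {
    my ($c, $s) = @_;
    my $counter = $tr_for{$c} ||= eval "sub { \$_[0] =~ tr/$c// }";
    return $counter->($s);
}

sub by_index {
    my ($c, $s) = @_;
    my $count = 0;
    my $pos   = index($s, $c);
    while ($pos >= 0) {
        ++$count;
        $pos = index($s, $c, $pos + 1);
    }
    return $count;
}

# A negative count tells Benchmark to run each entry for at least
# that many CPU seconds, then print a comparison table.
cmpthese(-1, {
    tr    => sub { by_tr($_, $str)    for @chars },
    index => sub { by_index($_, $str) for @chars },
});
```

The table reports a rate for each entry and the percentage differences between them, which avoids editing and rerunning the script for every method.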