fastest count of instances in string?

R

Roy Johnson

John W. Krahn said:
If the value in $stupid is already a string why would you need to quote
the variable?

Exactly. And if tr/stupid// doesn't change any values, why would you
need to specify the values not to change?
 
R

Roy Johnson

Uri Guttman said:
then there is another
bug. if $tr_chars has more than 1 char then s/// not delete the
individual chars. he needs a char class for that.

That's not a bug, it's a spec. The question was about counting how
many times an individual character appeared in a string.

I think there was a goof in the mtr sub, though: I left "a" instead of
"$tr_chars".
 
I

Iain Truskett

That's not a bug, it's a spec. The question was about counting how
many times an individual character appeared in a string.

So why is it called $tr_chars instead of the singular $tr_char? Or even
just $char?


cheers,
 
J

John W. Krahn

Roy said:
Exactly. And if tr/stupid// doesn't change any values, why would you
need to specify the values not to change?

Ahh, but it does change values. It just changes them to the same thing
that they were before.


John
 
R

Roy Johnson

John W. Krahn said:
Ahh, but it does change values. It just changes them to the same thing
that they were before.

You have an unusual notion of the definition of "change".
Nevertheless, the behavior is exactly the same, regardless of whether
you specify the replacement list, so why specify it?
 
A

Anno Siegel

Eric J. Roode said:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1



Probably the fastest way would be to write a C function to do the counting,
then link to it via XS (or Inline::C).

I don't think a C function can be much faster than a pre-compiled tr///.
In fact, I was courious enough to do a quick benchmark. (Except they're
never really quick.) The Inline code *was* faster -- by 4 %.

Anno
 
R

Roy Johnson

Better benchmark code produces more satisfying benchmarks. Note the
addition of a method using "index", which is not too shabby.

I noticed that adding \Q to the patterns (so that I matched strings
rather than re's) slowed them down, so I cached the pattern matches.
They had been running about 4x as slow as cached tr///, but reduced
the difference to less than a factor of 2.

Results:
scc : uses s///g: 8.5 sec
mcc : uses m//g with scalar map to count results: 10.8 sec
mcc2: uses m//g with an explicit loop and counter: 11.9 sec
trcc: cached subs of eval tr///: 6.4 sec
tacc: same, but using arrays instead of hashes to cache: 6.0 sec
incc: uses index and explicit counter: 7.9 sec

Notes:
The spec is to count occurrences of single characters. Most of these
methods generalize:
tacc will only do individual characters
trcc will do character classes
the others will do substrings (though the patterns could be
rewritten to match classes instead)
Using index is surprisingly fast, and requires no caching
scc returns empty string instead of zero

Code:
#!perl
use strict;
use warnings;

my $str = 'abccdeageabbcab';
my @chars = qw(a b c d e f g);
my $count;
my $sub = \&mcc;
my %cache = ();
my @cache = ();
for (1..100000) {
$count = &$sub($_, $str) for (@chars);
}
for (@chars) {
$count = &$sub($_, $str);
print "There are $count ${_}s in $str\n";
}

sub mcc {
my ($c, $s) = @_;
$cache{$c} ||= qr/\Q$c/;
scalar map(/$cache{$c}/g, $s);
}

sub mcc2 {
my ($c, $s) = @_;
my $count = 0;
$cache{$c} ||= qr/\Q$c/;
++$count while $s=~/$cache{$c}/g;
$count;
}

sub trcc {
my ($c, $s) = @_;
$cache{$c} ||= eval "sub { \$_[0] =~ tr/$c// }";
$cache{$c}->($s);
}

sub tacc {
my ($c, $s) = @_;
$cache[ord $c] ||= eval "sub { \$_[0] =~ tr/$c// }";
$cache[ord $c]->($s);
}

sub scc {
my ($c, $s) = @_;
$cache{$c} ||= qr/\Q$c/;
scalar $s=~s/$cache{$c}//g;
}

sub incc {
my ($c, $s) = @_;
my $count = 0;
my $pos = index($s, $c);
while ($pos >= 0) {
++$count;
$pos = index($s, $c, $pos+1);
}
$count;
}
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
474,141
Messages
2,570,814
Members
47,360
Latest member
kathdev

Latest Threads

Top