fastest count of instances in string?

Roy Johnson · Oct 5, 2003

John W. Krahn said:
If the value in $stupid is already a string why would you need to quote
the variable?

Exactly. And if tr/stupid// doesn't change any values, why would you
need to specify the values not to change?

Roy Johnson · Oct 5, 2003

Uri Guttman said:
then there is another
bug. if $tr_chars has more than 1 char then s/// will not delete the
individual chars. he needs a char class for that.

That's not a bug, it's a spec. The question was about counting how
many times an individual character appeared in a string.

I think there was a goof in the mtr sub, though: I left "a" instead of
"$tr_chars".
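
For anyone following along, here is a minimal sketch of the distinction Uri raises, assuming a value that holds more than one character. The helper names count_substring and count_any_of are made up for this illustration and are not from the benchmark code.

```perl
use strict;
use warnings;

# Hypothetical helper: treats $chars as one literal substring.
sub count_substring {
    my ($chars, $s) = @_;    # $s is a copy, so the caller's string is untouched
    return ($s =~ s/\Q$chars\E//g) || 0;
}

# Hypothetical helper: treats $chars as a set of individual characters
# by wrapping it in a character class. Assumes $chars is not empty.
sub count_any_of {
    my ($chars, $s) = @_;
    return ($s =~ s/[\Q$chars\E]//g) || 0;
}

print count_substring('ab', 'abccdeageabbcab'), "\n";   # occurrences of "ab"
print count_any_of('ab', 'abccdeageabbcab'), "\n";      # occurrences of "a" or "b"
```

With a single character the two forms count the same thing, which is why the distinction does not matter for the original question.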

Iain Truskett · Oct 5, 2003

Roy Johnson said:
That's not a bug, it's a spec. The question was about counting how
many times an individual character appeared in a string.

So why is it called $tr_chars instead of the singular $tr_char? Or even
just $char?

cheers,

John W. Krahn · Oct 5, 2003

Roy said:
Exactly. And if tr/stupid// doesn't change any values, why would you
need to specify the values not to change?

Ahh, but it does change values. It just changes them to the same thing
that they were before.

John

Roy Johnson · Oct 6, 2003

John W. Krahn said:
Ahh, but it does change values. It just changes them to the same thing
that they were before.

You have an unusual notion of the definition of "change".
Nevertheless, the behavior is exactly the same, regardless of whether
you specify the replacement list, so why specify it?
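
A small sketch of the point, assuming a fixed character 'a'; the helper names are hypothetical. Both forms return the same count and leave the string as it was.

```perl
use strict;
use warnings;

# Hypothetical helper: counts 'a' with an empty replacement list.
sub count_a_empty {
    my ($s) = @_;
    my $n = ($s =~ tr/a//);
    return ($n, $s);          # count, plus the string after tr///
}

# Hypothetical helper: same count, with the replacement list spelled out.
sub count_a_explicit {
    my ($s) = @_;
    my $n = ($s =~ tr/a/a/);
    return ($n, $s);
}

my ($n1, $after1) = count_a_empty('abccdeageabbcab');
my ($n2, $after2) = count_a_explicit('abccdeageabbcab');
print "empty list:    $n1, string is now $after1\n";
print "explicit list: $n2, string is now $after2\n";
```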

Anno Siegel · Oct 7, 2003

Eric J. Roode said:
Probably the fastest way would be to write a C function to do the counting,
then link to it via XS (or Inline::C).

I don't think a C function can be much faster than a pre-compiled tr///.
In fact, I was curious enough to do a quick benchmark. (Except they're
never really quick.) The Inline code *was* faster -- by 4 %.

Anno

Roy Johnson · Oct 7, 2003

Better benchmark code produces more satisfying benchmarks. Note the
addition of a method using "index", which is not too shabby.

I noticed that adding \Q to the patterns (so that I matched literal
strings rather than regexes) slowed them down, so I cached the compiled
patterns. The regex methods had been running about 4x as slow as cached
tr///; caching reduced the difference to less than a factor of 2.

Results:
scc : uses s///g: 8.5 sec
mcc : uses m//g with scalar map to count results: 10.8 sec
mcc2: uses m//g with an explicit loop and counter: 11.9 sec
trcc: cached subs of eval tr///: 6.4 sec
tacc: same, but using arrays instead of hashes to cache: 6.0 sec
incc: uses index and explicit counter: 7.9 sec

Notes:
The spec is to count occurrences of single characters. Most of these
methods generalize:
  tacc will only do individual characters
  trcc will do character classes
  the others will do substrings (though the patterns could be
  rewritten to match classes instead)
Using index is surprisingly fast, and requires no caching.
scc returns an empty string instead of zero when nothing matches.
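
A quick sketch of that last note, with hypothetical helper names: s///g returns a false value (an empty string) when it makes no substitutions, so a caller who wants a printable 0 has to normalize it.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring scc: returns whatever s///g returns.
sub count_raw {
    my ($c, $s) = @_;
    return scalar($s =~ s/\Q$c\E//g);
}

# Hypothetical helper: same, but a failed match is normalized to 0.
sub count_normalized {
    my ($c, $s) = @_;
    return ($s =~ s/\Q$c\E//g) || 0;
}

printf "raw:        '%s'\n", count_raw('f', 'abccdeageabbcab');
printf "normalized: '%s'\n", count_normalized('f', 'abccdeageabbcab');
```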

Code:
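#!perl
use strict;
use warnings;

my $str   = 'abccdeageabbcab';
my @chars = qw(a b c d e f g);
my $count;
my $sub   = \&mcc;   # the counting sub to time; swap in any sub below
my %cache = ();      # shared by the hash-caching subs (one sub per run)
my @cache = ();      # used by tacc only

# Timing loop: 100,000 passes over every character.
for (1..100000) {
    $count = $sub->($_, $str) for (@chars);
}
# Report the counts once.
for (@chars) {
    $count = $sub->($_, $str);
    print "There are $count ${_}s in $str\n";
}

# m//g in list context; map in scalar context yields the match count.
sub mcc {
    my ($c, $s) = @_;
    $cache{$c} ||= qr/\Q$c/;
    scalar map(/$cache{$c}/g, $s);
}

# m//g in scalar context with an explicit loop and counter.
sub mcc2 {
    my ($c, $s) = @_;
    my $count = 0;
    $cache{$c} ||= qr/\Q$c/;
    ++$count while $s =~ /$cache{$c}/g;
    $count;
}

# tr/// compiled once per character via eval, cached in a hash.
sub trcc {
    my ($c, $s) = @_;
    $cache{$c} ||= eval "sub { \$_[0] =~ tr/$c// }";
    $cache{$c}->($s);
}

# Same as trcc, but cached in an array indexed by character code.
sub tacc {
    my ($c, $s) = @_;
    $cache[ord $c] ||= eval "sub { \$_[0] =~ tr/$c// }";
    $cache[ord $c]->($s);
}

# s///g on a copy of the string; returns '' rather than 0 on no match.
sub scc {
    my ($c, $s) = @_;
    $cache{$c} ||= qr/\Q$c/;
    scalar $s =~ s/$cache{$c}//g;
}

# Repeated index() calls with an explicit counter; no caching needed.
sub incc {
    my ($c, $s) = @_;
    my $count = 0;
    my $pos = index($s, $c);
    while ($pos >= 0) {
        ++$count;
        $pos = index($s, $c, $pos + 1);
    }
    $count;
}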
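
For comparison, the same kind of timing could be run with the core Benchmark module rather than a hand-rolled loop. This is only a sketch under assumptions: the character is fixed at 'a', the test string is lengthened, and the sub names count_tr, count_index and count_match are invented here. It is not the code that produced the results above.

```perl
#!perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Assumed test data: the thread's string, repeated to give each call more work.
my $str = 'abccdeageabbcab' x 10;

# tr/// with the character fixed at compile time, so no eval is needed.
sub count_tr {
    my ($s) = @_;
    return ($s =~ tr/a//);
}

# index() with an explicit counter.
sub count_index {
    my ($s) = @_;
    my $count = 0;
    my $pos = index($s, 'a');
    while ($pos >= 0) {
        ++$count;
        $pos = index($s, 'a', $pos + 1);
    }
    return $count;
}

# m//g in list context, counted through an empty-list assignment.
sub count_match {
    my ($s) = @_;
    my $count = () = $s =~ /a/g;
    return $count;
}

# Run each candidate for at least one CPU second and print a comparison table.
cmpthese(-1, {
    'tr'    => sub { count_tr($str) },
    'index' => sub { count_index($str) },
    'match' => sub { count_match($str) },
});
```

The table lists iterations per second and relative percentages; the numbers will differ by machine and perl version, so none are quoted here.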
