regexp or array?

E

End of Road

This is a question about efficiency. Say I have to check a string against
a series of strings: to check, that is, if the string has a match or not.
I have two ways of doing this.
Using regular expressions

$strs = 'str1|str2|str3";
if ($str =~ /strs/)
...

Or using an array

@strs = qw/str1,str2, str3/;
foreach (@strs) {
if ($_[0] =~ /$_/)
...

these are equivalent methods; but which one is faster?
 
T

Tad J McClellan

End of Road said:
$strs = 'str1|str2|str3";
if ($str =~ /strs/)
...

Or using an array

@strs = qw/str1,str2, str3/;
foreach (@strs) {
if ($_[0] =~ /$_/)
...

these are equivalent methods


No they're not.
 
P

Peter Makholm

End of Road said:
Using regular expressions

$strs = 'str1|str2|str3";
if ($str =~ /strs/)
...

Or using an array

@strs = qw/str1,str2, str3/;
foreach (@strs) {
if ($_[0] =~ /$_/)
...

Better use an array of precompiled regexpes:

my @regexp = map { qr/$_/ } qw(foo bar baz);

if( grep { $_[0] =~ $_ } @regexp ) {

...

}

or even better

use List::MoreUtils qw(any);

my @regexp = map { qr/$_/ } qw(foo bar baz);
if ( any { $_[0] =~ $_ } @regexp ) {
...
}

these are equivalent methods; but which one is faster?

The Benchmark-module can tell you that. Remember to use a realistic
input.

//Makholm
 
M

Marc Lucksch

End said:
This is a question about efficiency. Say I have to check a string against
a series of strings: to check, that is, if the string has a match or not.
I have two ways of doing this.
Using regular expressions

$strs = 'str1|str2|str3";
if ($str =~ /strs/)
...

Or using an array

@strs = qw/str1,str2, str3/;
foreach (@strs) {
if ($_[0] =~ /$_/)
...
More like
@strs = qw/str1 str2 str3/;
foreach (@strs) {
if ($str eq /$_/)
...

The best would be in my opinion:

my %lookup=(
str1=>1,
str2=>1,
str3=1,
#.....
);

if ($lookup{$str}) {
....
}

Marc "Maluku" Lucksch
 
E

End of Road

End of Road said:
$strs = 'str1|str2|str3";
if ($str =~ /strs/)
...

Or using an array

@strs = qw/str1,str2, str3/;
foreach (@strs) {
if ($_[0] =~ /$_/)
...

these are equivalent methods


No they're not.

what I meant by equivalent is that both methods are different ways of
achieving the same thing, am I wrong?
 
J

J. Gleixner

Marc said:
End said:
This is a question about efficiency. Say I have to check a string
against a series of strings: to check, that is, if the string has a
match or not.
I have two ways of doing this.
Using regular expressions

$strs = 'str1|str2|str3";
if ($str =~ /strs/)
...

Or using an array

@strs = qw/str1,str2, str3/;
foreach (@strs) {
if ($_[0] =~ /$_/)
...
More like
@strs = qw/str1 str2 str3/;
foreach (@strs) {
if ($str eq /$_/)
^^^^^^^
Hu?
 
S

sln

This is a question about efficiency. Say I have to check a string against
a series of strings: to check, that is, if the string has a match or not.
I have two ways of doing this.
Using regular expressions

$strs = 'str1|str2|str3";
if ($str =~ /strs/)
...

Or using an array

@strs = qw/str1,str2, str3/;
foreach (@strs) {
if ($_[0] =~ /$_/)
...

these are equivalent methods; but which one is faster?

SIMPLE case:
Linear search begin to end, no quantifiers or modifiers.
And assumes you are breaking out of the foreach loop when you find the first
occurance.

For-> $strs = 'str1|str2|str3";
Context switching - (Disadvantage) In essence, on each position in $str, the context is switched in the OR,
adding overhead. So if you had a 20 char $str, you would have 20 x 3 = 60 switches max.
Comparisons - (20 char $str)
str1|str2|str3, Start = 1-3
str1|str2|str3, Middle = 27-30
str1|str2|str3, End = 57-60

For-> looping $_[0] =~ /$_/;
Context switching - (Advantage) Done a max of 3 times, there is no OR.
Comparisons - (20 char $str)
str1, Start = 1 0 Neutral
str1, Middle = 10 -20 |
str1, End = 20 -40 Advantage

str2, Start = 20+1 = 21 +20 Disadvantage
str2, Middle = 20+10 = 30 0 Neutral
str2, End = 20+20 = 40 -20 Advantage

str3, Start = 40+1 = 41 +40 Disadvantage
str3, Middle = 40+10 = 50 +20 |
str3, End = 40+20 = 60 0 Neutral
----
Average: 0

For the looping comparisons, you can see that for a single match,
End's have an advantage
Start's have a disadvantage

For the OR conditional comparisons the reverse is true,
Start's have an advantage
End's have a a disadvantage

Thats why the conventional logic when using OR's in regex's suggests that
you put the item that will match the most in the Start 'or' position,
and the least in the End 'or' position.

If that is the case, the OR will be faster, but if random data exists in
a vacuum is to be tested, neither the OR nor the Loop will come out the winner.

However, there is the context switching overhead with OR's. This is mitigated
when complex expressions are used where to break out into simple expressions
involving tracking of operations between expressions would be prohibitively
time/memory consuming.

-sln
 
E

Eric Pozharski

End of Road said:
Using regular expressions

$strs = 'str1|str2|str3";
if ($str =~ /strs/)
...

Or using an array

@strs = qw/str1,str2, str3/;
foreach (@strs) {
if ($_[0] =~ /$_/)
...

Better use an array of precompiled regexpes:

my @regexp = map { qr/$_/ } qw(foo bar baz);

if( grep { $_[0] =~ $_ } @regexp ) {

...

}

perl -wle '
use Benchmark qw|cmpthese timethese|;
my @arg = qw{ aa cc ee ff };
my $x = qr{a|b|c};
my $y = q{a|b|c};
cmpthese timethese -10, {
pure => sub { m{a|b|c} foreach @arg },
prec => sub { m{$x} foreach @arg },
expl => sub { $_ =~ $x foreach @arg },
alt => sub { $x foreach @arg },
eval => sub { m{$y} foreach @arg },
};
'
Useless use of private variable in void context at -e line 10.
Benchmark:
running
alt, eval, expl, prec, pure
for at least 10 CPU seconds
....

alt: 12 wallclock secs (10.38 usr + 0.02 sys = 10.40 CPU) @ 267620.77/s (n=2783256)

eval: 11 wallclock secs (10.10 usr + 0.04 sys = 10.14 CPU) @ 57220.81/s (n=580219)

expl: 11 wallclock secs (10.52 usr + 0.04 sys = 10.56 CPU) @ 37743.47/s (n=398571)

prec: 11 wallclock secs (10.56 usr + 0.02 sys = 10.58 CPU) @ 38041.49/s (n=402479)

pure: 11 wallclock secs (10.36 usr + 0.02 sys = 10.38 CPU) @ 61972.35/s (n=643273)

Rate expl prec eval pure alt
expl 37743/s -- -1% -34% -39% -86%
prec 38041/s 1% -- -34% -39% -86%
eval 57221/s 52% 50% -- -8% -79%
pure 61972/s 64% 63% 8% -- -77%
alt 267621/s 609% 603% 368% 332% --

I've checked 4 times, C<expl> and C<prec> flip-flops, while
C<eval> and C<pure> are stable. What I've missed this time?

C<alt> seems to be a *way* fast, right? Left as an excersise for curious
readers. (spoiler, since rot13 is letter only, 100 empty lines below)





































































































perl -wle '
$x = qr{(?{ print 1 })};
@y = qr{ a b c };
$x foreach (@y)'
Useless use of a variable in void context at -e line 4.

*CUT*
 
S

sln

End of Road said:
Using regular expressions

$strs = 'str1|str2|str3";
if ($str =~ /strs/)
...

Or using an array

@strs = qw/str1,str2, str3/;
foreach (@strs) {
if ($_[0] =~ /$_/)
...
[snip]

perl -wle '
use Benchmark qw|cmpthese timethese|;
my @arg = qw{ aa cc ee ff };
my $x = qr{a|b|c};
my $y = q{a|b|c};
cmpthese timethese -10, {
pure => sub { m{a|b|c} foreach @arg },
prec => sub { m{$x} foreach @arg },
expl => sub { $_ =~ $x foreach @arg },
alt => sub { $x foreach @arg },
^^
Doesen't do anything, there is no operator to tell the compiler
what to do with it. Its exactly the same as foreach (@arg) {};
eval => sub { m{$y} foreach @arg },
};
'
Useless use of private variable in void context at -e line 10.
Benchmark:
running
alt, eval, expl, prec, pure
for at least 10 CPU seconds [snip]
pure 61972/s 64% 63% 8% -- -77%
alt 267621/s 609% 603% 368% 332% --

I've checked 4 times, C<expl> and C<prec> flip-flops, while
C<eval> and C<pure> are stable. What I've missed this time?

What does this benchmark? Its not a contrast between a single regex
vs an array of regex. The fact is that qr// is just a suggestion to the
compiler. (?-xism:a|b|c) is just a longer string to be interpred every time
the sub gets called.
C<alt> seems to be a *way* fast, right? Left as an excersise for curious
readers. (spoiler, since rot13 is letter only, 100 empty lines below)
[snip]

perl -wle '
$x = qr{(?{ print 1 })};
@y = qr{ a b c };
$x foreach (@y)'
Useless use of a variable in void context at -e line 4.
Still doesen't do anything, there is no operator for $x.

alt3 => sub { $x foreach (@arg); },
alt4 => sub { foreach (@arg){}; },
alt5 => sub { $_ =~ $x foreach (@arg); },
alt6 => sub { /$x/ foreach (@arg); },

Rate alt5 alt6 alt3 alt4
alt5 272315/s -- -2% -64% -65%
alt6 278751/s 2% -- -63% -64%
alt3 760285/s 179% 173% -- -2%
alt4 779012/s 186% 179% 2% --

-sln
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,983
Messages
2,570,187
Members
46,747
Latest member
jojoBizaroo

Latest Threads

Top