This is a question about efficiency. Say I have to check a string against
a series of strings: to check, that is, if the string has a match or not.
I have two ways of doing this.
Using regular expressions
$strs = 'str1|str2|str3";
if ($str =~ /strs/)
...
Or using an array
@strs = qw/str1,str2, str3/;
foreach (@strs) {
if ($_[0] =~ /$_/)
...
these are equivalent methods; but which one is faster?
SIMPLE case:
Linear search begin to end, no quantifiers or modifiers.
And assumes you are breaking out of the foreach loop when you find the first
occurance.
For-> $strs = 'str1|str2|str3";
Context switching - (Disadvantage) In essence, on each position in $str, the context is switched in the OR,
adding overhead. So if you had a 20 char $str, you would have 20 x 3 = 60 switches max.
Comparisons - (20 char $str)
str1|str2|str3, Start = 1-3
str1|str2|str3, Middle = 27-30
str1|str2|str3, End = 57-60
For-> looping $_[0] =~ /$_/;
Context switching - (Advantage) Done a max of 3 times, there is no OR.
Comparisons - (20 char $str)
str1, Start = 1 0 Neutral
str1, Middle = 10 -20 |
str1, End = 20 -40 Advantage
str2, Start = 20+1 = 21 +20 Disadvantage
str2, Middle = 20+10 = 30 0 Neutral
str2, End = 20+20 = 40 -20 Advantage
str3, Start = 40+1 = 41 +40 Disadvantage
str3, Middle = 40+10 = 50 +20 |
str3, End = 40+20 = 60 0 Neutral
----
Average: 0
For the looping comparisons, you can see that for a single match,
End's have an advantage
Start's have a disadvantage
For the OR conditional comparisons the reverse is true,
Start's have an advantage
End's have a a disadvantage
Thats why the conventional logic when using OR's in regex's suggests that
you put the item that will match the most in the Start 'or' position,
and the least in the End 'or' position.
If that is the case, the OR will be faster, but if random data exists in
a vacuum is to be tested, neither the OR nor the Loop will come out the winner.
However, there is the context switching overhead with OR's. This is mitigated
when complex expressions are used where to break out into simple expressions
involving tracking of operations between expressions would be prohibitively
time/memory consuming.
-sln