Ben said:
Les Peters said:
I am trying to properly order a series of regexen so that a general pattern
will not match all the lines of a more specific pattern, as part of a log
monitoring script.
This problem is not well-defined: which is the more specific of these?
/[abc]/ and /[cd]/
Those two patterns would have equal specificity.
The problem lies with patterns like:
/abc/ and /abcde/
If /abc/ is reached first, it will match the lines that /abcde/ would match;
therefore, /abcde/ should be tried first, then /abc/.
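To make the ordering issue concrete, here is a minimal sketch of first-match dispatch. The helper name first_match and the sample log line are made up for illustration; they are not part of the monitoring script.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first pattern in @patterns that
# matches $line, or nothing if none of them match.
sub first_match {
    my ($line, @patterns) = @_;
    for my $pattern (@patterns) {
        return $pattern if $line =~ /$pattern/;
    }
    return;
}

my $line = 'xx abcde yy';    # made-up log line

# General pattern listed first: it claims the line before /abcde/ is tried.
print first_match($line, 'abc', 'abcde'), "\n";    # prints "abc"

# Specific pattern listed first: the line is classified as intended.
print first_match($line, 'abcde', 'abc'), "\n";    # prints "abcde"
```

With the general pattern first, /abcde/ is never reached for any line it could match, which is the situation the ordering is meant to prevent.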
Here is an update to the routine (I will collapse some of this once it
works for a sufficiently complex problem set):
sub pattern_check {
    #my ($p1, $p2) = @_;
    my ($p1) = @_;

    # transform p1 into s1, a sample string that p1 should match
    my $s1 = $p1;
    $s1 =~ s/^\^//;                          # strip leading ^
    $s1 =~ s/\\d\{(\d+)\}/"0" x $1/ge;       # replace \d{n} with n zeros
    $s1 =~ s/\\d[\+\*]?/0/g;                 # replace \d, \d+, \d*
    $s1 =~ s/\\D[\+\*]?/-/g;                 # replace \D, \D+, \D*
    $s1 =~ s/\\s[\+\*]?/ /g;                 # replace \s, \s+, \s*
    $s1 =~ s/\\S[\+\*]?/-/g;                 # replace \S, \S+, \S*
    $s1 =~ s/\\w[\+\*]?/a/g;                 # replace \w, \w+, \w*
    $s1 =~ s/\\W[\+\*]?/-/g;                 # replace \W, \W+, \W*
    $s1 =~ s/([^\\])\+/$1/g;                 # replace <non-backslash>+
    $s1 =~ s/([^\\])\*//g;                   # replace <non-backslash>*
    $s1 =~ s/\\\+/+/g;                       # replace \+
    $s1 =~ s/\\\$/\$/g;                      # replace \$
    $s1 =~ s/\\\*/*/g;                       # replace \*
    $s1 =~ s/\\\././g;                       # replace \.
    $s1 =~ s/\\</</g;                        # replace \<
    $s1 =~ s/\\>/>/g;                        # replace \>
    $s1 =~ s/ +[\+\*]/ /g;                   # replace <space>+, <space>*

    # replace each character class with its first character
    while ($s1 =~ /([^\\])\[(.)[^]]*?\][\+\*]?/) {
        $s1 =~ s/([^\\])\[(.)[^]]*?\][\+\*]?/$1$2/;
    }

    # top-level alternation: keep only the first branch
    if ($s1 =~ /[^\\]\|/) {
        my ($begin, $end) = split(/\|/, $s1);
        my @chars = split(//, $begin);
        my $sparen_count = scalar grep(/\(/, @chars);
        my $eparen_count = scalar grep(/\)/, @chars);
        if ((($sparen_count - $eparen_count) % 2) == 0) {
            $s1 = $begin;
        }
    }
    while ($s1 =~ /\((.+?)\|.+?\)/) {
        $s1 =~ s/\((.+?)\|.+?\)/$1/g;        # replace parenthesized alternation
    }
    while ($s1 =~ /(.+?)\|.+?/) {
        $s1 =~ s/(.+?)\|.+?/$1/g;            # replace remaining alternation
    }
    $s1 =~ s/([^\\])[()]/$1/g;               # drop unescaped parentheses
    $s1 =~ s/\\([\-\[\]\(\)\/\.'"?\@\#\{\}])/$1/g;  # replace backslashed -[]()/.'"?@#{}
    $s1 =~ s/\$$//;                          # strip trailing $

    # sanity check: the sample string must match the pattern it came from
    if ($s1 !~ /$p1/) {
        print "I /$p1/\n";
        print "O '$s1'\n";
        print "NO\n";
        print "\n";
    }

    # attempt to match s1 with p2
    # if successful, conflict
}
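The last two comments in the routine describe a step that is not written yet. One possible shape for it is sketched below, assuming a sample string is already available for every pattern. The hash of hand-written samples and the name ordering_constraints are hypothetical stand-ins, not the routine's actual output.

```perl
use strict;
use warnings;

# Hand-written sample strings stand in for what pattern_check would derive.
my %sample_for = (
    'abc'   => 'abc',
    'abcde' => 'abcde',
    'xyz'   => 'xyz',
);

# Return [specific, general] pairs: the sample for "specific" is matched by
# "general" but not the other way round, so "specific" must be tried first.
sub ordering_constraints {
    my (%samples) = @_;
    my @pairs;
    for my $specific (sort keys %samples) {
        for my $general (sort keys %samples) {
            next if $specific eq $general;
            my $forward  = $samples{$specific} =~ /$general/;
            my $backward = $samples{$general}  =~ /$specific/;
            push @pairs, [$specific, $general] if $forward && !$backward;
        }
    }
    return @pairs;
}

printf "/%s/ must come before /%s/\n", @$_
    for ordering_constraints(%sample_for);
# prints "/abcde/ must come before /abc/"
```

Pairs where each sample matches the other pattern, or neither does, produce no constraint, which fits the earlier point that /[abc]/ and /[cd]/ have equal specificity. A single sample per pattern is only a heuristic, so this can miss or misreport conflicts for patterns with many branches.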
At the moment, the code is tripping over this pattern:
/login\[[\d]+\]: failed: ^C on /dev/ttyd\d|login\[[\d]+\]: failed: on /dev/ttyd\d|login\[[\d]+\]: Locked ^C account|login\[[\d]+\]: Locked account/
Specifically, the first caret is giving it fits.
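A small check of how Perl treats that caret, assuming it is meant to be the literal ^C echoed into the log line. The log line below is made up, and the patterns are shortened to the first branch only.

```perl
use strict;
use warnings;

my $line = 'login[123]: failed: ^C on /dev/ttyd0';    # made-up log line

# Unescaped: ^ is a start-of-string anchor, which cannot hold mid-string.
my $anchored = qr{login\[\d+\]: failed: ^C on /dev/ttyd\d};

# Escaped: \^ matches a literal caret character.
my $literal = qr{login\[\d+\]: failed: \^C on /dev/ttyd\d};

print "anchored: ", ($line =~ $anchored ? "match" : "no match"), "\n";
print "literal:  ", ($line =~ $literal  ? "match" : "no match"), "\n";
# prints "anchored: no match" then "literal:  match"
```

If that assumption holds, the routine only strips a leading ^, so a caret in the middle of the pattern survives into the sample string and is then tested against a pattern where it acts as an anchor. Escaping it in the pattern as \^, and unescaping \^ when building the sample, would be one way around that.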
Les