Expressing AND, OR, and NOT in a Single Pattern


usaims

I'm having a little problem with this example in the Perl Cookbook.

True if pattern BAD does not match, but pattern GOOD does:
/(?=(?:(?!BAD).)*$)GOOD/s

My objective is to print only lines that have 'suspended' but not
'Data_services'. It is still printing lines with 'suspended' and
'Data_services' in the same line. So, ideally, this script should
not print any lines. Correct me if I am wrong.

##############################
#!/usr/bin/perl
use strict;
use diagnostics;
use warnings;

my @stuff = <DATA>;

foreach my $foo (@stuff) {
    if ($foo =~ /(?=(?:(?!Data_services).)*$)suspended/s) {
        print $foo;
    }
}
close(DATA);

__DATA__
<Query id='Data_services.LSSI_Weekly.42' suspended='1' error='Loading
Data Only - cannot run query' wuid='W20070227-140132'
associatedName='libW20070227-140132.so'/>
<Query id='Data_services.SSNMapKeys.14' suspended='1' error='Loading
Data Only - cannot run query' wuid='W20070105-115230'
associatedName='libW20070105-114650.so'/>
<Query id='Data_services.WatercraftKeys.5' suspended='1'
error='Loading Data Only - cannot run query' wuid='W20070123-114242'
associatedName='libW20070123-114242.so'/>
 

Scott Bryce

usaims said:
> My objective is to print only lines that have 'suspended' but not
> 'Data_services'.

I prefer to use index for something like this.

> It is still printing lines with 'suspended' and
> 'Data_services' in the same line. So, ideally, this script should
> not print any lines. Correct me if I am wrong.

There are no lines in your given data that meet your criteria.

Here's my shot at it...

use strict;
use warnings;

while (<DATA>)
{
    next if index($_, 'Data_services') > -1;
    print $_ if index($_, 'suspended') > -1;
}

__DATA__
<Query id='Data_services.LSSI_Weekly.42' suspended='1' error='Loading
Data Only - cannot run query' wuid='W20070227-140132'
associatedName='libW20070227-140132.so'/>
<Query id='Data_services.SSNMapKeys.14' suspended='1' error='Loading
Data Only - cannot run query' wuid='W20070105-115230'
associatedName='libW20070105-114650.so'/>
<Query id='Data_services.WatercraftKeys.5' suspended='1' error='Loading
Data Only - cannot run query' wuid='W20070123-114242'
associatedName='libW20070123-114242.so'/>
<Query id='Other_services.SSNMapKeys.14' suspended='1' error='Loading
Data Only - cannot run query' wuid='W20070105-115230'
associatedName='libW20070105-114650.so'/>
 

xhoster

usaims said:
> I'm having a little problem with this example in the Perl Cookbook.
>
> True if pattern BAD does not match, but pattern GOOD does:
> /(?=(?:(?!BAD).)*$)GOOD/s

Every character from the start of the match to the end of the string
must not be the start of a match for BAD. However, if BAD occurs before
GOOD, the regex can still match, simply by not starting the match until
after the B of BAD.
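A minimal sketch of the failure described above, using made-up sample strings with the Cookbook's BAD and GOOD kept as literal placeholders. The unanchored pattern rejects "GOOD then BAD" but still accepts "BAD then GOOD", because the engine is free to begin the match after BAD. The helper name `cookbook_matches` is hypothetical.

```perl
use strict;
use warnings;

# The Cookbook pattern, unchanged, with BAD and GOOD as literal placeholders.
my $cookbook = qr/(?=(?:(?!BAD).)*$)GOOD/s;

# Hypothetical helper: 1 if the Cookbook pattern matches anywhere in $str.
sub cookbook_matches {
    my ($str) = @_;
    return $str =~ $cookbook ? 1 : 0;
}

# Expected: match, match (the surprising case), no match.
for my $str ('GOOD only', 'BAD then GOOD', 'GOOD then BAD') {
    printf "%-13s => %s\n", $str,
        cookbook_matches($str) ? 'match' : 'no match';
}
```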

You want the forced exclusion to start at the beginning of the string
and run to the end:

/^(?=(?:(?!BAD).)*$).*GOOD/;

But I'd just use two different regexes.

Xho
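A sketch, on made-up inputs, that checks xhoster's anchored form against the two-regex approach he recommends. One assumption beyond his post: /s is added, as in the Cookbook original, so the lookahead can scan past newlines in a multi-line string. The helper names `anchored_matches` and `two_regexes` are hypothetical.

```perl
use strict;
use warnings;

# xhoster's anchored form (with /s added): the BAD check now starts at
# the beginning of the string, so BAD anywhere rules the string out.
my $anchored = qr/^(?=(?:(?!BAD).)*$).*GOOD/s;

# Hypothetical helpers returning 1 or 0.
sub anchored_matches {
    my ($str) = @_;
    return $str =~ $anchored ? 1 : 0;
}

sub two_regexes {
    my ($str) = @_;
    return ($str =~ /GOOD/ && $str !~ /BAD/) ? 1 : 0;
}

for my $str ('GOOD only', 'BAD then GOOD', 'GOOD then BAD', 'neither') {
    printf "%-13s anchored=%d two_regexes=%d\n",
        $str, anchored_matches($str), two_regexes($str);
}
```

Only "GOOD only" should report 1 in both columns.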
 

h3xx

I like doing things in one line:

print grep { /suspended/ && ! /Data_services/ } <DATA>;
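A self-contained version of the one-liner, as a sketch: it reads from an array instead of DATA. The first three records are abbreviated, one-record-per-line versions of the thread's data (the Other_services one is the record Scott added); the last record is hypothetical. `grep` keeps a line only when both tests in its block are true.

```perl
use strict;
use warnings;

# Abbreviated sample records, one per line; the last one is hypothetical.
my @lines = (
    "<Query id='Data_services.LSSI_Weekly.42' suspended='1'/>\n",
    "<Query id='Data_services.SSNMapKeys.14' suspended='1'/>\n",
    "<Query id='Other_services.SSNMapKeys.14' suspended='1'/>\n",
    "<Query id='Other_services.Hypothetical.1'/>\n",
);

# AND of a positive test and a negated one; && skips the second
# test whenever the first one fails.
my @wanted = grep { /suspended/ && !/Data_services/ } @lines;

print @wanted;    # only the Other_services.SSNMapKeys.14 line
```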
 

gf

h3xx said:
> I like doing things in one line:
>
> print grep { /suspended/ && ! /Data_services/ } <DATA>;


I prefer this method too. For clarity and long-term maintenance it is
much better, because the esoterica of regex can make the desired
results hard to figure out and the bugs in the pattern even harder to
find.

Also, speed-wise, this is a lot faster: the && short-circuits, so the
second test is skipped on any line that fails the first, instead of
the regex engine doing all that work inside one pattern.

Sometimes it's better to break the search for matching patterns into
single lines too. It's kind of macho programmer-wise to string it all
together into one mondo regex pattern and have it work, but the logic
can get fragile.

The only thing I'd do differently to these patterns is add an anchor
to the 'Data_services' pattern, like so...

/^<Query id='Data_services/

Anchors can speed up a regex enormously. I benchmarked index() against
various ways of using a regex, and an anchored qr// that was compiled
once outside the loop was the fastest at finding patterns inside long
strings when the pattern was at the end of the string. At the
beginning of a string it should be equal to index(). index() was
faster when finding a fixed string somewhere in the middle of another
string.
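A sketch of the combination described above: the anchored pattern is compiled once with qr// outside the loop, and the 'suspended' test uses index(). It assumes 'Data_services' only needs to be excluded when it is the id at the very start of a record line, which holds for the sample data; a line mentioning Data_services anywhere else would not be excluded. The helper name `wanted` is hypothetical; the three sample lines are copied from the thread's (wrapped) data.

```perl
use strict;
use warnings;

# Compiled once, outside the loop. Anchored with ^, so the engine only
# has to try it at the start of each line.
my $excluded = qr/^<Query id='Data_services/;

# Hypothetical helper: 1 for lines to print, 0 otherwise.
sub wanted {
    my ($line) = @_;
    return 0 if $line =~ $excluded;
    return index($line, 'suspended') > -1 ? 1 : 0;   # fixed-string search
}

my @lines = (
    "<Query id='Data_services.LSSI_Weekly.42' suspended='1' error='Loading\n",
    "Data Only - cannot run query' wuid='W20070227-140132'\n",
    "<Query id='Other_services.SSNMapKeys.14' suspended='1' error='Loading\n",
);

for my $line (@lines) {
    print $line if wanted($line);
}
```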
 

Brian McCauley

xhoster said:
> Every character from the start of the match to the end of the string
> must not be the start of a match for BAD. However, if BAD occurs before
> GOOD, the regex can still match, simply by not starting the match until
> after the B of BAD.
>
> You want the forced exclusion to start at the beginning of the string
> and run to the end:
>
> /^(?=(?:(?!BAD).)*$).*GOOD/;

That's exponentially (er, factorially?) inefficient!

/^(?!.*BAD).*GOOD/;

> But I'd just use two different regexes.

Yes, of course, that's still the best way.
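Brian's form applied to the original problem, as a sketch. The first two samples are abbreviated from the thread; the third, which puts 'suspended' before 'Data_services' on a second line, is hypothetical. It shows why /s matters when one string holds a whole multi-line record: without it, `.*` in the lookahead stops at the first newline and misses the later Data_services.

```perl
use strict;
use warnings;

# NOT Data_services anywhere, AND suspended somewhere.
my $with_s    = qr/^(?!.*Data_services).*suspended/s;
my $without_s = qr/^(?!.*Data_services).*suspended/;   # . stops at "\n"

my %samples = (
    data_line  => "<Query id='Data_services.SSNMapKeys.14' suspended='1'/>",
    other_line => "<Query id='Other_services.SSNMapKeys.14' suspended='1'/>",
    # Hypothetical multi-line record with Data_services after a newline.
    split_rec  => "<Query suspended='1'\nid='Data_services.LSSI_Weekly.42'/>",
);

for my $name (sort keys %samples) {
    my $s = $samples{$name};
    printf "%-10s with /s: %d  without /s: %d\n", $name,
        ($s =~ $with_s ? 1 : 0), ($s =~ $without_s ? 1 : 0);
}
```

With /s only other_line matches; without it, split_rec slips through as well.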
 

xhoster

Brian McCauley said:
> That's exponentially (er, factorially?) inefficient!

Under what conditions is it exponential? With the patterns I've tested,
it seems to be linear, not exponential. (But it is still quite a lot slower
than yours, for reasons I don't quite understand. It would make more sense
to me if it were exponentially slower, rather than a constant 30 times
slower.)

Xho
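For anyone who wants to measure the gap discussed above, here is a rough sketch using the core Benchmark module. The input string is made up (long, containing GOOD and no BAD), /s is added to both patterns, and timings will vary with the Perl version and the input; no particular ratio is implied. The `agree` helper is hypothetical and only confirms that both patterns give the same answer before they are timed.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $xho   = qr/^(?=(?:(?!BAD).)*$).*GOOD/s;   # xhoster's anchored form
my $brian = qr/^(?!.*BAD).*GOOD/s;            # Brian's form

# Hypothetical helper: true if both patterns give the same answer.
sub agree {
    my ($str) = @_;
    return (($str =~ $xho) ? 1 : 0) == (($str =~ $brian) ? 1 : 0);
}

# Made-up input: long, contains GOOD, no BAD.
my $long = ('x' x 5_000) . 'GOOD' . ('y' x 5_000);
die "patterns disagree\n" unless agree($long);

# Run each pattern for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    xhoster => sub { my $m = $long =~ $xho },
    brian   => sub { my $m = $long =~ $brian },
});
```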
 

Mirco Wahab

Brian said:
> That's exponentially (er, factorially?) inefficient!
>
> /^(?!.*BAD).*GOOD/;
>
> Yes, of course, that's still the best way.

This

/^(?!.*BAD).*GOOD/

is, in my opinion, of "Maxwellian beauty".

I spent some time trying to simplify the original
expression, but in the end I threw in the towel.

Thanks,

Mirco
 
