regexp that seems not to work since 5.10

Sébastien Cottalorda

Hi all,
I use a regexp to split a network frame protocol like this.

#-------------------------------------------------------------------
#!/usr/bin/perl -w
use strict;
use constant ETX => chr( hex('03'));
use constant ACK => chr( hex('06'));
use constant NACK => chr( hex('15'));
my $endcar = ACK.'|'.NACK.'|'.ETX.'.{1}',
my $line = 'hello World'.ETX.'XHow are you today ?'.ETX.'XWell, not so bad.'.ETX.'X';
while ($line =~ s/([^$endcar]*$endcar)//){
my $buf = $1;
print $buf."\n";
}
print "$line\n";
exit;
#--------------------------------------------------------------------

With 5.8.X, I used to get:
hello World
How are you today ?
Well, not so bad.

Now I get:
X
X
X
hello WorldHow are you today ?Well, not so bad.

Could someone help me solve this problem?

Thanks in advance for any help.
Cheers.
Sebastien
 

Sébastien Cottalorda

Sorry, a correction.

With 5.8.X, I used to get:
hello World{ETX}X
How are you today ?{ETX}X
Well, not so bad.{ETX}X
 

C.DeRykus

Hi all,
I use a regexp to split a network frame protocol like this.

#-------------------------------------------------------------------
#!/usr/bin/perl -w
use strict;
use constant ETX  => chr( hex('03'));
use constant ACK  => chr( hex('06'));
use constant NACK => chr( hex('15'));
my $endcar = ACK.'|'.NACK.'|'.ETX.'.{1}',
                                        ^
Typo: trailing , instead of ;
my $line = 'hello World'.ETX.'XHow are you today ?'.ETX.'XWell, not so bad.'.ETX.'X';
while ($line =~ s/([^$endcar]*$endcar)//){
                   ^
Did you know that alternation and quantifiers
aren't special in a character class? The
| and {1} in $endcar aren't doing what you
might think at first glance. See perlrequick
or perlretut.
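
To make the point about character classes concrete, here is a minimal sketch. The helper name class_chars is made up for illustration; it is not part of the original script. It builds the same kind of $endcar string and lists which byte values the class [$endcar] actually matches. Inside the brackets, the bars, the dot and the braces are just literal members of the set.

```perl
use strict;
use warnings;

use constant ETX  => "\x03";
use constant ACK  => "\x06";
use constant NACK => "\x15";

# Hypothetical helper (not from the thread): return the byte values
# (0..255, in ascending order) matched by the bracketed class built
# from the string $set.
sub class_chars {
    my ($set) = @_;
    return grep { chr($_) =~ /[$set]/ } 0 .. 255;
}

# Same construction as in the original post.
my $endcar = ACK . '|' . NACK . '|' . ETX . '.{1}';

# Inside [...] the string is a plain set of characters, so the
# alternation bars, the dot and the {1} quantifier lose their meaning.
my @matched = class_chars($endcar);
print join(' ', map { sprintf '0x%02X', $_ } @matched), "\n";
# prints: 0x03 0x06 0x15 0x2E 0x31 0x7B 0x7C 0x7D
```

Besides the three control characters, the class also contains ".", "1", "{", "|" and "}", so a negated class built from it refuses to match those characters in the payload too.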
 

Sébastien Cottalorda

I use a regexp to split a network frame protocol like this.
#-------------------------------------------------------------------
#!/usr/bin/perl -w
use strict;
use constant ETX  => chr( hex('03'));
use constant ACK  => chr( hex('06'));
use constant NACK => chr( hex('15'));
my $endcar = ACK.'|'.NACK.'|'.ETX.'.{1}';
my $line = 'hello World'.ETX.'XHow are you today ?'.ETX.'XWell, not so bad.'.ETX.'X';
while ($line =~ s/([^$endcar]*$endcar)//){

                          ^
                          ^
              Did you know that alternation and quantifiers
              aren't special in a character class..? The
              | and {1} in $endcar aren't doing what you
              might think at first glance. See perlrequick
              or perlretut.
        my $buf = $1;
        print $buf."\n";}
print "$line\n";
exit;
...

I've tried these modifications. With
my $endcar = ACK.'|'.NACK.'|'.ETX;
my $line = 'hello World'.ETX.'How are you today ?'.ETX.'Well, not so bad.'.ETX;
while ($line =~ s/([^($endcar)]*($endcar))//){
it works pretty well, but I cannot manage to make it work with ACK,
NACK and ETX.'.'


I even tried this:
my $endcar = ACK.'|'.NACK.'|'.ETX.'.';
my $line = 'hello World'.ETX.'XHow are you today ?'.ETX.'XWell, not so bad.'.ETX.'X';
while ($line =~ s/([[:^cntrl:]]*($endcar))//){
and it works perfectly, but only as a special case: it assumes that the
split characters are control characters.

But this regexp didn't work with:
my $endcar = ACK.'|'.NACK.'|'.ETX.'.';
my $line = STX.'hello World'.ETX.'X'.ACK.NACK.STX.'How are you today ?'.ETX.'X'.ACK.STX.'Well, not so bad.'.ETX.NACK;
Unfortunately I need that last sample to work.

Does anyone have a clue?
Thanks in advance.
Sebastien
 

Dr.Ruud

Sébastien Cottalorda said:
I use a regexp to split a network frame protocol like this.

#-------------------------------------------------------------------
#!/usr/bin/perl -w
use strict;
use constant ETX => chr( hex('03'));

Alternative:

use constant ETX => "\x{03}";

use constant ACK => chr( hex('06'));
use constant NACK => chr( hex('15'));
my $endcar = ACK.'|'.NACK.'|'.ETX.'.{1}',

Why does that line end in a comma?

my ($ETX, $ACK, $NACK) = ("\x{03}", "\x{06}", "\x{15}");

my $endcar = "(?:$ETX|$ACK|$NACK)"; # alternation


Alternative:

my $endcar= "[$ETX$ACK$NACK]"; # charset

while ($line =~ s/([^$endcar]*$endcar)//){

You are messing up character class and alternation there.


With your $endcar, this would work:

while ($line =~ s/(.*?(?:$endcar))//s){
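
As a rough sketch of the alternation variant suggested above, the loop can be wrapped in a small helper. The name take_frames and the sample input are assumptions made for illustration. The sample deliberately has no extra byte after ETX, since this $endcar does not include the trailing '.'.

```perl
use strict;
use warnings;

my ($ETX, $ACK, $NACK) = ("\x{03}", "\x{06}", "\x{15}");

# Alternation wrapped in a non-capturing group, so it can be embedded
# in a larger pattern without the | applying to the rest of the regex.
my $endcar = "(?:$ETX|$ACK|$NACK)";

# Hypothetical helper: peel complete frames off the front of a copy of
# $line and return them together with the unterminated remainder.
sub take_frames {
    my ($line) = @_;
    my @frames;
    # .*? is the shortest run of any characters (newlines included,
    # because of /s) up to the first terminator.
    while ($line =~ s/(.*?$endcar)//s) {
        push @frames, $1;
    }
    return (\@frames, $line);
}

my ($frames, $rest) =
    take_frames("hello World${ETX}How are you today ?${ETX}bye");

printf "%d frames, %d bytes left over\n", scalar @$frames, length $rest;
# prints: 2 frames, 3 bytes left over
```

The leftover "bye" has no terminator yet, so it stays in the buffer, which is usually what you want when more data may still arrive.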
 

Dr.Ruud

Ben said:
I suspect what the OP wants here is

my $endcar = "\x3\x6\x15";

while ($line =~ s/([^$endcar]*[$endcar].//) {

That is more or less (count the half captures :) what I assumed,
and I also assumed that he would find out the rest himself.
 

sln

You've omitted the trailing '.{1}' (which is equivalent to just '.').

my $endcar = "(?:$ETX|$ACK|$NACK).";
^
It seems reasonable that the OP meant a single character after the alternation,
given his: my $endcar = ACK.'|'.NACK.'|'.ETX.'.{1}',
Otherwise, if it is a group, it is concatenated like:
ACK|NACK|ETX.{1} or
(?:$ACK|$NACK|$ETX.)
where one alternative is ETX plus any character,
which is probably a mistake.
Alternative:

my $endcar= "[$ETX$ACK$NACK]"; # charset

As above.
while ($line =~ s/([^$endcar]*$endcar)//){

You are messing up character class and alternation there.


With your $endcar, this would work:

while ($line =~ s/(.*?(?:$endcar))//s){

That depends. /.*?/ is not always equivalent to a negated end condition,
for instance /.*?>x/ will match all of ">>x" whereas /[^>]*>x/ will only
match the last two characters. I suspect what the OP wants here is

But in this case it makes no sense to add characters after the end character,
since you want everything from the beginning up to that character, not a
match that starts in the middle of the string. It's a complete sub-expression,
'.*?>', that is part of an alternation.

In that case, given ">>x":
/^.*?>x/
works, whereas
/^[^>]*>x/
doesn't.
my $endcar = "\x3\x6\x15";

while ($line =~ s/([^$endcar]*[$endcar].//) {
while ($line =~ s/([^$endcar]*[$endcar].)//) {
possibly with a /s modifier, since this is a binary protocol so random
newlines seem likely.

Not if you take out the '.'

-sln
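
The ">>x" comparison discussed above can be checked with a few lines of Perl. This is only a small demonstration of the four patterns mentioned in the post; the variable names are made up for illustration.

```perl
use strict;
use warnings;

my $s = '>>x';

# Lazy dot: the match starts at position 0 and grows until ">x" fits.
my ($lazy) = $s =~ /(.*?>x)/;

# Negated class: it cannot step over the first ">", so the regex
# engine retries from position 1 and matches only the last two chars.
my ($negated) = $s =~ /([^>]*>x)/;

# Anchored variants: with ^ the negated class can no longer slide
# forward, so it fails, while the lazy dot still succeeds.
my $lazy_anchored    = $s =~ /^.*?>x/   ? 1 : 0;
my $negated_anchored = $s =~ /^[^>]*>x/ ? 1 : 0;

printf "unanchored: lazy=%s negated=%s\n", $lazy, $negated;
printf "anchored:   lazy=%d negated=%d\n", $lazy_anchored, $negated_anchored;
# prints:
# unanchored: lazy=>>x negated=>x
# anchored:   lazy=1 negated=0
```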
 

C.DeRykus

...

I even tried this:
my $endcar = ACK.'|'.NACK.'|'.ETX.'.';
my $line = 'hello World'.ETX.'XHow are you today ?'.ETX.'XWell, not so
bad.'.ETX.'X';
while ($line =~ s/([[:^cntrl:]]*($endcar))//){
and it works perfectly but it's a particular case : I suppose that
split caracters are controls.

but this regexp didn't work with :
my $endcar = ACK.'|'.NACK.'|'.ETX.'.';
my $line = STX.'hello World'.ETX.'X'.ACK.NACK.STX.'How are you
today ?'.ETX.'X'.ACK.STX.'Well, not so bad.'.ETX.NACK;
Unfortunately I need to make that last sample to work.

Here's a closer cut I think since you were negating
the character class:

my $endcar = STX . '|' . ACK . '|' . NACK . '|' . ETX ;
while ($line =~ s/([[:cntrl:]]*($endcar))//){
...
}
print $line;


Case 1:
my $line = 'hello World'.ETX.'XHow are you today ?'.ETX.'XWell, not so
bad.'.ETX.'X';
output: hello WorldXHow are you today ?XWell, not so
bad.X

Case 2:
my $line = STX.'hello World'.ETX.'X'.ACK.NACK.STX.'How are you
today ?'.ETX.'X'.ACK.STX.'Well, not so bad.'.ETX.NACK;

output: hello WorldXHow are you
today ?XWell, not so bad.
 

Sébastien Cottalorda

Found a solution with the help of Olivier Makinen.

use constant STX  => chr( hex('02'));
use constant ETX  => chr( hex('03'));
use constant ACK  => chr( hex('06'));
use constant NACK => chr( hex('15'));
my $line = STX.'hello World'.ETX.'X'.ACK.NACK.STX.'How are you today ?'.ETX.'X'.ACK.STX.'Well, not so bad.'.ETX.NACK;

# Any single character that cannot start a terminator.
my $noendcar = '[^' . ACK . ETX . NACK . ']';
# A terminator: ACK alone, NACK alone, or ETX plus the one character after it.
my $endstring = '(' . ACK . '|' . ETX . '.|' . NACK . ')';
# Remove one frame per iteration; $& holds the text that was just matched.
while ($line =~ s/$noendcar*$endstring//) {
    print "buf=$&\n";
}
print "lastbuffer = $line\n";

I get:
buf={STX}hello World{ETX}X
buf={ACK}
buf={NACK}
buf={STX}How are you today ?{ETX}X
buf={ACK}
buf={STX}Well, not so bad.{ETX}X
buf={NACK}
lastbuffer = .... (empty)

It works perfectly.
Thanks all for your help.
Sebastien
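
For readers who would rather not modify the buffer or rely on $&, the same two building blocks can be used with a non-destructive m//g loop. This is only a sketch under stated assumptions: the helper name split_frames, the use of \G with /gcs, and the sample line (which carries an 'X' after every ETX so that each ETX has its trailing byte) are choices made here for illustration, not part of the solution posted above.

```perl
use strict;
use warnings;

use constant STX  => "\x02";
use constant ETX  => "\x03";
use constant ACK  => "\x06";
use constant NACK => "\x15";

# Same building blocks as in the posted solution, with a non-capturing
# group because the whole frame is captured by the outer parentheses.
my $noendcar  = '[^' . ACK . ETX . NACK . ']';
my $endstring = '(?:' . ACK . '|' . ETX . '.|' . NACK . ')';

# Hypothetical helper: return the complete frames found at the start
# of $line, plus whatever incomplete tail is left over.
sub split_frames {
    my ($line) = @_;
    my @frames;
    # \G anchors each attempt where the previous match ended, so no
    # bytes are skipped; /c keeps pos() after the final failed attempt;
    # /s lets "." match a newline byte after ETX.
    while ($line =~ /\G($noendcar*$endstring)/gcs) {
        push @frames, $1;
    }
    my $rest = substr($line, pos($line) // 0);
    return (\@frames, $rest);
}

# Assumed sample: every ETX is followed by one extra byte ('X').
my $line = STX . 'hello World' . ETX . 'X' . ACK . NACK
         . STX . 'How are you today ?' . ETX . 'X' . ACK
         . STX . 'Well, not so bad.' . ETX . 'X' . NACK;

my ($frames, $rest) = split_frames($line);

# Show control characters by name so the output is readable.
my %name = (STX, '{STX}', ETX, '{ETX}', ACK, '{ACK}', NACK, '{NACK}');
for my $frame (@$frames) {
    (my $shown = $frame) =~ s/([\x02\x03\x06\x15])/$name{$1}/g;
    print "buf=$shown\n";
}
print "leftover bytes: ", length($rest), "\n";
```

With this sample the helper returns seven frames and an empty remainder. An unterminated tail, if any, is handed back in $rest so the caller can prepend it to the next chunk read from the network.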
 
