regex diffs between perl 5.6.1 and 5.8.0?

Patrick Flaherty

Hi,

Back in 5.6.1, the following succeeded in stripping out all x1a garbage chars
from a set of files:

perl -p0777 -i.bu -e 's/\X1a+$//g' house.lis

I run the same thing under 5.8.0 and it has no effect.

It doesn't fail to compile or puke, but it doesn't remove the garbage chars either.

From what little I've read, there do appear to be noticeable differences between
pre-5.8 and 5.8+.

pat
 
Jay Tilton

: Back in 5.6.1, the following succeeded in stripping out all x1a garbage chars
: from a set of files:
:
: perl -p0777 -i.bu -e 's/\X1a+$//g' house.lis
:
: I run the same thing under 5.8.0 and it has no effect.

Case matters. "\X1a" is not the same thing as "\x1a".
"\X" in a regex has its own special meaning.

If that code worked as expected in 5.6.1, it probably shouldn't have.
The difference in behavior between 5.6.1 and 5.8.0 would be because of
a bug fix, though I'm not seeing it right away in the delta docs.
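
To make the case distinction concrete, here is a small sketch (not from the original post; the sample strings and hash keys are made up for illustration). Per the perlre documentation, "\X" matches a whole Unicode character sequence, so "\X1a+" means "any character, then a literal 1, then one or more literal a", while "\x1a" is the single Ctrl-Z character.

```perl
use strict;
use warnings;

# Hypothetical sample data, chosen only to show the difference.
my $garbage = "last line\n\x1a\x1a\x1a";   # text followed by three Ctrl-Z characters
my $literal = "version 1a";                 # ends in the literal characters "1a"

my %result = (
    lower_on_garbage => ($garbage =~ /\x1a+$/ ? 1 : 0),  # hex escape: Ctrl-Z run at the end
    upper_on_garbage => ($garbage =~ /\X1a+$/ ? 1 : 0),  # needs a literal "1a", none present
    lower_on_literal => ($literal =~ /\x1a+$/ ? 1 : 0),  # no Ctrl-Z in this string
    upper_on_literal => ($literal =~ /\X1a+$/ ? 1 : 0),  # " " then "1" then "a" at the end
);

printf "%-18s %s\n", $_, ($result{$_} ? 'match' : 'no match')
    for sort keys %result;
```

With the upper-case form the substitution silently finds nothing to remove in a file that ends in Ctrl-Z characters, which looks the same from the outside as "it has no effect".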
 
Patrick Flaherty

>: Back in 5.6.1, the following succeeded in stripping out all x1a garbage chars
>: from a set of files:
>:
>: perl -p0777 -i.bu -e 's/\X1a+$//g' house.lis
>:
>: I run the same thing under 5.8.0 and it has no effect.
>
>Case matters. "\X1a" is not the same thing as "\x1a".
>"\X" in a regex has its own special meaning.
>
>If that code worked as expected in 5.6.1, it probably shouldn't have.
>The difference in behavior between 5.6.1 and 5.8.0 would be because of
>a bug fix, though I'm not seeing it right away in the delta docs.


Thanx Jay,

Actually my original code _is_ a lower-case x. The upper case in the above was
some stuff I was experimenting with. So I don't think this is the problem I'm
having.

pat
 
Jay Tilton

: In article <[email protected]>, Jay Tilton says...
: >
: >: Back in 5.6.1, the following succeeded in stripping out all x1a garbage chars
: >: from a set of files:
: >:
: >: perl -p0777 -i.bu -e 's/\X1a+$//g' house.lis
: >:
: >: I run the same thing under 5.8.0 and it has no effect.
: >
: >Case matters. "\X1a" is not the same thing as "\x1a".
: >"\X" in a regex has its own special meaning.
:
: Actually my original code _is_ a lower-case x. The upper case in the above was
: some stuff I was experimenting with. So I don't think this is the problem I'm
: having.

Then I'm stumped. As far as that code goes, there should be no
difference between 5.6.1 and 5.8.0.

The only reason I can see that the code would not strip \x1a
characters from the ends of lines is if the lines have no \x1a at
their ends.

It's time for a more rigorous regression test and a hard look at your
data file.

As a complete WAG, you might investigate binmode(), which became
significant on all platforms with Perl 5.8.0.
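
As a rough sketch of what "investigate binmode()" could look like in practice (this is an illustration, not a confirmed fix; the helper name strip_trailing_ctrl_z and the sample file contents are hypothetical), the following reads a file as raw bytes, removes any run of \x1a at the very end, and writes it back:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper, not from the thread: strip trailing Ctrl-Z bytes
# from one file, doing all I/O in binary mode.
sub strip_trailing_ctrl_z {
    my ($path) = @_;

    open my $in, '<', $path or die "Cannot read $path: $!";
    binmode $in;                          # raw bytes, no text-mode translation
    my $data = do { local $/; <$in> };    # slurp the whole file
    close $in;
    $data = '' unless defined $data;      # an empty file slurps as undef

    $data =~ s/\x1a+\z//;                 # \z anchors at the very end of the data

    open my $out, '>', $path or die "Cannot write $path: $!";
    binmode $out;
    print {$out} $data;
    close $out or die "Cannot close $path: $!";

    return length $data;                  # number of bytes kept
}

# Demonstration on a throwaway file with made-up contents.
my ($fh, $name) = tempfile(UNLINK => 1);
binmode $fh;
print {$fh} "line one\r\nline two\r\n\x1a\x1a\x1a";
close $fh;

my $kept = strip_trailing_ctrl_z($name);
print "$kept bytes kept in $name\n";
```

Whether text-mode handling is actually what is going wrong here is still only a guess; the sketch just removes that variable from the test.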
 
Patrick Flaherty

>: In article <[email protected]>, Jay Tilton says...
>: >
>: >: Back in 5.6.1, the following succeeded in stripping out all x1a garbage chars
>: >: from a set of files:
>: >:
>: >: perl -p0777 -i.bu -e 's/\X1a+$//g' house.lis
>: >:
>: >: I run the same thing under 5.8.0 and it has no effect.
>: >
>: >Case matters. "\X1a" is not the same thing as "\x1a".
>: >"\X" in a regex has its own special meaning.
>:
>: Actually my original code _is_ a lower-case x. The upper case in the above was
>: some stuff I was experimenting with. So I don't think this is the problem I'm
>: having.
>
>Then I'm stumped. As far as that code goes, there should be no
>difference between 5.6.1 and 5.8.0.
>
>The only reason I can see that the code would not strip \x1a
>characters from the ends of lines is if the lines have no \x1a at
>their ends.
>
>It's time for a more rigorous regression test and a hard look at your
>data file.
>
>As a complete WAG, you might investigate binmode(), which became
>significant on all platforms with Perl 5.8.0.

Hi Jay,

Well that's very interesting.

Yes the 1a's are there. This is a file copied from VMS to Windows over
PATHworks (file sharing software spanning VMS and Windows). The 1a's are a (to
us) well-known artifact of differences in the file systems on VMS and Windows.

I check the 1a's by going into Emacs and then going to the bottom of the file. A
whole bunch of ctrl-Z's (that aren't there when you open the file on VMS).
Moreover I can use Emacs (on Windows) and open the file with hexl-find-file and
indeed the ctrl-Z's correspond to 1a's.

MAYBE A FACTOR: the 5.8 (Perl) that I'm trying to use is on Citrix servers
(where various flavors of low-level funkiness can happen for programmers).

Did an experiment. The one-liner still doesn't work on Citrix with Perl
5.8. However the following in a script _does work_ (!):

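local $^I = '.bu';                  # edit in place, keeping a .bu backup
local @ARGV = glob '*.TXT';         # files to process
my $prev_filename = '';             # initialized so the first comparison doesn't warn
while (<>) {
    if ($ARGV ne $prev_filename) {  # first line of a new file
        print "$ARGV\n";            # goes into the file being rewritten
        print STDOUT "$ARGV\n";     # progress report on the console
    }
    s/\x1a+$//g;                    # strip trailing Ctrl-Z characters from this line
    print;
    $prev_filename = $ARGV;
}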

(This also prints the filename as the first line of each file's contents, since
there are about 900 of these files that I'm going to then import into iSilo and
load onto my Palm.)

Obviously I'll use the script for the time being but it would be interesting to
get to the bottom of why the one-liner (the direct command-line invocation)
doesn't work.
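
One difference between the two that is easy to check in isolation: the one-liner uses -0777, so the regex sees the whole file as a single string, while the script sees one line at a time. Without the /m modifier, $ only matches at the end of the string (or just before a final newline). The sketch below uses made-up data and variable names to show the two behaviours side by side; it is an illustration of that anchoring difference and may well not be the explanation for what happens on the Citrix box.

```perl
use strict;
use warnings;

# Hypothetical file contents: Ctrl-Z characters on the first line and at the very end.
my $file = "first\x1a\x1a\nsecond\nthird\x1a\x1a\x1a";

# Slurp mode (what -0777 does): one substitution over the whole file.
(my $slurped = $file) =~ s/\x1a+$//g;

# Line mode (what the while (<>) loop does): one substitution per line.
my $by_line = join '', map {
    (my $line = $_) =~ s/\x1a+$//g;   # work on a copy of each line
    $line;
} split /(?<=\n)/, $file;             # split after each newline, keeping it

# Count the Ctrl-Z characters that survive in each case.
printf "slurp mode leaves %d Ctrl-Z character(s)\n", ($slurped =~ tr/\x1a//);
printf "line mode leaves %d Ctrl-Z character(s)\n",  ($by_line =~ tr/\x1a//);
```

In both modes the run at the very end of the file is matched, so this alone would not make the one-liner a complete no-op on files whose garbage sits at the bottom.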

I, unfortunately, can't do Perl installs onto our Citrix servers. However I can
probably ask the systems guys to put varying versions of Perl into some other
location (leaving the environment variables pointing to the main location
untouched).

pat
 
