single-byte values

Don Stock

here's an interesting problem I had today. I wrote a perl script to
compare two binary files byte-by-byte, ignoring certain offsets that
contain timestamps and such. So I did this:

# error if not 2 args or files don't exist or the args reference
# the same file
...

open IN1,$ARGV[0] or die "can't open ",$ARGV[0],"\n"; binmode IN1;
open IN2,$ARGV[1] or die "can't open ",$ARGV[1],"\n"; binmode IN2;
for(;;$offset++){

    # get 1 byte from each file
    $len1 = read IN1,$a,1;
    $len2 = read IN2,$b,1;

    # error if either of the reads had an error
    ...

    last if !$len1;

    # next if ignoring this offset
    ...

    # if mismatch
    if ($a != $b) {die "*** mismatch\n"}
}
print "match\n";

the files always matched, even though they were different in places
that weren't being ignored. Those in the know will see immediately
why the above didn't work. I was stumped. I tried all sorts of
things until I realized that my nice single-byte values were
string-types to perl, the problem of course being that perl converts a
string to a number by taking the value indicated by any initial ascii
digits. If there aren't any initial ascii digits, the value is 0.
Replacing "!=" with "ne" fixed it, and so did "if (ord $a != ord $b)
....".
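A minimal sketch of the pitfall described above, assuming two hypothetical one-byte strings ("D" and "r") standing in for what read returns. It only illustrates the comparison semantics and is not the original script:

```perl
use strict;
use warnings;

# Two different single-byte strings, as read() would hand them back.
my ($x, $y) = ("D", "r");

{
    no warnings 'numeric';    # silence "isn't numeric" for this demo only
    # Both strings numify to 0, so != sees 0 != 0 and reports no mismatch.
    print "numeric: ", ($x != $y ? "mismatch" : "match"), "\n";
}

# String comparison and ord() both look at the actual byte.
print "string:  ", ($x ne $y ? "mismatch" : "match"), "\n";
print "ord:     ", (ord($x) != ord($y) ? "mismatch" : "match"), "\n";
```

The numeric line reports "match" while the other two report "mismatch", which is the behaviour the original loop ran into.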

I think my c instincts got in my way here. I was thinking of the
single-byte values as characters in the c sense, and thus
interchangeable with integers. In c one can say this:

int x = 'a';
printf ("%d\n",x);

==> output is 97

but in perl the effect is quite different:

$x = 'a';
printf "%d\n",$x;

==> output is 0 because 'a' has no leading digits

in this case it's ord to the rescue:

$x = 'a';
printf "%d\n",ord $x;

==> output is 97

the ord function returns the first byte in the string as a numeric
type. Which gives us this result:

         num  str
         ---  ---
$x         0  'a'
ord $x    97  '97'

97 and 'a' have the same bit pattern, but that bit pattern has changed
its type.

to convert all bytes in the string $str to a list of numbers (one byte
per number), use this:

@nums = unpack('C*',$str);
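As a rough illustration of the unpack call above, here is a sketch with a hypothetical three-byte string. The pack call at the end is an addition to show that the conversion is reversible:

```perl
use strict;
use warnings;

my $str  = "a\0\xFF";              # three bytes: 0x61, 0x00, 0xFF
my @nums = unpack('C*', $str);     # one unsigned 8-bit value per byte
print "@nums\n";                   # prints "97 0 255"

my $back = pack('C*', @nums);      # pack reverses the conversion
print "round trip ok\n" if $back eq $str;
```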

anyway, there are two morals here. The first is that perl doesn't
have character types; perl has string types and numeric types and
that's it. The second is that when read reads something, it's a
string type, not a numeric type (which is obvious when reading 500
bytes, but easy to forget when reading 1).

fwiw

don
 
A. Sinan Unur

Don Stock wrote:
> here's an interesting problem I had today. I wrote a perl script to
> compare two binary files byte-by-byte, ignoring certain offsets that
> contain timestamps and such. So I did this:
> ....
>
> # if mismatch
> if ($a != $b) {die "*** mismatch\n"}

Had you had

use strict;
use warnings;

at the top of your script, you would have received a warning:

Argument "D" isn't numeric in numeric ne (!=) at C:\Home\compare.pl line 18.

Also, $a and $b have special meanings, see perldoc -f sort.
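To see the warning without hunting through terminal output, one possible sketch is to collect warnings in an array via $SIG{__WARN__}. The variable names and sample bytes here are hypothetical, and the exact warning text and order may vary between perl versions:

```perl
use strict;
use warnings;

my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };   # collect instead of printing

my ($x, $y) = ("D", "r");
my $differs = ($x != $y);    # numeric compare on non-numeric strings

print "differs: ", ($differs ? "yes" : "no"), "\n";
print "warning: $_" for @warnings;
```

The comparison still runs and still yields "no"; the warnings are the only hint that both operands were silently converted to 0.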
 
Don Stock

> Had you had
>
> use strict;
> use warnings;
>
> at the top of your script, you would have received a warning:
>
> Argument "D" isn't numeric in numeric ne (!=) at C:\Home\compare.pl line 18.

I tried it and got this:

Can't locate warnings.pm in @INC (@INC contains: c:/Perl/lib
c:/Perl/site/lib .) at compare.pl line 2.
BEGIN failed--compilation aborted at compare.pl line 2.

I don't seem to have that package anywhere under /Perl. I'll try to
rustle it up.

I don't understand the warning. Given that every scalar has both a
numeric value and a string value (conceptually speaking, I prefer
putting it that way rather than saying that perl stores everything as
a string and converts as necessary, which to my mind is really just an
implementation detail), how can a scalar *not* be numeric? Or is it
saying that the string-to-numeric conversion forced the value to 0
because there were no leading digits in the string?

> Also, $a and $b have special meanings, see perldoc -f sort.

good point. Calling sort will clobber those vars. I knew that, but
had become unwary. I'll start using $x and $y instead.

thanks!

don

p.s. I've become quite fond (overnight it seems) of thinking of chr
and ord as type-shifters rather than as returning a string or numeric
value respectively. E.g.:

given the statement "$x = 'a';":

         num  str
         ---  ---
$x         0  97 (i.e. "a")
ord $x    97  57 55 (i.e. "97")

here, ord shifts 97 from str to num and recomputes str

given the statement "$x = 97;":

         num  str
         ---  ---
$x        97  57 55 (i.e. "97")
chr $x     0  97 (i.e. "a")

here, chr shifts 97 from num to str and recomputes num
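The two tables above can be reproduced with a short sketch like the following. The variable names $s and $n are hypothetical, and the numeric warning is disabled only because numifying "a" is the point of the demo:

```perl
use strict;
use warnings;
no warnings 'numeric';     # numifying "a" would otherwise warn

my $s = 'a';
printf "%-8s num=%-3d str=%s\n", '$s',     $s + 0,      $s;
printf "%-8s num=%-3d str=%s\n", 'ord $s', ord($s) + 0, ord($s);

my $n = 97;
printf "%-8s num=%-3d str=%s\n", '$n',     $n + 0,      $n;
printf "%-8s num=%-3d str=%s\n", 'chr $n', chr($n) + 0, chr($n);
```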
 
Michele Dondi

> here's an interesting problem I had today. I wrote a perl script to
> compare two binary files byte-by-byte, ignoring certain offsets that
> contain timestamps and such. So I did this:

Well, this has very little to do with your post, but you *may* still be
interested: just a few days ago I wanted to test a few CDRWs I have
because they had given me a few errors, so I created a test directory
with 699 1Mb random binary files[*] that I subsequently verified with
the following script:

#!/usr/bin/perl -l

use strict;
use warnings;
use constant LEN => 0x100_000;

die "Usage: $0 dir1 dir2\n" unless @ARGV==2;
s|/$||,-d or
    die "`$_': doesn't exist or is not a directory\n"
    for @ARGV;

undef $/;
for ('001'..'699') {
    warn "Comparing $ARGV[0]/$_ and $ARGV[1]/$_\n";
    open my $fh1, '<:raw', "$ARGV[0]/$_" or die $!;
    open my $fh2, '<:raw', "$ARGV[1]/$_" or die $!;
    my $err=LEN - (my $tmp=<$fh1>^<$fh2>) =~ tr/\0//d;
    print "$_: $err errors" if $err;
}
__END__

Please note that I'm not claiming that this -definitely ad hoc- script
is particularly smart or elegant, but indeed it worked, and reliably
too, it seems!
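The heart of the script above is the string XOR followed by tr. A small sketch of that step in isolation, using two hypothetical 8-byte buffers instead of files, and assuming the `bitwise` feature is not enabled (so `^` on two strings works byte by byte):

```perl
use strict;
use warnings;

# Two hypothetical 8-byte buffers that differ at offsets 2 and 5.
my $buf1 = "ABCDEFGH";
my $buf2 = "ABxDEyGH";

# String XOR: equal bytes yield "\0", differing bytes yield something non-zero.
my $xor = $buf1 ^ $buf2;

# tr without a replacement list just counts the NUL bytes.
my $same   = ($xor =~ tr/\0//);
my $errors = length($buf1) - $same;

print "$errors differing bytes\n";    # prints "2 differing bytes"
```

Subtracting the count of NUL bytes from the total length gives the number of positions where the two inputs disagree, which is what LEN minus the tr result computes in the script.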

Also, having heard so many times that this is not the most efficient
way to slurp in entire files, I thought that it would have run much
more slowly than it actually did...

Note: it may seem from both scripts pasted here that I'm an
(unnecessary-)interpolation maniac, but in fact I generally try to
avoid them if possible. This is not the case with these
quickly-conjured-up scripts.


[*] Using the following script, FWIW:

#!/usr/bin/perl

use strict;
use warnings;

$/=\0x100_000;
open my $fh, '<:raw', '/dev/urandom' or die $!;

for ('001'..'699') {
    open my $out, '>:raw', "test/$_" or die $!;
    print $out scalar <$fh>;
}
__END__


Michele
 
Chris Mattern

Don said:
> I tried it and got this:
>
> Can't locate warnings.pm in @INC (@INC contains: c:/Perl/lib
> c:/Perl/site/lib .) at compare.pl line 2.
> BEGIN failed--compilation aborted at compare.pl line 2.
>
> I don't seem to have that package anywhere under /Perl. I'll try to
> rustle it up.

You probably have an old perl. What does "perl -v" say? In any case,
using "perl -w" will get you most of what "use warnings;" does. If your
perl is too old to have warnings, then you can't add it on. You'll
have to upgrade your perl to get it.

> I don't understand the warning. Given that every scalar has both a
> numeric value and a string value (conceptually speaking, I prefer
> putting it that way rather than saying that perl stores everything as
> a string and converts as necessary, which to my mind is really just an
> implementation detail), how can a scalar *not* be numeric? Or is it
> saying that the string-to-numeric conversion forced the value to 0
> because there were no leading digits in the string?

Bingo. This is a *warning*, not an *error*. It's saying, "I'm doing
number stuff with these, but they don't look like numbers, so I'm
doing conversions that likely aren't what you meant..."
--
Christopher Mattern

"Which one you figure tracked us?"
"The ugly one, sir."
"...Could you be more specific?"
 
Don Stock

my version of perl is too old and does not contain warnings.pm. I
used -w and got lots of things like this:

Argument "r" isn't numeric in ne at compare.pl line 12.

it is indeed complaining about no leading digits

guess I better go update!

don
 
Uri Guttman

DS> my version of perl is too old and does not contain warnings.pm. I
DS> used -w and got lots of things like this:

DS> Argument "r" isn't numeric in ne at compare.pl line 12.

DS> it is indeed complaining about no leading digits

DS> guess I better go update!

updating won't fix the warnings. your code does the wrong thing with
strings and numbers regardless of the perl version. use warnings is just
a better way to enable warnings than -w. but -w is supported in all perl
versions and you should either enable it or get a newer perl and use
warnings.

uri
 
Don Stock

> updating won't fix the warnings

I know. I just hate having an old version. I had 5.0.something. Now I have 5.6.1.

> get a newer perl and use warnings.

I will from now on.

don
 
Don Stock

> Well, this has very little to do with your post, but you *may* still be
> interested: just a few days ago I wanted to test a few CDRWs I have
> because they had given me a few errors, so I created a test directory
> with 699 1Mb random binary files[*] that I subsequently verified with
> the following script:
>
> #!/usr/bin/perl -l
>
> use strict;
> use warnings;
> use constant LEN => 0x100_000;
>
> die "Usage: $0 dir1 dir2\n" unless @ARGV==2;
> s|/$||,-d or
>     die "`$_': doesn't exist or is not a directory\n"
>     for @ARGV;
>
> undef $/;
> for ('001'..'699') {
>     warn "Comparing $ARGV[0]/$_ and $ARGV[1]/$_\n";
>     open my $fh1, '<:raw', "$ARGV[0]/$_" or die $!;
>     open my $fh2, '<:raw', "$ARGV[1]/$_" or die $!;
>     my $err=LEN - (my $tmp=<$fh1>^<$fh2>) =~ tr/\0//d;
>     print "$_: $err errors" if $err;
> }
> __END__

I must say, the "<$fh1>^<$fh2>" totally threw me for a minute. But
now I'm hip. In fact, I was inspired to write my own slurping compare
that ignores bytes at certain offsets, and here it is:

use strict;
use warnings;

die "usage: need 2 files\n" unless $#ARGV==1;
open my $f1,"$ARGV[0]" or die "can't open ",$ARGV[0]; binmode $f1;
open my $f2,"$ARGV[1]" or die "can't open ",$ARGV[1]; binmode $f2;
undef $/;
my $x=<$f1>;
my $y=<$f2>;
my @ignore=(4,5,6,7,22,23,24,25);
for (@ignore) {substr($x,$_,1)=0}
for (@ignore) {substr($y,$_,1)=0}
if ($x eq $y) {printf "match\n"} else {printf "*** mismatch\n"}

substr insists on putting in an ascii '0' instead of a binary 0. To
get a binary 0 in there (not that it mattered) I had to do this:

for (@ignore) {$x =~ s/^(.{$_})./$1\x00/}

incidentally, here's a post from someone who has found "The absolute
fastest way to open and read a file into a string" using sysread:

http://groups.google.com/groups?q=r...perl.*&[email protected]&rnum=22

I note that File::Compare::compare() does a yes/no diff of two binary
files (using the same tortured strcmp convention that 0 means yes).
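For reference, a minimal sketch of File::Compare on binary data. The write_tmp helper and the temp files are hypothetical scaffolding for the demo; compare() itself returns 0 when the files are equal, 1 when they differ, and -1 on error:

```perl
use strict;
use warnings;
use File::Compare qw(compare);
use File::Temp qw(tempfile);

# Write a small binary file and return its name (helper for this demo only).
sub write_tmp {
    my ($bytes) = @_;
    my ($fh, $name) = tempfile(UNLINK => 1);
    binmode $fh;
    print {$fh} $bytes;
    close $fh or die "close failed: $!";
    return $name;
}

my $f1 = write_tmp("\x01\x02\x03");
my $f2 = write_tmp("\x01\x02\x03");
my $f3 = write_tmp("\x01\xFF\x03");

print "f1 vs f2: ", compare($f1, $f2), "\n";   # 0 means equal
print "f1 vs f3: ", compare($f1, $f3), "\n";   # 1 means different
```

Note that compare() has no notion of ignored offsets, so it only covers the plain yes/no case.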

don
 
Tassilo v. Parseval

Also sprach Don Stock:
> I must say, the "<$fh1>^<$fh2>" totally threw me for a minute. But
> now I'm hip. In fact, I was inspired to write my own slurping compare
> that ignores bytes at certain offsets, and here it is:
>
> use strict;
> use warnings;
>
> die "usage: need 2 files\n" unless $#ARGV==1;
> open my $f1,"$ARGV[0]" or die "can't open ",$ARGV[0]; binmode $f1;
> open my $f2,"$ARGV[1]" or die "can't open ",$ARGV[1]; binmode $f2;
> undef $/;
> my $x=<$f1>;
> my $y=<$f2>;
> my @ignore=(4,5,6,7,22,23,24,25);
> for (@ignore) {substr($x,$_,1)=0}
> for (@ignore) {substr($y,$_,1)=0}
> if ($x eq $y) {printf "match\n"} else {printf "*** mismatch\n"}
>
> substr insists on putting in an ascii '0' instead of a binary 0. To
> get a binary 0 in there (not that it mattered) I had to do this:
>
> for (@ignore) {$x =~ s/^(.{$_})./$1\x00/}

Well, as you already noticed yourself, Perl is not C. If you want the
NULL character, then do

substr($x, $_, 1, "\0"), substr($y, $_, 1, "\0") for @ignore;

The trailing substitution after inserting a '0' is a red herring.
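A short sketch of the difference, using hypothetical offsets and a hypothetical five-byte buffer. It contrasts assigning the number 0 through lvalue substr with passing "\0" as the fourth argument:

```perl
use strict;
use warnings;

my @ignore = (1, 3);                  # hypothetical offsets to blank out
my $x = "\x10\x20\x30\x40\x50";
my $y = $x;

# Assigning the number 0 stores its string form, the character "0" (0x30).
substr($x, $_, 1) = 0 for @ignore;

# Four-argument substr with "\0" stores a real zero byte (0x00).
substr($y, $_, 1, "\0") for @ignore;

print "lvalue 0: ", join(' ', map { sprintf('%02x', $_) } unpack('C*', $x)), "\n";
print "4-arg:    ", join(' ', map { sprintf('%02x', $_) } unpack('C*', $y)), "\n";
```

The first line shows 30 at the blanked offsets and the second shows 00. For a pure equality test either works, as long as both buffers get the same filler.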

Also, I would probably use an appropriate unpack() pattern to handle the
bytes to be ignored. But that's mainly because I like unpack() a lot. It
can be generated dynamically [untested though]:

push @ignore, -1; # so that $ignore[-1] == -1
my $tmpl;
$tmpl .= "a" x ($ignore[$_] - $ignore[$_-1] - 1) . "x" for 0 .. $#ignore - 1;
$tmpl .= "a*";

# squeeze it for stylistic reasons
$tmpl =~ s/((.)\2+)/$2 . length($1)/ge;

undef $/;
my $x = join '', unpack $tmpl, <$f1>;
my $y = join '', unpack $tmpl, <$f2>;

# parenthesized so that print runs before exit is evaluated
print("match\n"), exit if $x eq $y;
print "*** mismatch\n";

For the given @ignore, the unpack() template would be

aaaaxxxxaaaaaaaaaaaaaaxxxxa*

which is

a4x4a14x4a*

after condensing it (not sure whether perl's pack/unpack engine can
handle those templates any quicker).
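One way to sanity-check the template idea without real files is the sketch below. It builds the template with an explicit loop rather than the sentinel trick, counts offsets 0 to 3 as four kept bytes before the first skip, and uses two hypothetical 30-byte records that differ only at the ignored offsets:

```perl
use strict;
use warnings;

my @ignore = (4, 5, 6, 7, 22, 23, 24, 25);   # offsets from the thread

# Emit "a<n>" for each run of kept bytes and "x" for each ignored byte.
my $tmpl = '';
my $prev = -1;                                # offset of the last ignored byte
for my $off (@ignore) {
    my $keep = $off - $prev - 1;              # kept bytes since the last skip
    $tmpl .= "a$keep" if $keep > 0;
    $tmpl .= 'x';
    $prev = $off;
}
$tmpl .= 'a*';                                # keep everything after the last skip
print "$tmpl\n";                              # prints "a4xxxxa14xxxxa*"

# Two hypothetical records that differ only at the ignored offsets.
my $rec1 = join '', map { chr($_) } 0 .. 29;
my $rec2 = $rec1;
substr($rec2, $_, 1, "\xFF") for @ignore;

my $x = join '', unpack $tmpl, $rec1;
my $y = join '', unpack $tmpl, $rec2;
print(($x eq $y) ? "match\n" : "*** mismatch\n");   # prints "match"
```

The sketch assumes @ignore is sorted and free of duplicates, and that each input is at least as long as the last ignored offset, since unpack dies if an "x" would step past the end of the string.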

Finally, only use printf() when you actually make use of its
interpolation features. Again, Perl is not C.

Tassilo
 
Don Stock

> Also sprach Don Stock:

as in the ape-man discovering tools? :)

> Well, as you already noticed yourself, Perl is not C. If you want the
> NULL character, then do
>
> substr($x, $_, 1, "\0"), substr($y, $_, 1, "\0") for @ignore;

yeah, shortly after I woke up this morning I thought "hey, literals
have 2 types too! I need to do a type shift!". "chr 0" did the
trick, as does the string literal in your example.

> Also, I would probably use an appropriate unpack() pattern to handle the
> bytes to be ignored. But that's mainly because I like unpack() a lot. It
> can be generated dynamically [untested though]:

I recently was exposed to unpack for the first time, in a perl script
sent in by a customer to decompose a file. I agree, it's very nice.
I'll play with your example.

> Finally, only use printf() when you actually make use of its
> interpolation features. Again, Perl is not C.

by "interpolation features" you mean %d and such? I don't see any
harm in using printf (correct me if I'm wrong). Plus, I once ran into
a problem with "print $x" where $x contained a '%' or '@' (I don't
remember for sure) and was evaluated at that point like a hash (or
array). So I got in the habit of always using printf ('printf
"%s\n",$x' cured it). Except that I couldn't recreate it just now
with my new(er) version of perl, so maybe it was a bug that's gone
away. Or maybe I simply screwed up back then. Or who knows...

"hey buddy, toss me the upper leg bone of that antelope willya? I
need to go get breakfast."

don
 
Tassilo v. Parseval

Also sprach Don Stock:
> as in the ape-man discovering tools? :)

Hmmh? I would carefully doubt that.

> by "interpolation features" you mean %d and such?

Yes, exactly.

> I don't see any harm in using printf (correct me if I'm wrong).

No harm per se. Just unnecessary work for the Perl interpreter.

> Plus, I once ran into a problem with "print $x" where $x contained a
> '%' or '@' (I don't remember for sure) and was evaluated at that point
> like a hash (or array). So I got in the habit of always using printf
> ('printf "%s\n",$x' cured it). Except that I couldn't recreate it
> just now with my new(er) version of perl, so maybe it was a bug that's
> gone away. Or maybe I simply screwed up back then. Or who knows...

What you describe can't have happened. Perl does no double-interpolation
(and it didn't do so in older versions either):

my $var = '@array';
print "$var is an array\n";
__END__
@array is an array

Same behaviour for a '%' and '$' and in fact any character.

> "hey buddy, toss me the upper leg bone of that antelope willya? I
> need to go get breakfast."

I always devour antelopes as a whole. There's never anything left when I
am through with one. Sorry.

Tassilo
 
