Don Stock
here's an interesting problem I had today. I wrote a perl script to
compare two binary files byte-by-byte, ignoring certain offsets that
contain timestamps and such. So I did this:
    # error if not 2 args or files don't exist or the args reference
    # the same file
    ...
    open IN1,$ARGV[0] or die "can't open ",$ARGV[0],"\n"; binmode IN1;
    open IN2,$ARGV[1] or die "can't open ",$ARGV[1],"\n"; binmode IN2;
    for(;;$offset++){
        # get 1 byte from each file
        $len1 = read IN1,$a,1;
        $len2 = read IN2,$b,1;
        # error if either of the reads had an error
        ...
        last if !$len1;
        # next if ignoring this offset
        ...
        # if mismatch
        if ($a != $b) {die "*** mismatch\n"}
    }
    print "match\n";
the files always matched, even though they were different in places
that weren't being ignored. Those in the know will see immediately
why the above didn't work. I was stumped. I tried all sorts of
things until I realized that my nice single-byte values were
string types to perl, and "!=" is a numeric comparison. Perl converts a
string to a number by parsing any leading numeric text (optional
whitespace and sign, then ascii digits). If there isn't any, the value
is 0. So every byte other than the ascii digits '1' through '9' became
0, and any two such bytes compared equal. Replacing "!=" with "ne"
fixed it, and so did "if (ord $a != ord $b) ....".
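Here is a minimal sketch of the difference, assuming two single-byte strings like the ones read would return. The names $x and $y are only for illustration (they are not from the script above).

```perl
use strict;
use warnings;

# Two different single-byte strings, as read() would return them.
my ($x, $y) = ("\x01", "\xFF");

# Numeric comparison: neither string starts with a digit, so each one
# numifies to 0 and the bytes look equal. The "no warnings" line silences
# the "isn't numeric" warning that "use warnings" would otherwise print.
my $numeric_equal = do { no warnings 'numeric'; $x == $y };

# String comparison and ord() both see the real difference.
my $string_differs = $x ne $y;
my $ord_differs    = ord($x) != ord($y);

print "==  says equal:     ", ($numeric_equal  ? "yes" : "no"), "\n";
print "ne  says different: ", ($string_differs ? "yes" : "no"), "\n";
print "ord says different: ", ($ord_differs    ? "yes" : "no"), "\n";
```

All three lines print "yes". Without the "no warnings" line, perl run with warnings enabled complains that the argument isn't numeric, which is a quick way to spot this kind of mistake.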
I think my c instincts got in my way here. I was thinking of the
single-byte values as characters in the c sense, and thus
interchangeable with integers. In c one can say this:
    int x = 'a';
    printf ("%d\n",x);
    ==> output is 97
but in perl the effect is quite different:
    $x = 'a';
    printf "%d\n",$x;
    ==> output is 0 because 'a' has no leading digits
in this case it's ord to the rescue:
    $x = 'a';
    printf "%d\n",ord $x;
    ==> output is 97
the ord function returns the first byte in the string as a numeric
type. Which gives us this result:
              num   str
              ---   ---
    $x          0   'a'
    ord $x     97   '97'
The underlying byte (0x61) is the same in both cases; ord hands it back
as the number 97 instead of as the one-character string 'a'.
to convert all bytes in the string $str to a list of numbers (one byte
per number), use this:
    @nums = unpack('C*',$str);
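A short sketch of that unpack call on a sample string (the contents of $str here are made up for illustration), along with pack going the other way:

```perl
use strict;
use warnings;

my $str  = "aB\x00\xFF";
my @nums = unpack('C*', $str);   # one unsigned 8-bit value per byte

print join(',', @nums), "\n";    # prints 97,66,0,255

# pack('C*', ...) is the inverse and rebuilds the original byte string.
my $round_trip = pack('C*', @nums);
print "round trip ok\n" if $round_trip eq $str;
```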
anyway, there are two morals here. The first is that perl doesn't
have character types; perl has string types and numeric types and
that's it. The second is that when read reads something, it's a
string type, not a numeric type (which is obvious when reading 500
bytes, but easy to forget when reading 1).
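For what it's worth, here is one way the comparison loop could look with the string comparison in place. This is only a sketch, not the original script: the first_mismatch function, the $ignore hash of offsets to skip, and the length-mismatch check are all assumptions of mine, and the demo runs on in-memory filehandles instead of real files.

```perl
use strict;
use warnings;

# Returns the offset of the first mismatch outside %$ignore, or -1 if the
# two streams match. $ignore is a hypothetical set of offsets to skip.
sub first_mismatch {
    my ($fh1, $fh2, $ignore) = @_;
    for (my $offset = 0; ; $offset++) {
        my $len1 = read($fh1, my $byte1, 1);
        my $len2 = read($fh2, my $byte2, 1);
        die "read error: $!\n" unless defined $len1 && defined $len2;
        return $offset if $len1 != $len2;    # one stream is shorter
        return -1 unless $len1;              # both at end of file
        next if $ignore->{$offset};
        return $offset if $byte1 ne $byte2;  # string comparison, not !=
    }
}

# Demo on in-memory filehandles; the bytes differ at offsets 1 and 3.
my ($data1, $data2) = ("A\x01C\x10", "A\x02C\x20");
open my $in1, '<', \$data1 or die "can't open in-memory file: $!\n";
open my $in2, '<', \$data2 or die "can't open in-memory file: $!\n";
binmode $_ for $in1, $in2;

# Offset 1 is ignored, so the first reported mismatch is at offset 3.
my $where = first_mismatch($in1, $in2, { 1 => 1 });
my $msg   = $where < 0 ? "match" : "*** mismatch at offset $where";
print "$msg\n";
```

The lengths returned by read are real numbers, so comparing them with != is fine; only the byte values themselves need ne (or ord).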
fwiw
don