binmode and the diamond operator

J. Romano

Hi,

I've run into a little Perl problem recently, and I'm wondering
whether there is a solution for it.

I'm using ActiveState Perl for Win32, and I need to read in binary
files. I use the diamond operator in a while loop after setting slurp
mode (in order to read in the whole file at once). In other words, my
code looks something like this:

$/ = undef; # set "slurp" mode

while (<>) {
    my $fileLen = length $_;
    print "File \"$ARGV\" contains $fileLen bytes.\n";
}

With this script, someone could type

perl script.pl file1 file2 file3

and get output like:

File "file1" contains 15 bytes.
File "file2" contains 21 bytes.
File "file3" contains 133 bytes.

Now, I realize that I can find the size of a file by using the -s
filetest operator, but that's not what I want to do (I just printed
the file length as an example). Ultimately I want to peek into the
files and look at the values at specific bytes. But in order to do
this I have to make sure that the \n\r (or \r\n) combination doesn't
get converted to one character (I've been burned by this before).
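
To make the problem concrete, here is a minimal sketch of the translation in question. It assumes a Perl with PerlIO layers (5.8 or later) and uses the ':crlf' layer to imitate Win32 text mode, so it should behave the same on any platform. The helper name slurp_with_layer and the temporary file are made up for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Read a whole file through the given PerlIO layer and return its contents.
sub slurp_with_layer {
    my ($path, $layer) = @_;
    open(my $fh, "<$layer", $path) or die "Cannot read \"$path\": $!";
    local $/;                      # slurp mode, restored on return
    my $data = <$fh>;
    close($fh);
    return $data;
}

# Write two CRLF-terminated lines (6 bytes) with no translation on output.
my ($out, $path) = tempfile(UNLINK => 1);
binmode($out);
print {$out} "a\r\nb\r\n";
close($out) or die "Cannot close \"$path\": $!";

my $text = slurp_with_layer($path, ':crlf');   # imitates Win32 text mode
my $raw  = slurp_with_layer($path, ':raw');    # what binmode() gives you

printf "text mode: %d characters, raw mode: %d bytes\n",
    length $text, length $raw;
```

In text mode each "\r\n" pair comes back as a single "\n", so every byte offset after the first line ending shifts by one.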

So I need to set binmode() on these files, but how do I do it with
the diamond operator? The only immediate solution I can think of is
to re-write the code so that it opens and closes a filehandle, like
this:

foreach my $file (@ARGV) {
    $/ = undef; # set "slurp" mode

    open(FILE, '<', $file) or die "Cannot read \"$file\": $!";
    binmode(FILE);
    $_ = <FILE>;
    close(FILE);

    my $fileLen = length $_;
    print "File \"$file\" contains $fileLen bytes.\n";
}

This way I have to add four more lines of code and check if open was
successful. I can definitely do it this way, but if there is a
quicker way of using binmode() with the diamond operator, I'd like to
know about it.

So, does anyone know if it is possible to set binmode() when using
the diamond operator (specifically in "slurp" mode)?

Thanks in advance,

Jean-Luc
 
Tad McClellan

J. Romano said:
I use the diamond operator in a while loop after setting slurp
mode (in order to read in the whole file at once).
So I need to set binmode() on these files, but how do I do it with
the diamond operator?


binmode ARGV;
 
J. Romano

binmode ARGV;

Thanks for the response, Tad, but it doesn't work. At least, I
haven't figured out where to put that line to make it work
correctly. Should I put it before the "while (<>)" loop or inside it?
I tried both ways out on this small program:

#!/usr/bin/perl -w
use strict;

$/ = undef; # set "slurp" mode

# binmode(ARGV); # Do I put the binmode() call here...

while (<>) {
    binmode(ARGV); # ...or do I put it here?
    my $fileLen = -s $ARGV;
    my $numChars = length $_;
    print "File \"$ARGV\" contains $fileLen bytes",
        " and $numChars characters.\n";
}
__END__

When I put "binmode(ARGV)" before the while loop I get the
following warning:

binmode() on unopened filehandle ARGV at script.pl line 6.

and when I put it as the first line of the while loop, the file has
already been read in before it is affected by the binmode() change.

Therefore, if I run this script with the name of a one-line text
file, the number of characters will always be one less than the number
of bytes (due to the fact that the newline "character" is stored as
two bytes on Win32), which shows that binmode() is not having the
effect I wanted.

One main reason I want binmode() with the diamond operator is that
I want to use it with the -ne switches, like this:

perl -lne "BEGIN{$/=undef} print ord substr($_,99,1)" file1 file2

This one-liner prints out the ASCII value of the hundredth byte of
file1 and file2. However, if there is a \n\r (or \r\n) before the
hundredth byte, the offset will be affected and the output will no
longer be correct.
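
For comparison, here is a sketch of the "long way" applied to this particular task. It assumes that looking at a single byte is all that's needed, so it seeks to the offset instead of slurping the file. The function name byte_at is hypothetical, and the '<:raw' open mode needs Perl 5.8 or later (on 5.6, a plain open followed by binmode() should do the same job).

```perl
use strict;
use warnings;

# Return the numeric value of the byte at $offset (0-based) in $path,
# or undef if the file is too short to have a byte there.
sub byte_at {
    my ($path, $offset) = @_;
    open(my $fh, '<:raw', $path) or die "Cannot read \"$path\": $!";
    seek($fh, $offset, 0)        or die "Cannot seek in \"$path\": $!";
    my $byte;
    my $got = read($fh, $byte, 1);          # 1 on success, 0 at end of file
    die "Cannot read \"$path\": $!" unless defined $got;
    close($fh);
    return $got ? ord($byte) : undef;
}

for my $file (@ARGV) {
    my $value = byte_at($file, 99);         # offset 99 is the hundredth byte
    print "$file: ", (defined $value ? $value : 'file too short'), "\n";
}
```

Because the handle is opened raw, the offset counts bytes on disk, so a line ending earlier in the file cannot shift it.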

So, can I still use binmode() with the diamond operator (or with
the -n switch)? If I have to use "binmode(ARGV)", where do I place
it? Do I put it right after I undef the $/ variable, or inside the
while loop? Or do I put it somewhere else entirely?

(Keep in mind that I'm using ActiveState Perl on a Win32 machine,
so setting binmode() really does make a difference in my case.)

Thanks for any responses.

-- Jean-Luc
 
Ben Morrow

J. Romano said:
Thanks for the response, Tad, but it doesn't work. At least, I
haven't figured out where to put that line to make it work
correctly. Should I put it before the "while (<>)" loop or inside it?

A nice conundrum!

I can't find any way to make it work with 5.6... if you're using that
I think you'll have to write the loop 'properly' (ie. not use <> and
ARGV, but open and then binmode each file yourself), which pretty much
rules out one-liners. If you are using 5.8 then

perl -Mopen=IO,:raw -0777nwe'$n += length; END{print "$n\n"}' crlf

does what you want (this is Unix shell quoting, I'm afraid: you'll
need to correct it to DOS syntax). The -0777 is equivalent to
BEGIN{$/=undef}.
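
Spelled out as a script, the one-liner above might look roughly like the sketch below. It assumes Perl 5.8 or later for the open pragma. The function name total_length is invented for illustration, and the "if @ARGV" guards are an addition so that an empty argument list doesn't make <> fall back to reading STDIN.

```perl
use strict;
use warnings;
use open IO => ':raw';       # same as -Mopen=IO,:raw on the command line

# Sum the lengths of all the files named in the argument list,
# reading each file whole through the diamond operator.
sub total_length {
    my @files = @_;
    return 0 unless @files;  # an empty @ARGV would make <> read STDIN
    local @ARGV = @files;    # <> reads from whatever @ARGV holds
    local $/;                # undef, i.e. slurp mode, like -0777
    my $n = 0;
    while (defined(my $data = <>)) {
        $n += length $data;
    }
    return $n;
}

print total_length(@ARGV), "\n" if @ARGV;
```

The pragma is lexically scoped, so it covers the files that <> opens from inside this script without a binmode() call anywhere.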

Ben
 
J. Romano

So I need to set binmode() on these files, but how do I do it with
the diamond operator?

Ben Morrow said:
A nice conundrum!

I can't find any way to make it work with 5.6... if you're using that
I think you'll have to write the loop 'properly' (ie. not use <> and
ARGV, but open and then binmode each file yourself), which pretty much
rules out one-liners.

I tried to find a solution to this problem, and I found a little
workaround. For those of you who don't remember, I was trying to get
the following code to use the diamond operator in binary mode (on Win32
platforms) so my \r\n or \n\r combinations wouldn't get converted to
one character:

#!/usr/bin/perl -w
use strict;
$/ = undef; # set "slurp" mode
while (<>) {
    my $fileLen = -s $ARGV;
    my $numChars = length $_;
    print "File \"$ARGV\" contains $fileLen bytes",
        " and $numChars characters.\n";
}
__END__

This code, when run on Win32 platforms with text file names as
parameters, reports that there are more bytes to the file than
characters. It says this because it considers the \r\n and \n\r
combinations (that occur at newlines) as one character.

My problem is that I wanted to stop this behavior by calling
binmode on the filehandle, so that the number of characters reported
would be the same as the number of bytes. But the diamond operator
opens its files behind the scenes, so I never get to touch the
filehandle before the first read! So how do you tell the diamond
operator to open files in binmode?

The obvious answer, "binmode(ARGV);", didn't work. If called
before the while loop containing the diamond operator, a warning would
appear stating that binmode is being called on an unopened filehandle.
And calling it as the first line of the while loop is too late, since
the file has already been read (in text mode) into $_.

The only solution I could think of at the time was to write out the
program the long way, looping through @ARGV and opening the files
(and setting them with binmode) individually. But, as Ben Morrow
said, this pretty much rules out one-liners.

Well, like I said above, I found a little workaround. Instead of
adding the line:

binmode(ARGV);

I add the following line as the first line of my while loop:

binmode(ARGV), seek(ARGV,0,0), next if $a = !$a;

so the above script now looks like:

#!/usr/bin/perl -w
use strict;
$/ = undef; # set "slurp" mode
while (<>) {
    binmode(ARGV), seek(ARGV,0,0), next if $a = !$a;
    my $fileLen = -s $ARGV;
    my $numChars = length $_;
    print "File \"$ARGV\" contains $fileLen bytes",
        " and $numChars characters.\n";
}
__END__

Do you see what it's doing? When it reads a file in for the first
time, it sets the filehandle to binary mode, then rewinds the file
pointer and repeats the loop. The if condition (if $a = !$a) forces
this statement to get executed only on every other loop (otherwise it
would be an infinite loop).

This solution definitely works, but it's obvious that it's not
super-efficient since every file is read twice. If I really wanted to
make it more efficient, I could set $/ to equal a reference to 1 (so
only one byte is read), then set binmode, rewind the pointer, AND set
$/ to undef before restarting the loop. That way only the first byte
would be re-read. Of course, I would have to reset $/ to a reference
to 1 at the end of the loop before the next file is read.
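
A rough sketch of that one-byte variant follows, wrapped in a function so that $/ and @ARGV are restored afterwards. The function name slurp_lengths_binary is made up, and using "ref $/" as the state flag is just one way to tell the two kinds of read apart. One assumption worth stating: an empty file yields no first byte, so this sketch never reports it.

```perl
use strict;
use warnings;

# Slurp each named file in binary mode via <>, re-reading only the
# first byte. Returns a list of [name, length] pairs.
sub slurp_lengths_binary {
    my @files = @_;
    return unless @files;        # an empty @ARGV would make <> read STDIN
    local @ARGV = @files;
    local $/ = \1;               # first read of each file: one byte only
    my @results;
    while (defined(my $data = <>)) {
        if (ref $/) {            # still in one-byte state: a new file
            binmode(ARGV);       # switch the open handle to binary mode
            seek(ARGV, 0, 0);    # rewind so that byte is read again
            $/ = undef;          # the next read slurps the whole file
            next;
        }
        push @results, [$ARGV, length $data];
        $/ = \1;                 # back to one-byte reads for the next file
    }
    return @results;
}

printf "File \"%s\" contains %d bytes.\n", @$_
    for slurp_lengths_binary(@ARGV);
```

Since the state lives in $/ itself, there is no separate toggle variable to keep in step with the files.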

(Some people might point out that I could set $/ to a reference to
0 so I wouldn't have to re-read any bytes at all. Well, I already
tried this and it seems like doing so causes the diamond operator to
read in the entire pseudo-file all at once (in other words, instead of
reading one file at a time, it reads all the files at once and puts
all their contents into $_ as one long string). I tried to find some
documentation that covered this, but I couldn't find any. I'm curious
to know if this is normal behavior.)

But if I use this more efficient solution of only re-reading one
byte, then it's almost more trouble than it's worth, and difficult to
remember for one-liners. So I'll probably stick to the solution:

binmode(ARGV), seek(ARGV,0,0), next if $a = !$a;

for one-liners if I'm too lazy to open the files individually.

It's kind of a strange solution, isn't it? At least it works.

-- Jean-Luc
 
J. Romano

Steve Grazzini said:
use open IN => ':raw';

Hey, thanks, Steve! That works perfectly! Now the following code
reports the same number of bytes and characters on files on Win32
platforms:

#!/usr/bin/perl -w
use strict;
use open IN => ':raw';
$/ = undef; # set "slurp" mode

while (<>) {
    my $fileLen = -s $ARGV;
    my $numChars = length $_;
    print "File \"$ARGV\" contains $fileLen bytes",
        " and $numChars characters.\n";
}
__END__

Thanks again!

-- Jean-Luc
 
