Adam Funk
I'm writing a program that uses File::Find to recurse through the
files and directories specified as command-line arguments, and to call
process_file() on each one.
By default the program prints each file's results to STDOUT, but if I
give it the -d DIRECTORY option, it should print each file's output to
a file in DIRECTORY with ".txt" at the end of the name instead of
".xml". There are a lot of non-English UTF-8 characters in the input
and output.
At the moment, I have the following near the beginning of the program:
    binmode (STDOUT, ":utf8");
    *OUTPUT = *STDOUT ;
and the following for each input file:
    sub process_file {
        # find is called with the no_chdir option set
        my $input_filename = $_;
        my $output_filename = $input_filename;
        if ($option{x} || ($input_filename =~ m!\.xml$!i ) ) {
            if ($option{d}) {
                # drop the ".xml" suffix
                $output_filename =~ s!\.xml$!!i ;
                # drop the relative path
                $output_filename =~ s!^.*/!! ;
                # add the new path and suffix
                $output_filename = $option{d} . "/" . $output_filename;
                $output_filename = $output_filename . ".txt";
                open(OUTPUT, ">" . $output_filename);
                binmode (OUTPUT, ":utf8");
            }
            print(STDERR "Reading : ", $input_filename, "\n");
            # ... CODE THAT CALLS OTHER SUBROUTINES TO READ THE
            # INPUT FILE, PROCESS IT, AND print(OUTPUT ...) A
            # LOT OF STUFF
            if ($option{d}) {
                print(STDERR "Wrote : ", $output_filename, "\n");
                close(OUTPUT);
            }
        }
        else {
            print(STDERR "Ignoring: ", $File::Find::name, "\n");
        }
    }
As far as I can tell, this works and cleanly suppresses the "Wide
character" warnings. Is this use of filehandle assignment OK, or am I
likely to run into trouble later?
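To make the filehandle-assignment question concrete, here is a minimal sketch, separate from the program above, of what the glob assignment appears to do. As far as I understand it, assigning one whole glob to another aliases every slot, so OUTPUT and STDOUT would name one underlying handle instead of two independent ones. The variables $same_io and $same_fd exist only for this illustration.

```perl
use strict;
use warnings;

# Same assignment as near the top of the program.
*OUTPUT = *STDOUT;

# Compare the IO slots of the two globs: after a whole-glob assignment
# they should be the very same object, not two separate handles.
my $same_io = ( *OUTPUT{IO} == *STDOUT{IO} ) ? 1 : 0;

# Both names should therefore report the same file descriptor.
my $same_fd = ( fileno(OUTPUT) == fileno(STDOUT) ) ? 1 : 0;

print STDERR "same IO object : $same_io\n";
print STDERR "same descriptor: $same_fd\n";
```

If both checks come out true, then open(OUTPUT, ...) and close(OUTPUT) in process_file() would be acting on the handle that STDOUT also names, which is the part I am unsure about.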
Also, why is it necessary to set binmode on OUTPUT every time I open
it?
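One way to look at the binmode question is to inspect the layers on the handle after each open. The following is a minimal sketch, assuming a default setup (no PERL_UNICODE or PERLIO environment variable and no "use open" pragma in effect). The helper has_utf8() and the scratch file are hypothetical and only serve the demonstration; PerlIO::get_layers is built into perl 5.8 and later.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Scratch directory for the demonstration; removed when the script exits.
my $dir  = tempdir( CLEANUP => 1 );
my $path = "$dir/demo.txt";

# Hypothetical helper: true if the handle's output side carries "utf8".
sub has_utf8 {
    my ($fh) = @_;
    my @layers = PerlIO::get_layers( $fh, output => 1 );
    return scalar grep { $_ eq 'utf8' } @layers;
}

# First open: look at the layers before and after binmode.
open( OUTPUT, '>', $path ) or die "Cannot open $path: $!";
my $fresh = has_utf8( \*OUTPUT );
binmode( OUTPUT, ':utf8' );
my $after_binmode = has_utf8( \*OUTPUT );
close(OUTPUT) or die "Cannot close $path: $!";

# Reopen the same bareword handle without calling binmode again.
open( OUTPUT, '>', $path ) or die "Cannot reopen $path: $!";
my $reopened = has_utf8( \*OUTPUT );
close(OUTPUT) or die "Cannot close $path: $!";

printf STDERR "%-15s %s\n", 'fresh open:',    $fresh         ? 'utf8' : 'no utf8';
printf STDERR "%-15s %s\n", 'after binmode:', $after_binmode ? 'utf8' : 'no utf8';
printf STDERR "%-15s %s\n", 'after reopen:',  $reopened      ? 'utf8' : 'no utf8';
```

If the reopened handle reports the same layers as the fresh one, that would suggest the layer set by binmode belongs to a particular open of the handle and not to the name OUTPUT, but the result depends on the environment.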
Thanks,
Adam