jl_post
Hi,
Recently someone asked me to write a Perl script that would operate
on a bunch of input files specified at the command line. This script
was meant for a Unix-ish system, but I developed it on an MSWin
system.
Normally, when I write a Perl script that takes an arbitrary number
of input files, I will include the line:
    @ARGV = map glob, @ARGV;  # for DOS globbing
I do this because the DOS shell passes in wildcard designators
unexpanded -- that is, if "*.txt" is specified as the only argument,
@ARGV will have "*.txt" as its only element. Unix shells, on the
other hand, usually expand the wildcards, so the glob()bing doesn't
have to be done.
So by using the above line I ensure that the script will have the
same wildcard-expanding behavior whether it is run in DOS or in Unix.
However, when the above line runs under Unix, the wildcard expansion
technically happens twice: once at the
command line, and once in my script. This will be a problem if any of
the input files have wildcard characters or spaces in them. For
example, if I have a file named "a b", and I call my Perl script with
"perl script.pl a*", then "a*" expands to include "a b", but then the
glob() call in the script expands that to "a" and "b", ignoring file
"a b" altogether.
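Here is a minimal sketch of that double-globbing effect. It assumes Perl's built-in glob(), which treats whitespace as a separator between patterns, and compares it with bsd_glob() from the core File::Glob module, which takes its whole argument as a single pattern. The variable names are only for illustration.

```perl
use strict;
use warnings;
use File::Glob qw(bsd_glob);

# Pretend the Unix shell already expanded a* and handed us this name.
my $already_expanded = 'a b';

# Built-in glob() splits its argument on whitespace, so the single
# file name is treated as two separate patterns, "a" and "b".
my @split = glob($already_expanded);

# bsd_glob() does not split on whitespace; a name without wildcard
# characters comes back as one item.
my @whole = bsd_glob($already_expanded);

printf "glob():     %d item(s)\n", scalar @split;
printf "bsd_glob(): %d item(s)\n", scalar @whole;
```

This only addresses the space problem. A file name that itself contains wildcard characters would still be re-expanded by either function.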
So to work around that problem, I wrote my script so that it only
did file globbing if it was running on a Windows platform, like this:
    if ($^O =~ m/^MSWin/i)
    {
        @ARGV = map glob, @ARGV;  # for DOS globbing
    }
This way, input arguments won't get "double-globbed."
Happy with this, I sent my script to the person who needed it. He
responded by saying that "the argument list [was] too long." It turns
out that the wildcard expression he was using expanded out to nearly
16,000 files, which caused the Unix shell he was using to refuse to
run the resulting (long) command line.
So I made a quick change to my script: I removed the above if-check
and advised him to pass in the wildcarded arguments surrounded
by quotes. That way the shell wouldn't expand out the wildcards,
leaving Perl to do it.
That "work-around" worked great. But that led me to ask: In
future scripts, should I include the check for $^O before calling
glob()? If I don't, then the files risk being "double-globbed" on Unix
systems -- but if I do, then I run the risk of the shell refusing to
call the script (without an available work-around).
Of course, this is often a moot point, as more than 99% of the
input files I have ever processed have no wildcard characters or spaces
in their filenames. But that's a guarantee I can't always make.
Perhaps I could still call glob() by default on all systems, but
include a command-line switch that forces that not to happen (in order
to prevent double-globbing). That way, the switch could be mostly
ignored, but it is there in case it's ever needed.
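To make the idea concrete, here is a rough sketch of what such a switch might look like. The option name --noglob and the helper input_files() are hypothetical names chosen for illustration. It assumes Getopt::Long's GetOptionsFromArray() for option parsing and uses bsd_glob() from File::Glob instead of plain glob(), so that names containing spaces are not split.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);
use File::Glob qw(bsd_glob);

# Hypothetical helper: turns raw command-line arguments into the list
# of input files. Globbing is on by default; --noglob turns it off.
sub input_files {
    my @args    = @_;
    my $no_glob = 0;

    # Removes a recognized --noglob from @args, leaving the file arguments.
    GetOptionsFromArray(\@args, 'noglob' => \$no_glob)
        or die "usage: script.pl [--noglob] FILE...\n";

    # With --noglob, trust the arguments exactly as given.
    return @args if $no_glob;

    # Otherwise expand each argument as a single pattern.
    return map { bsd_glob($_) } @args;
}

@ARGV = input_files(@ARGV);
```

With this in place, something like perl script.pl "*.txt" would let Perl do the expansion, while perl script.pl --noglob *.txt would rely on the shell's expansion alone.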
Or am I just overthinking this? After all, glob()bing @ARGV in all
instances (that is, regardless of platform) has never given me a
problem (yet). Maybe I should just leave it in (to be called all the
time) after all.
What are your opinions on this? Is there a convention you use that
addresses this issue? Is there an alternate way you prefer to handle
it?
Your thoughts and opinions are welcome.
Thanks,
-- Jean-Luc