Better way of checking file/directory spec in Win32?

Henry Law

Running under ActiveState Perl on Windows XP, I need to feed a series of
directory and file specifications to a program, which then interprets
them. Specifications can be directories, single file names, or wildcard
combinations like c:\foo\bar\*.pdf which result in lists of matching files.

The specifications need to be validated to weed out those that don't
refer to an existing directory, or that result in no matching files.
I've Googled and searched CPAN for a module that does this, since it
must be a common requirement, but without success. Can someone suggest one?

Here's the subroutine I wrote to do the job, so you'll see what I'm
trying to do. If it needs to exist then comments on how to smarten it
up would be welcome.

#!/usr/bin/perl

use strict;
use warnings;

use constant VALID_DIRECTORY => -1;
use constant VALID_FILE => -2;

print "Enter spec: ";
while (defined(my $spec = <STDIN>)) {    # stop cleanly at end of input
    chomp $spec;
    last unless length $spec;            # an empty line ends the session
    my $ret = interpret_spec($spec);
    if ($ret == VALID_DIRECTORY) {
        print "'$spec' is a valid directory\n";
    } elsif ($ret == VALID_FILE) {
        print "'$spec' is an existing file\n";
    } elsif ($ret) {
        print "'$spec' results in $ret files\n";
    } else {
        print "'$spec' doesn't match any file or directory\n";
    }
    print "=> ";
}

sub interpret_spec{
    # Checks a file/directory specification. Returns
    #   0       Spec matches nothing
    #   -1      It's a valid directory specification
    #   -2      It's a valid single file specification
    #   number  It's a glob, resulting in "number" files

    my $spec = shift;
    return VALID_DIRECTORY if (-d $spec);
    return VALID_FILE if (-f $spec);
    my @globs = glob $spec;
    if (scalar @globs == 1){
        return ((-f $globs[0]) ? 1 : 0);
    } else {
        return scalar @globs;
    }
}
 
Anno Siegel

Henry Law said:
Running under Activestate on Windows XP I have need to feed a series of
directory and file specifications to a program, which then interprets
them. Specifications can be directories, single file names, or wildcard
combinations like c:\foo\bar\*.pdf which result in lists of matching files.

The specifications need to be validated to weed out those that don't
refer to an existing directory, or that result in no matching files.

My first idea when I read this was (untested):

my @valid = grep -e, map glob, @specs;

but you seem to want a more detailed analysis of each entry.
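
Spelled out with explicit blocks, the one-liner above might look like the following sketch. The sub name existing_matches and the sample specs are made up for illustration. It assumes the core glob(), which splits its argument on whitespace, so a spec containing spaces would need quoting or File::Glob's bsd_glob() instead.

```perl
use strict;
use warnings;

# Expand every spec with glob(), then keep only names that exist on disk.
# Returns a flat list of existing files and directories.
sub existing_matches {
    my @specs = @_;
    return grep { -e $_ } map { glob($_) } @specs;
}

# Hypothetical specs, for illustration only.
my @specs = ('*.pdf', 'reports', 'no_such_file.txt');
print "$_\n" for existing_matches(@specs);
```

Because the result is one flat list, it no longer says which spec produced which file, which is presumably why a per-spec check is still wanted here.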

Henry Law said:
I've Googled and searched CPAN for a module that does this, since it
must be a common requirement, but without success. Can someone suggest one?

Here's the subroutine I wrote to do the job, so you'll see what I'm
trying to do. If it needs to exist then comments on how to smarten it
up would be welcome.

#!/usr/bin/perl

use strict;
use warnings;

use constant VALID_DIRECTORY => -1;
use constant VALID_FILE => -2;

[code that uses interpret_spec() snipped]
sub interpret_spec{
    # Checks a file/directory specification. Returns
    #   0       Spec matches nothing
    #   -1      It's a valid directory specification
    #   -2      It's a valid single file specification
    #   number  It's a glob, resulting in "number" files

    my $spec = shift;
    return VALID_DIRECTORY if (-d $spec);
    return VALID_FILE if (-f $spec);
    my @globs = glob $spec;
    if (scalar @globs == 1){
        ^^^^^^

You don't need "scalar", it's already in scalar context.

        return ((-f $globs[0]) ? 1 : 0);
    } else {
        return scalar @globs;
    }
}

There are issues with some forms of glob() that need looking at.

Under Unix, it is quite possible to have a file named like a glob
pattern: "touch '*.foo'" creates a file with an actual asterisk in
its name. The logic of interpret_spec() would detect the file first
and never try to expand the glob (which may hit several more files).
This problem may be moot under Windows; I don't know.

A different problem occurs with globs of the form "foo.{a,b,c}".
This expands to "foo.a", "foo.b" and "foo.c" with no check against
the file system, that is, it doesn't matter whether "foo.a" etc.
exist as files. That may result in unwanted answers.
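
A small sketch of that second problem and one possible way around it: the brace alternatives are expanded as text, so each resulting name has to be tested against the file system separately. The file names are placeholders.

```perl
use strict;
use warnings;

# Brace alternatives are expanded textually, so glob() returns all three
# names here even if none of the files exists.
my @candidates = glob('foo.{a,b,c}');
print "expanded: @candidates\n";

# Checking each name against the file system keeps only real entries.
my @existing = grep { -e $_ } @candidates;
print "existing: ", scalar(@existing), "\n";
```

Applied to interpret_spec(), this would mean counting grep { -e $_ } over the glob results rather than the raw results.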

Anno
 
Henry Law

Anno said:
My first idea when I read this was (untested):

my @valid = grep -e, map glob, @specs;

but you seem to want a more detailed analysis of each entry.

Hmm; I've a lot still to learn. That wouldn't have been my first idea;
nor my twenty-first either.

Anno said:
You don't need "scalar", it's already in scalar context.

Yes, of course. Laziness, unfortunately.

Anno said:
There are issues with some forms of glob() that need looking at.

Under Unix, it is quite possible to have a file named like a glob
pattern: "touch '*.foo'" creates a file with an actual asterisk in
its name.

How horrible. Fortunately nobody other than me is going to run this
thing and I'll just stay away from things like that.

Anno said:
This problem may be void under windows, I don't know.

It is; "*" is not a valid character in Redmond's file names.

Anno said:
A different problem occurs with globs of the form "foo.{a,b,c}".
This expands to "foo.a", "foo.b" and "foo.c" with no check against
the file system

Once again I can avoid those and get round the problem. But thank you
for pointing out the breadth and depth of the problem I am trying to solve!
 
Anno Siegel

Henry Law said:
Hmm; I've a lot still to learn. That wouldn't have been my first idea;
nor my twenty-first either.

Ah, but it isn't hard at all once you get into the habit of considering
data structures (like arrays and lists, in this case) as units that are
the objects of operations. In Perl, there are only three "operators",
map, grep, and sort, that transform one list into another, but they are
powerful because they all take a user-supplied function that works on
list elements.
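
As a rough illustration of that style, here is a sketch that chains all three over a list of specs. The sub name files_by_name is invented for this example, and sorting by lower-cased name is just one arbitrary choice of ordering.

```perl
use strict;
use warnings;

# Treat the whole list as one unit: expand it, filter it, order it.
sub files_by_name {
    my @specs = @_;
    return sort { lc $a cmp lc $b }     # sort: reorder the list
           grep { -f $_ }               # grep: keep plain files only
           map  { glob($_) } @specs;    # map:  expand each spec
}

# Hypothetical spec, for illustration only.
print "$_\n" for files_by_name('*');
```

Each stage takes a list and hands a list to the next, reading from the bottom line upwards.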

OO takes that way of thinking a step further and lets you define your
own operations on complex data structures. Working with objects for
a while is habit-forming that way. You start to see would-be objects
all over your normal code as well, which is a productive way of looking
at things.

Henry Law said:
Yes, of course. Laziness, unfortunately.

Ah, the kind of laziness that results in more text rather than less.
Who was it who didn't have time for a short letter, so wrote a long
one instead?

Henry Law said:
How horrible. Fortunately nobody other than me is going to run this
thing and I'll just stay away from things like that.

It is; "*" is not a valid character in Redmond's file names.

Yes, I learned that in the meantime too.

Henry Law said:
Once again I can avoid those and get round the problem. But thank you
for pointing out the breadth and depth of the problem I am trying to solve!

I'm afraid it is not so much depth, but a certain murkiness in the
problem that I'm pointing out. The question whether a certain string
"is" a glob or a plain filename can be answered variously. As a
programmer, you must decide. It seems to me that the decision is
rather implicit in the code. If it mattered (as it probably doesn't)
it would be advisable to make the decision explicit.
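
One way the decision could be made explicit is sketched below. This is an illustration, not code from the thread: the sub name classify_spec, the (kind, count) return values, and the rule "a spec is a pattern if it contains *, ?, [ or {" are all assumptions chosen for the example.

```perl
use strict;
use warnings;

# Classify a spec, deciding up front whether it is a glob pattern.
# Assumption for this sketch: a spec counts as a pattern when it contains
# one of the glob metacharacters * ? [ {. Anything else is a literal name.
# Returns a (kind, count) pair: kind is 'dir', 'file', 'glob' or 'none'.
sub classify_spec {
    my $spec = shift;

    if ($spec =~ /[*?\[{]/) {
        # Pattern: count only matches that really exist, which also
        # covers brace expansions such as foo.{a,b,c}.
        my @matches = grep { -e $_ } glob($spec);
        return @matches ? ('glob', scalar @matches) : ('none', 0);
    }

    return ('dir',  1) if -d $spec;
    return ('file', 1) if -f $spec;
    return ('none', 0);
}

# Hypothetical specs, for illustration only.
for my $spec ('.', '*.pl', 'foo.{a,b,c}') {
    my ($kind, $count) = classify_spec($spec);
    print "'$spec': $kind ($count)\n";
}
```

With this rule a Unix file literally named "*.foo" is always treated as a pattern, never as a plain file. That is the opposite of what interpret_spec() does, but at least the choice is visible in one place.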

Anno
 
