Thanks for the help from all who've answered my post.
> Ah... but how far down the parent directory do you wish to search?
> File::Find has a 'finddepth' method and a multitude of options.
I really need it to list all of the directories, no matter how deep it
goes. I've designed the system so that it's simple to make sure that
the directory tree doesn't go too deep, but I didn't want to enforce a
depth limit because it makes the script less flexible.
> Post your code and maybe we can lend more assistance.
I'm using the method below to build a "tree" structure which
represents the directories on our web server. The main complication is
that sites can have subsites, but in this part of the code I'm only
looking for the subdirectories of one site. If it finds another
subsite, it stops recursing. This works because I load all the subsites
into the tree before I load all the subdirectories.
The directories and sites are stored in a tree object that uses the
directory and site path to add new sites/dirs to the tree. It's then
quite easy to recurse the bits I want when I'm printing the tree.
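In case it makes things clearer, the insert works roughly like this
(just a sketch; the field names here are made up and the real tree
object does a bit more):

    # rough sketch of the path-based insert (names are illustrative)
    sub addDirectory {
        my ($self, $path) = @_;
        my $node = $self->{root};
        # walk the path one component at a time, creating
        # intermediate nodes as needed
        for my $part (split m{/}, $path) {
            $node->{children}{$part} ||= { name => $part, children => {} };
            $node = $node->{children}{$part};
        }
    }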
On the page where I'm doing the recursing, it prints out only the
subdirectories of the site that don't belong to another subsite. So
it's really only looking at a small part of the tree. The problem is
that "small" is a relative term. I'm testing it with a subsite that
has 800 subdirectories (and over 9000 files) as a worst case scenario
(which isn't the biggest site on the server). I'm not sure I'll be
able to get the load time to anywhere near 10 seconds, but I like
working with such a large site because the effects of changing parts
of the script are exaggerated.
The subsites are stored in a database, but the first thing I did was
make sure that all the database accesses happened at the same time. So
there are only two calls to the database (no matter how big the tree
gets) and they both use the same database handle. The database stuff
happens before I go looking for the subdirectories.
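The database part boils down to something like this (a sketch only;
the table and column names are invented for illustration):

    use DBI;

    # one handle, one up-front query that fetches every subsite,
    # instead of a query per node
    my $dbh = DBI->connect("dbi:mysql:webadmin", $user, $pass,
                           { RaiseError => 1 });
    my $subsites = $dbh->selectcol_arrayref(
        "SELECT path FROM subsites WHERE parent_site = ?",
        undef, $siteName);
    $siteTree->addSite($_) for @$subsites;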
use File::Find;

# $basePath, $node, $siteTree and $recurseSubSites are set up earlier
my $nodePath = "$basePath/" . $node->getDirectory();
find(\&wanted, $nodePath);

sub wanted {
    my $currentFile = $File::Find::name;
    # -d with no argument tests $_, the current entry's basename;
    # find() has already chdir'd into its directory
    if (-d && $currentFile ne $nodePath) {
        my $newDir = $currentFile;
        # strip the base path prefix (\Q...\E protects any
        # regex metacharacters in the path)
        $newDir =~ s/^\Q$basePath\E\///;
        # if this directory is actually a site, we only want to
        # recurse it if we're told to by the recurseSubSites parameter
        if (!$siteTree->isNodeSite($newDir)) {
            # if this directory isn't a site,
            # add the directory to the site tree
            $siteTree->addDirectory($newDir);
        } elsif (!$recurseSubSites) {
            # we don't want to recurse any of this directory's subdirs
            $File::Find::prune = 1;
        }
    }
}
Since I posted here, I've done more comparisons of how fast it runs. A
lot of the problem is with adding the nodes to the site tree, and I'm
going to try to reduce that by doing the sorting within the nodes as I
add them (and probably some other things too).
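What I have in mind is keeping each node's children in order as
they're added, so printing never needs a per-node sort pass. Roughly
(a sketch, assuming children live in an array of hashes with a name
field):

    sub addChild {
        my ($self, $child) = @_;
        my $kids = $self->{children} ||= [];
        # binary search for the insertion point so the
        # children stay in name order
        my ($lo, $hi) = (0, scalar @$kids);
        while ($lo < $hi) {
            my $mid = int(($lo + $hi) / 2);
            if ($kids->[$mid]{name} lt $child->{name}) {
                $lo = $mid + 1;
            } else {
                $hi = $mid;
            }
        }
        splice @$kids, $lo, 0, $child;
    }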
However, it takes a good 10-15 seconds just to print the directories
with the rest of the sub commented out. Perhaps I'm doing something in
an inefficient way? Or is it that I'm going to have to live with this
sort of speed if I'm using perl to recurse that many directories? I
actually didn't realise that I had so many files in the directories; I
thought it was only one or two thousand. I don't think I can rely on
the sorting of the operating system because I'm on a unix system that
seems to just return the files in alphabetical order.
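One thing I did spot in the File::Find documentation: find() can take
a hashref of options, and its preprocess hook lets you sort each
directory's entries as they're read, which might remove the need to
rely on the OS ordering at all:

    use File::Find;

    # sort each directory's entries as find() reads them,
    # rather than relying on readdir's order
    find({
        wanted     => \&wanted,
        preprocess => sub { sort @_ },
    }, $nodePath);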
Anyway, any comments or suggestions about the code would be
appreciated. I'm a bit of a newbie perl programmer, so I'm just
muddling along and don't really know if I'm doing things the best way.
Thanks again for your help. It's given me a few more things to think
about.
Helen