An efficient way to make a large directory tree?

N

Nan Wang

Let's say I have a list of directories:

a/b/c
a/b/c/d
a/b/c/e
a/b/o
a/c
a/c/q
a/x
a/x/y

I'm trying to find the most efficient way to parse this list and come up
with the fewest mkpath call. In other words, only mkpath the leaf nodes:

a/b/c/d
a/b/c/e
a/b/o
a/c/q
a/x/y

So far I'm parsing the list, then for each path, find the parent directory,
use that as the key of a hash, and increment it by 1 for each path under that
key, and only call mkpath on the hash elements with a value of 0. Is there a
better way of doing this?

Thanks in advance.
 
S

Scott W Gifford

[...]
I'm trying to find the most efficient way to parse this list and come up
with the fewest mkpath call. In other words, only mkpath the leaf nodes:
[...]

So far I'm parsing the list, then for each path, find the parent directory,
use that as the key of a hash, and increment it by 1 for each path under that
key, and only call mkpath on the hash elements with a value of 0. Is there a
better way of doing this?

That sounds like a pretty good technique. From an algorithmic
perspective, it's O(n lg n), which probably the best you can expect
unless there's a limit to the depth of the directories or something
similar. From a practical perspective it sounds fairly sound, too.
Perl's implementation of hashes is very efficient, and it sounds like
that's where the biggest bottleneck will be. It also sounds
straightforward to code and to follow, which is a plus.

It would be interesting to know if it's more efficient to do this
calculation or just mkpath() everything. Algorithmically a bunch of
mkpath()s would be O(n), but practically it would probably be slower,
because of the cost of system calls.

Just MHO, but hope it's helpful!

----ScottG.
 
R

Richard Gration

Let's say I have a list of directories:

a/b/c
a/b/c/d
a/b/c/e
a/b/o
a/c
a/c/q
a/x
a/x/y

I'm trying to find the most efficient way to parse this list and come up
with the fewest mkpath call. In other words, only mkpath the leaf nodes:

a/b/c/d
a/b/c/e
a/b/o
a/c/q
a/x/y

So far I'm parsing the list, then for each path, find the parent directory,
use that as the key of a hash, and increment it by 1 for each path under that
key, and only call mkpath on the hash elements with a value of 0. Is there a
better way of doing this?

Thanks in advance.

I'm trying to find a way to do the loop with map (and maybe grep) but I
can't find it atm, but how about this:

rich@richg ~/perl $ cat test.pl
#!/usr/bin/perl

use strict;
use warnings;

chomp (my @paths = reverse sort { $a=~s#/#/#g <=> $b=~s#/#/#g } <DATA>);
my $seen = '';
foreach (@paths) {
next if $seen =~ /:$_/;
print "$_\n";
$seen .= ":$_";
}

__END__
a/b/c
a/b/c/d
a/b/c/e
a/b/o
a/c
a/c/q
a/x
a/x/y
rich@richg ~/perl $ ./test.pl
a/b/c/e
a/b/c/d
a/x/y
a/c/q
a/b/o
 
R

Richard Gration

I'm trying to find a way to do the loop with map (and maybe grep) but I
can't find it atm

Got it.

rich@richg ~/perl $ cat test.pl
#!/usr/bin/perl

use Data::Dump qw(dump);

use strict;
use warnings;

chomp (my @paths = reverse sort { $a=~s#/#/#g <=> $b=~s#/#/#g } <DATA>);
my $seen = '';
print join "\n", grep {defined} map { $seen =~ /:$_/ ? undef : ($_,$seen .= ":$_")[0] } @paths;
 
S

Scott W Gifford

[...]
my $seen = '';
print join "\n", grep {defined} map { $seen =~ /:$_/ ? undef : ($_,$seen .= ":$_")[0] } @paths;

This would be more efficient, simpler, and more correct in the face of
directory names containing colons, if $seen were a hash. Also,
returning an empty list to map gets the item ignored, which can
eliminate the grep {defined}. Something like this (untested):

my %seen;
print join "\n", map { $seen{$_} ? () : ($_,$seen{$_}++)[0] } @paths;

Regardless, I'm not sure this is any better than the original.

---ScottG.
 
R

Richard Gration

#!/usr/bin/perl
print "$_\n" for grep {defined} map { chomp; $seen =~ /:$_/ ? undef :
($_,$seen .= ":$_")[0] } reverse sort { $a=~s#/#/#g <=> $b=~s#/#/#g }
<DATA>
__END__
a/b/c
a/b/c/d
a/b/c/e
a/b/o
a/c
a/c/q
a/x
a/x/y
 
R

Richard Gration

[...]
my $seen = '';
print join "\n", grep {defined} map { $seen =~ /:$_/ ? undef :
($_,$seen .= ":$_")[0] } @paths;

This would be more efficient, simpler, and more correct in the face of
directory names containing colons, if $seen were a hash. Also,
returning an empty list to map gets the item ignored, which can
eliminate the grep {defined}. Something like this (untested):

my %seen;
print join "\n", map { $seen{$_} ? () : ($_,$seen{$_}++)[0] }
@paths;

This doesn't work the same. Mine will not create /a/b if it's seen /a/b/c,
yours will. However the empty list I did not know about, thank you.

Regardless, I'm not sure this is any better than the original.

No, nor me, but it's Friday afternoon and I've already succeeded ... I've
learned something :)

How about:

map { chomp; $seen =~ /:$_/ ? () : ($_,$seen .= ":$_")[0] } reverse sort {
$a=~s#/#/#g <=> $b=~s#/#/#g } <DATA>
 
S

Scott W Gifford

[...]
my %seen;
print join "\n", map { $seen{$_} ? () : ($_,$seen{$_}++)[0] }
@paths;

This doesn't work the same. Mine will not create /a/b if it's seen /a/b/c,
yours will. However the empty list I did not know about, thank you.

You're right, sorry!

---ScottG.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
474,164
Messages
2,570,898
Members
47,439
Latest member
shasuze

Latest Threads

Top