An efficient way to make a large directory tree?

Nan Wang · Dec 3, 2004

Let's say I have a list of directories:

a/b/c
a/b/c/d
a/b/c/e
a/b/o
a/c
a/c/q
a/x
a/x/y

I'm trying to find the most efficient way to parse this list and come up
with the fewest mkpath call. In other words, only mkpath the leaf nodes:

a/b/c/d
a/b/c/e
a/b/o
a/c/q
a/x/y

So far I'm parsing the list, then for each path, find the parent directory,
use that as the key of a hash, and increment it by 1 for each path under that
key, and only call mkpath on the hash elements with a value of 0. Is there a
better way of doing this?

Thanks in advance.

Scott W Gifford · Dec 3, 2004

[...]

I'm trying to find the most efficient way to parse this list and come up
with the fewest mkpath call. In other words, only mkpath the leaf nodes:
[...]

So far I'm parsing the list, then for each path, find the parent directory,
use that as the key of a hash, and increment it by 1 for each path under that
key, and only call mkpath on the hash elements with a value of 0. Is there a
better way of doing this?

That sounds like a pretty good technique. From an algorithmic
perspective, it's O(n lg n), which probably the best you can expect
unless there's a limit to the depth of the directories or something
similar. From a practical perspective it sounds fairly sound, too.
Perl's implementation of hashes is very efficient, and it sounds like
that's where the biggest bottleneck will be. It also sounds
straightforward to code and to follow, which is a plus.

It would be interesting to know if it's more efficient to do this
calculation or just mkpath() everything. Algorithmically a bunch of
mkpath()s would be O(n), but practically it would probably be slower,
because of the cost of system calls.

Just MHO, but hope it's helpful!

----ScottG.

Richard Gration · Dec 3, 2004

Let's say I have a list of directories:

a/b/c
a/b/c/d
a/b/c/e
a/b/o
a/c
a/c/q
a/x
a/x/y

I'm trying to find the most efficient way to parse this list and come up
with the fewest mkpath call. In other words, only mkpath the leaf nodes:

a/b/c/d
a/b/c/e
a/b/o
a/c/q
a/x/y

So far I'm parsing the list, then for each path, find the parent directory,
use that as the key of a hash, and increment it by 1 for each path under that
key, and only call mkpath on the hash elements with a value of 0. Is there a
better way of doing this?

Thanks in advance.

I'm trying to find a way to do the loop with map (and maybe grep) but I
can't find it atm, but how about this:

rich@richg ~/perl $ cat test.pl
#!/usr/bin/perl

use strict;
use warnings;

chomp (my @paths = reverse sort { $a=~s#/#/#g <=> $b=~s#/#/#g } <DATA>);
my $seen = '';
foreach (@paths) {
next if $seen =~ /:$_/;
print "$_\n";
$seen .= ":$_";
}

__END__
a/b/c
a/b/c/d
a/b/c/e
a/b/o
a/c
a/c/q
a/x
a/x/y
rich@richg ~/perl $ ./test.pl
a/b/c/e
a/b/c/d
a/x/y
a/c/q
a/b/o

Richard Gration · Dec 3, 2004

I'm trying to find a way to do the loop with map (and maybe grep) but I
can't find it atm

Got it.

rich@richg ~/perl $ cat test.pl
#!/usr/bin/perl

use Data:

ump qw(dump);

use strict;
use warnings;

chomp (my @paths = reverse sort { $a=~s#/#/#g <=> $b=~s#/#/#g } <DATA>);
my $seen = '';
print join "\n", grep {defined} map { $seen =~ /:$_/ ? undef : ($_,$seen .= ":$_")[0] } @paths;

Scott W Gifford · Dec 3, 2004

[...]

my $seen = '';
print join "\n", grep {defined} map { $seen =~ /:$_/ ? undef : ($_,$seen .= ":$_")[0] } @paths;

This would be more efficient, simpler, and more correct in the face of
directory names containing colons, if $seen were a hash. Also,
returning an empty list to map gets the item ignored, which can
eliminate the grep {defined}. Something like this (untested):

my %seen;
print join "\n", map { $seen{$_} ? () : ($_,$seen{$_}++)[0] } @paths;

Regardless, I'm not sure this is any better than the original.

---ScottG.

Richard Gration · Dec 3, 2004

#!/usr/bin/perl
print "$_\n" for grep {defined} map { chomp; $seen =~ /:$_/ ? undef :
($_,$seen .= ":$_")[0] } reverse sort { $a=~s#/#/#g <=> $b=~s#/#/#g }
<DATA>
__END__
a/b/c
a/b/c/d
a/b/c/e
a/b/o
a/c
a/c/q
a/x
a/x/y

Richard Gration · Dec 3, 2004

[...]

my $seen = '';
print join "\n", grep {defined} map { $seen =~ /:$_/ ? undef :
($_,$seen .= ":$_")[0] } @paths;

Click to expand...

This would be more efficient, simpler, and more correct in the face of
directory names containing colons, if $seen were a hash. Also,
returning an empty list to map gets the item ignored, which can
eliminate the grep {defined}. Something like this (untested):

my %seen;
print join "\n", map { $seen{$_} ? () : ($_,$seen{$_}++)[0] }
@paths;

This doesn't work the same. Mine will not create /a/b if it's seen /a/b/c,
yours will. However the empty list I did not know about, thank you.

Regardless, I'm not sure this is any better than the original.

No, nor me, but it's Friday afternoon and I've already succeeded ... I've
learned something

How about:

map { chomp; $seen =~ /:$_/ ? () : ($_,$seen .= ":$_")[0] } reverse sort {
$a=~s#/#/#g <=> $b=~s#/#/#g } <DATA>

Scott W Gifford · Dec 3, 2004

[...]

my %seen;
print join "\n", map { $seen{$_} ? () : ($_,$seen{$_}++)[0] }
@paths;

Click to expand...

This doesn't work the same. Mine will not create /a/b if it's seen /a/b/c,
yours will. However the empty list I did not know about, thank you.

You're right, sorry!

---ScottG.

Joe Smith · Dec 4, 2004

Richard said:
reverse sort { $a=~s#/#/#g <=> $b=~s#/#/#g }

sort { $b=~tr#/#/# <=> $a=~tr#/#/# }

-Joe

Optimal way to make a table for large lists	2	Jul 7, 2022
efficient way to process data	0	Jan 12, 2014
I made a blockchain and want to make a cryptocurrency, but my code doesn't verify hash of each block	2	Jun 2, 2024
ChatGPT will make us Job(Home)less	3	Jan 22, 2023
Blue J Ciphertext Program	2	Nov 22, 2023
My Status, Ciphertext	2	Nov 28, 2023
translating an OS directory recursively into a tree object	8	Dec 7, 2009
best way to make a few changes in a large data file	18	Jan 8, 2013

An efficient way to make a large directory tree?

Nan Wang

Scott W Gifford

Richard Gration

Richard Gration

Scott W Gifford

Richard Gration

Richard Gration

Scott W Gifford

Joe Smith

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads