the fastest way to create a directory

Keith Keller

mkdir_p("/opt/d1/d2/d3/d4/d5") or die("$!");

So, either you run your test scripts as root, or your user account has
write permissions to a system directory. Neither of these situations
sounds particularly smart. Given your complete inability to post
properly to Usenet, and your inability to refute Peter's benchmarks with
some of your own, I don't see how you can expect anyone to take your
code seriously.

--keith
 
Justin C

Keith Keller said:
So, either you run your test scripts as root, or your user account has
write permissions to a system directory. Neither of these situations
sounds particularly smart. Given your complete inability to post
properly to Usenet, and your inability to refute Peter's benchmarks with
some of your own, I don't see how you can expect anyone to take your
code seriously.


I don't think anyone takes his code seriously.


Justin.
 
Charlton Wilbur

"GM" == George Mpouras
GM> yes there are about thousands dirs. try to understand the
GM> code. looks simple but it is not

It's not simple. It's also not measurably more efficient than shelling
out to mkdir -p or invoking File::Path's make_path function.
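
For reference, a minimal sketch of the File::Path route mentioned above. The scratch directory from File::Temp and the d1/.../d5 layout are assumptions made for illustration; the `error` option is the documented way to collect failures instead of having them reported via warn/die.

```perl
use strict;
use warnings;
use File::Path qw(make_path);
use File::Temp qw(tempdir);

# Scratch directory (an assumption for this sketch) so nothing
# outside of it is touched.
my $base = tempdir(CLEANUP => 1);
my $dir  = "$base/d1/d2/d3/d4/d5";

# make_path creates every missing component. With the 'error' option,
# problems are collected in $err rather than reported immediately.
make_path($dir, { error => \my $err });

if ($err && @$err) {
    for my $diag (@$err) {
        my ($path, $message) = %$diag;    # one path => message pair per entry
        die "cannot create '$path': $message\n";
    }
}

print "created $dir\n" if -d $dir;
```

Calling make_path again on the same path is harmless, since existing directories are simply skipped.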

Thou shalt make thy program's purpose and structure clear to thy fellow
man; for thy creativity is better used in solving problems than in
creating beautiful new impediments to understanding.

Charlton
 
George Mpouras

Believe it or not, the following makes 18 unnecessary disk accesses:

use Errno qw(EEXIST ENOTDIR);
my $access=0;
mkdir_p("/opt/d1/d2/d3/d4/d5") or die("$!");
mkdir_p("/opt/d1/d2/d3/d4/d5") or die("$!");
mkdir_p("/opt/d1/d2/d3/d4/d5") or die("$!");
print "disk touches : $access\n";

sub mkdir_p
{
    my ($cur, $next, $remain);
    $remain = $_[0];

    while ($remain) {
        # split off the next path component
        ($next, $remain) = $remain =~ /^(\/*[^\/]*)(.*)/;
        $cur .= $next;

        $access++;    # count every mkdir attempt

        mkdir($cur) or do {
            return if $! != EEXIST;
            next if $remain;

            # only the full path is checked: EEXIST may refer to a non-directory
            -d($cur) or $! = ENOTDIR, return;
        };
    }

    return 1;
}
 
George Mpouras

#!/usr/bin/perl
# A benchmark of what has been proposed so far ...

use Errno qw(EEXIST ENOTDIR);
use Time::HiRes;
use File::Path;
my $start_time=[];
my $err;

foreach my $code (qw/
    perl_module_file_path
    mkdir_p_Rainer_Weikusa
    Mkdir_recursive_George_Bouras
    /)
{
    *Test_the_code = \&{$code};
    system('/bin/rm -rf /tmp/test_this') if -d '/tmp/test_this';
    $start_time = [ Time::HiRes::gettimeofday ];
    print "Testing $code ... ";

    foreach (1..5_000)
    {
        # 1234 becomes /tmp/test_this/1/2/3/4
        my $dir = join('/','/tmp/test_this',split '',$_);
        Test_the_code($dir) or die("$!");
    }

    print "finished at ", Time::HiRes::tv_interval($start_time) ," sec\n";
}


# Using Perl module File::Path
sub perl_module_file_path
{
    File::Path::mkpath( $_[0], {'error' => \$err} );
    @{$err} ? 0 : 1
}


sub mkdir_p_Rainer_Weikusa
{
    my ($cur, $next, $remain);
    $remain = $_[0];
    while ($remain) {
        ($next, $remain) = $remain =~ /^(\/*[^\/]*)(.*)/;
        $cur .= $next;
        mkdir($cur) or do {
            return if $! != EEXIST;
            next if $remain;
            -d($cur) or $! = ENOTDIR, return;
        };
    }
    return 1;
}


# George Bouras
sub Mkdir_recursive_George_Bouras
{
    return 1 if $_[0] eq '' || -d $_[0];
    Mkdir_recursive_George_Bouras( $_[0] =~/^(.*?)[\\\/][^\\\/]+$/ ) || return undef;
    mkdir($_[0]) or return undef
}
 
George Mpouras

#!/usr/bin/perl
# A benchmark of what has been proposed so far ...
# Some typos corrected.

use strict;
use warnings;
use Errno qw(EEXIST ENOTDIR);
use Time::HiRes;
use File::Path;
my $start_time=[];
my $err;

foreach my $code (qw/
    perl_module_file_path
    mkdir_p_Rainer_Weikusa
    Mkdir_recursive_George_Bouras
    /)
{
    my $testcode = \&$code;
    system('/bin/rm -rf /tmp/test_this') if -d '/tmp/test_this';
    $start_time = [ Time::HiRes::gettimeofday ];
    print "Testing $code ... ";

    foreach (1..5_000)
    {
        # 1234 becomes /tmp/test_this/1/2/3/4
        my $dir = join('/','/tmp/test_this',split '',$_);
        $testcode->($dir) or die("$!");
    }

    print "finished at ", Time::HiRes::tv_interval($start_time) ," sec\n";
}


# Using Perl module File::Path
sub perl_module_file_path
{
    File::Path::mkpath( $_[0], {'error' => \$err} );
    @{$err} ? 0 : 1
}


sub mkdir_p_Rainer_Weikusa
{
    my ($cur, $next, $remain);
    $remain = $_[0];
    while ($remain) {
        ($next, $remain) = $remain =~ /^(\/*[^\/]*)(.*)/;
        $cur .= $next;
        mkdir($cur) or do {
            return if $! != EEXIST;
            next if $remain;
            -d($cur) or $! = ENOTDIR, return;
        };
    }
    return 1;
}


# George Bouras
sub Mkdir_recursive_George_Bouras
{
    return 1 if $_[0] eq '' || -d $_[0];
    Mkdir_recursive_George_Bouras( $_[0] =~/^(.*?)[\\\/][^\\\/]+$/ ) || return undef;
    mkdir($_[0]) or return undef
}
 
Rainer Weikusat

"George Mpouras"
Believe it or not, the following makes 18 unnecessary disk accesses
[...]

use Errno qw(EEXIST ENOTDIR);
my $access=0;
mkdir_p("/opt/d1/d2/d3/d4/d5") or die("$!");
mkdir_p("/opt/d1/d2/d3/d4/d5") or die("$!");
mkdir_p("/opt/d1/d2/d3/d4/d5") or die("$!");
print "disk touches : $access\n";

If it does, your kernel is seriously disabled. Leaving this aside,
that's the expected result when you call it uselessly three times in a
row with an argument like that.
 
Rainer Weikusat

Rainer Weikusat said:
"George Mpouras"
Believe it or not, the following makes 18 unnecessary disk accesses
[...]

use Errno qw(EEXIST ENOTDIR);
my $access=0;
mkdir_p("/opt/d1/d2/d3/d4/d5") or die("$!");
mkdir_p("/opt/d1/d2/d3/d4/d5") or die("$!");
mkdir_p("/opt/d1/d2/d3/d4/d5") or die("$!");
print "disk touches : $access\n";

If it does, your kernel is seriously disabled. Leaving this aside,
that's the expected result when you call it uselessly three times in a
row with an argument like that.

BTW: If that's an actual problem you have, a reasonably simple way to
deal with that would be to use a hash to keep track of pathnames
already known to exist provided keeping them all in memory isn't a
problem.
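
A rough sketch of that hash idea, under a few stated assumptions: `%known_dirs`, `$mkdir_calls` and `mkdir_p_cached` are hypothetical names, the scratch directory is only there so the example is safe to run, and nothing else removes directories while the program is running (otherwise the cache goes stale). Unlike the mkdir_p quoted above, it runs -d on every EEXIST so that a plain file never ends up in the cache.

```perl
use strict;
use warnings;
use Errno qw(EEXIST ENOTDIR);
use File::Temp qw(tempdir);

my %known_dirs;         # pathnames already known to exist as directories
my $mkdir_calls = 0;    # counter for illustration only

sub mkdir_p_cached {
    my ($path) = @_;
    my ($cur, $remain) = ('', $path);

    while (length $remain) {
        # split off the next path component
        (my $next, $remain) = $remain =~ m{^(/*[^/]*)(.*)}s;
        $cur .= $next;
        next if $known_dirs{$cur};    # seen before: no system call at all

        $mkdir_calls++;
        unless (mkdir $cur) {
            return unless $! == EEXIST;
            # EEXIST can refer to any kind of filesystem object
            unless (-d $cur) { $! = ENOTDIR; return; }
        }
        $known_dirs{$cur} = 1;
    }
    return 1;
}

my $base   = tempdir(CLEANUP => 1);
my $target = "$base/d1/d2/d3/d4/d5";

mkdir_p_cached($target) or die "mkdir_p_cached: $!";
my $calls_after_first = $mkdir_calls;

# The repeated calls are answered from the hash alone.
mkdir_p_cached($target) or die "mkdir_p_cached: $!" for 1 .. 2;

print "mkdir calls after first run: $calls_after_first, after third: $mkdir_calls\n";
```

The price is one hash entry per directory, which is the memory caveat already mentioned.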
 
Rainer Weikusat

Rainer Weikusat said:
"George Mpouras"
Believe it or not, the following makes 18 unnecessary disk accesses
[...]

use Errno qw(EEXIST ENOTDIR);
my $access=0;
mkdir_p("/opt/d1/d2/d3/d4/d5") or die("$!");
mkdir_p("/opt/d1/d2/d3/d4/d5") or die("$!");
mkdir_p("/opt/d1/d2/d3/d4/d5") or die("$!");
print "disk touches : $access\n";

If it does, your kernel is seriously disabled. Leaving this aside,
that's the expected result when you call it uselessly three times in a
row with an argument like that.

Other branch of the tree: No matter if the code was written to work
well if most directories already exist (yours) or most directories
don't exist (mine), the stat isn't needed because mkdir already checks
for that. The only downside of that is that the longest pathname needs
special-case treatment because EEXIST can refer to any kind of
filesystem object.
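
To make the EEXIST point concrete, here is a small sketch (the scratch directory and the file name `plain_file` are assumptions for illustration): mkdir on a path occupied by a regular file fails with the same EEXIST as it would for an existing directory, so only a -d test can tell the two cases apart.

```perl
use strict;
use warnings;
use Errno qw(EEXIST);
use File::Temp qw(tempdir);

my $scratch = tempdir(CLEANUP => 1);
my $file    = "$scratch/plain_file";

# Put a regular file where a directory is about to be requested.
open(my $fh, '>', $file) or die "open: $!";
close($fh);

my $created         = mkdir($file);
my $errno_is_eexist = !$created && $! == EEXIST;    # capture $! right away
my $is_dir          = -d $file;

printf "mkdir failed with EEXIST: %s, path is a directory: %s\n",
    ($errno_is_eexist ? 'yes' : 'no'), ($is_dir ? 'yes' : 'no');
```

On a POSIX system this should report EEXIST together with "not a directory", which is exactly the case the final -d check in the code has to catch.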

Since this is sort-of a cute little problem, here's another iterative
one, this time biased in the other direction:

use Errno qw(EEXIST ENOENT ENOTDIR);

sub mkdir_p
{
    my ($prefix, $suffix, @tbc);

    $prefix = $_[0];
    mkdir($prefix) and return 1;
    unless ($! == ENOENT) {
        if ($! == EEXIST) {
            return 1 if -d($prefix);
            $! = ENOTDIR;
        }

        return;
    }

    while (($prefix, $suffix) = $prefix =~ /(.*?)([^\/]+\/*)$/) {
        push(@tbc, $suffix);

        last if mkdir($prefix) or $! == EEXIST;
        return unless $! == ENOENT;
    }

    $prefix .= pop(@tbc), mkdir($prefix) or return
        while @tbc;

    return 1;
}

mkdir_p($ARGV[0]) or die($!);
 
Rainer Weikusat

Peter J. Holzer said:
It's a slight variation of Henry Spencer's 8th Commandment:

It's a serious abuse of that as the original refers to placement of
braces and indentation:

Thou shalt make thy program's purpose and structure clear to
thy fellow man by using the One True Brace Style, even if thou
likest it not, for thy creativity is better used in solving
problems than in creating beautiful new impediments to
understanding.

I.e., it's not about not confusing people who can't read code and don't
understand simple algorithms by placing them into a file A they happen
to look at for some weird reason instead of pulling something similar
in from a file B they're not looking at (for an equally weird
reason).

The applicable commandment would be the seventh:

Thou shalt study thy libraries and strive not to reinvent them
without cause, that thy code may be short and readable and thy
days pleasant and productive.

Something which comes to mind immediately when reading this would be
"Thou shalt not strive to learn how to program lest some people's
undeservedly comfortable retirement might not remain as comfortable as
they'd like it to be". Or maybe "thou shalt not burden thy fellow men
by forcing overly general solutions to simple problems you happen to
have written 600 years ago onto them -- use them enjoying thine
pleasant and productive days and leave others alone unless they ask
YOU for help".
 
Rainer Weikusat

Charlton Wilbur said:
GM> yes there are about thousands dirs. try to understand the
GM> code. looks simple but it is not

It's not simple. It's also not measurably more efficient than shelling
out to mkdir -p or invoking File::Path's make_path function.

While make_path isn't a particularly good implementation, it is not
nearly that bad. I tested this by recreating the /usr directory
hierarchy of 'some computer' (11,749 directories) below /tmp using
File::Path::make_path, George's recursive function, the 2nd mkdir_p I
posted and 'shelling out to mkdir -p', measuring times via
NYTProf. The results were (% relative to make_path):

Method                        Time       Relative
system('mkdir', '-p', ...)    10.7s      803.6%
make_path / mkpath            1.3115s    100%
Mkdir_recursive               0.976s     74.42%
mkdir_p                       0.611s     46.58%
 
Peter J. Holzer

Since people've been quoting Henry Spencer:

10. Thou shalt foreswear, renounce, and abjure the vile heresy which
claimeth that ``All the world's a VAX'', and have no commerce with
the benighted heathens who cling to this barbarous belief, that the
days of thy program may be long even though the days of thy current
machine be short.

I believe on my machine (FreeBSD/ZFS) mkdir does a single write to the
intent log, which is on an SSD. The writes to spinning rust come (a good
deal) later, and are thoroughly batched.

I did write:

So I was obviously aware of that (Linux isn't that different from BSD in
that regard. Unix has had a write back cache since the very beginning,
although BSD FFS was for some time infamous for synchronous inode updates
(which would hit mkdir hard), but that was solved with soft updates for
FFS long before ZFS came along. Linux ext never did that (by default)).

In many cases the real I/O may come only after the program causing it
has finished. In that case the I/O won't influence the program, but it
may slow down some other program, so completely ignoring it seems like
cheating. So in my benchmarks I created enough directories that the
kernel had to do real I/O while the program was still running.

It also very much raises the question whether it is worthwhile to
optimize. George's benchmark takes less than half a second to create 5000
directories on my system. The difference between the slowest and the
fastest method is about 0.1 seconds. In relative terms, that's a nice
improvement (~ 25%), but it's still only 0.1 seconds in a program which
creates 5000 directories, and then presumably *does something*
with those directories. Is it worthwhile to spend effort saving that 0.1
seconds which are almost certainly negligible compared to the total time
the program runs?

Measure, optimize, measure again. Concentrate on the parts where your
program spends most of its time. Look for algorithmic improvements,
not microoptimizations (maybe you don't need those 5000 directories at
all?).

You've missed an important overhead: the system call itself. This is
never cheap, and depending on architecture can be very expensive.

I didn't miss it, I deliberately didn't mention it.

I started programming in the 1980s, and I have that "OMG, a system call
is expensive, I must avoid it!!!" gut feeling, too. But on my 1.8GHz
Core2 a call to time(2) takes less than 160 ns, which compares
favourably to about 200 ns for a call to an empty perl sub.

Again: Measure, optimize, measure again. Gut feelings are dangerous.
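
A measurement of that kind can be sketched with the core Benchmark module. The name `empty_sub` is made up for this example, and the numbers depend entirely on the machine; note also that on some systems time() may be answered without actually entering the kernel, so this is only a rough stand-in for "a cheap system call".

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

sub empty_sub { return }

# Run each candidate for at least one CPU second and print a comparison
# table. Both are wrapped in an anonymous sub, so that overhead is shared.
my $rows = cmpthese(-1, {
    'time()'    => sub { my $t = time },
    'empty sub' => sub { empty_sub() },
});
```

The printed table shows iterations per second for each candidate and the relative difference between them.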

hp

PS: I have written at least two implementations of mkdir_p myself. I
don't remember if I wasn't aware of File::Path::make_path at the
time or if I had a reason to write my own. It doesn't really matter
- the function is so simple that writing it doesn't take much more
time than reading the docs.
 
Charlton Wilbur

HL> I like it! Is that a Charltonism? Attribution please if not.

I didn't *intentionally* plagiarize! I thought it was well enough known
to not need a citation.

Charlton
 
Charlton Wilbur

HL> Heavens; have I caused offence?

Oh, no, but since you didn't recognize it, I realized my assumption that
it was extremely well known was wrong.

Charlton
 
George Mpouras

All the posts and answers in this thread are wrong.

The answers ranged from "Ah, this is trivial and there is a module!"
to simple N approaches.

Sometime later I will post "the fastest way to create a directory".

Actually, it has nothing (or very little) to do with directories.
 
