Peter J. Holzer said:
Quoth George Mpouras <[email protected]>:
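    sub Mkdir_recursive
    {
        return 1 if $_[0] eq '' || -d $_[0];
        Mkdir_recursive( $_[0] =~ /^(.*?)[\\\/][^\\\/]+$/ ) || return undef;
        # 'or' binds looser than the mkdir list operator; with '||' the
        # 'return undef' was parsed as part of mkdir's argument.
        mkdir $_[0] or return undef
    }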
You'd be better off calling mkdir blind and keying off $! if it fails.
That way you save a stat in the case where the creation succeeds.
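For reference, a minimal sketch of what "calling mkdir blind and keying off $!" could look like. The name mkdir_blind and the use of File::Basename's dirname are my own choices for illustration, not anything from the original code, and the original's handling of both \ and / separators is left to dirname here.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);

# Hypothetical helper: try mkdir first and look at $! only if it fails.
# Returns 1 if $dir exists as a directory afterwards, 0 otherwise.
sub mkdir_blind {
    my ($dir) = @_;
    return 1 if mkdir $dir;                    # common case: a single system call
    return( (-d $dir) ? 1 : 0 ) if $!{EEXIST}; # name exists; check it is a directory
    return 0 unless $!{ENOENT};                # any other error: give up
    my $parent = dirname($dir);
    return 0 if $parent eq $dir;               # cannot go further up
    return 0 unless mkdir_blind($parent);      # create the missing ancestors
    # Retry; tolerate another process having created $dir in the meantime.
    return( (mkdir($dir) || ($!{EEXIST} && -d $dir)) ? 1 : 0 );
}
```

In the case where the directory can be created directly this makes one mkdir call and no stat; the stat (-d) only happens on the EEXIST path.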
That shouldn't make a noticeable difference. If the stat does cause any
disk accesses, those would also have been caused by the mkdir, and if it
doesn't (i.e. everything is already in the cache) the time for the stat
calls is completely swamped by the mkdir's.
Both stat and mkdir are system calls and 'one system call' is going to
be faster than 'two system calls'.
As stated, that's trivially untrue ('one call to exec(2)' will *not* be
faster than 'two calls to time(2)' except under pathological
circumstances), but even if I translate that into 'an additional system
call will take additional time', it's not necessarily true.
In this case I think that the stat system call would normally add a
little time but it will be completely swamped by the time spent in
mkdir:
Each successful mkdir call will cause at least 5 disk accesses on a
typical Linux file system: 1 for the journal, 2 for the inode and
content of the parent directory and 2 for the inode and content of the
new directory (Oh, I forgot the bitmaps, add another 2 or 4 ...). These
will happen *after* mkdir returns because of the writeback cache, and
the kernel will almost certainly succeed in coalescing at least some and
maybe many of those writes, but if you create a lot of directories
(George wrote about "thousands", in my tests I created about 150000)
these writes will eventually dominate.
Now, in addition to writing new blocks, where does the pair
    stat($d); mkdir($d)
spend time?
If the ancestor directories of $d are in cache (that would be the normal
case), both stat and mkdir will walk exactly the same in-memory
structure until they fail to find $d. So, yes, that part will be
uselessly duplicated, but it's very fast compared to actually writing a
new directory to the disk, so the extra time is negligible.
If the ancestor directories of $d are not in cache, stat will load them
into the cache, which may take a noticeable time. But that time will then
be saved by mkdir which can now use the cache instead of loading the
directories itself: So again the difference is one walk through
in-memory structures, which is insignificant compared to loading the
structures from disk and then writing a new directory (which will happen
anyway).
The ratios will be different depending on the relative speed of RAM and
storage: Maybe SSDs are fast enough that the additional walk through the
cache is noticeable, but I doubt it. Of course anybody is free to post
benchmark results to prove me wrong.
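A rough sketch of such a benchmark, for anybody who wants to try. The helper name time_creation and the directory count are assumptions chosen for illustration; this is not the script I used for my own tests.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use Time::HiRes ();

# Hypothetical parameter: directories created per run. Raise it (the
# thread mentions thousands, up to about 150000) for a real measurement.
my $count = 2000;

# Create $count sibling directories in a fresh temporary directory using
# the strategy in $create; return the elapsed wall-clock seconds.
sub time_creation {
    my ($create) = @_;
    my $base  = tempdir(CLEANUP => 1);
    my $start = Time::HiRes::time();
    for my $i (1 .. $count) {
        $create->("$base/d$i") or die "cannot create $base/d$i: $!";
    }
    return Time::HiRes::time() - $start;
}

my $blind     = time_creation(sub { mkdir $_[0] });               # mkdir only
my $stat_then = time_creation(sub { -d $_[0] or mkdir $_[0] });   # stat, then mkdir

printf "mkdir only:      %.3f s\n", $blind;
printf "stat then mkdir: %.3f s\n", $stat_then;
```

Note that this only measures the time until mkdir returns. Because of the writeback cache the actual disk writes happen later, so a fair comparison would need many more directories, or a sync inside the timed region, and several repetitions.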
hp