Since people have been quoting Henry Spencer:
10. Thou shalt foreswear, renounce, and abjure the vile heresy which
claimeth that ``All the world's a VAX'', and have no commerce with
the benighted heathens who cling to this barbarous belief, that the
days of thy program may be long even though the days of thy current
machine be short.
I believe on my machine (FreeBSD/ZFS) mkdir does a single write to the
intent log, which is on an SSD. The writes to spinning rust come (a good
deal) later, and are thoroughly batched.
I did write:
So I was obviously aware of that. (Linux isn't that different from BSD in
this regard: Unix has had a write-back cache since the very beginning.
BSD FFS was for some time infamous for synchronous inode updates (which
would hit mkdir hard), but that was solved with soft updates for FFS long
before ZFS came along. Linux ext never did synchronous inode updates, at
least not by default.)
In many cases the real I/O may come only after the program causing it
has finished. In that case the I/O won't influence the program, but it
may slow down some other program, so completely ignoring it seems like
cheating. So in my benchmarks I created enough directories that the
kernel had to do real I/O while the program was still running.
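A rough sketch of that kind of test (the directory count of 500,000 is my
assumption, not a number from my original benchmark; pick it large enough
that the dirty metadata outruns the write-back cache on your machine):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Time::HiRes qw(gettimeofday tv_interval);

    my $n = shift // 500_000;   # assumed count; adjust for your machine
    my $t0 = [gettimeofday];
    for my $i (1 .. $n) {
        mkdir "d$i" or die "mkdir d$i: $!";
    }
    my $s = tv_interval($t0);
    printf "%d mkdirs in %.2f s (%.1f us each)\n", $n, $s, 1e6 * $s / $n;

If the per-mkdir time climbs noticeably as $n grows, the kernel has
started doing real I/O while the program is still running.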
It also very much raises the question of whether it is worthwhile to
optimize at all. George's benchmark takes less than half a second to
create 5000 directories on my system. The difference between the slowest
and the fastest method is about 0.1 seconds. In relative terms that's a
nice improvement (~ 25%), but it's still only 0.1 seconds in a program
which creates 5000 directories, and then presumably *does something*
with those directories. Is it worthwhile to spend effort saving that 0.1
seconds, which is almost certainly negligible compared to the total time
the program runs?
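For comparison, a sketch of how one might measure such a difference with
the Benchmark module. The two variants are merely examples I made up, not
George's actual candidates, and the paths are my own choice:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Benchmark qw(cmpthese);
    use File::Path qw(make_path);
    use File::Temp qw(tempdir);

    my $base = tempdir(CLEANUP => 1);
    my ($i, $j) = (0, 0);

    cmpthese(5000, {
        # let File::Path do all the work
        make_path => sub { make_path("$base/a/" . $i++ . "/x/y") },
        # create each component by hand, ignoring "already exists"
        by_hand   => sub {
            my $so_far = $base;
            for my $part ('b', $j++, 'x', 'y') {
                $so_far .= "/$part";
                mkdir $so_far or $!{EEXIST} or die "mkdir $so_far: $!";
            }
        },
    });

cmpthese prints rates and a comparison table, which makes it easy to
judge whether the relative difference translates into absolute time
worth caring about.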
Measure, optimize, measure again. Concentrate on the parts where your
program spends most of its time. Look for algorithmic improvements,
not micro-optimizations (maybe you don't need those 5000 directories at
all?).
You've missed an important overhead: the system call itself. This is
never cheap, and depending on architecture can be very expensive.
I didn't miss it, I deliberately didn't mention it.
I started programming in the 1980s, and I have that "OMG, a system call
is expensive, I must avoid it!!!" gut feeling, too. But on my 1.8 GHz
Core2 a call to time(2) takes less than 160 ns, which compares
favourably to about 200 ns for a call to an empty perl sub.
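If you want to check such gut feelings on your own machine, here is a
quick sketch with the Benchmark module. (One caveat: on Linux, time() may
be served from the vDSO rather than a full system call, which flatters
the result.)

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Benchmark qw(cmpthese);

    my $noop = sub { };          # an empty perl sub

    cmpthese(-3, {               # -3: run each variant for ~3 CPU seconds
        empty_sub => sub { $noop->() },
        time_call => sub { time },
    });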
Again: Measure, optimize, measure again. Gut feelings are dangerous.
hp
PS: I have written at least two implementations of mkdir_p myself. I
don't remember whether I wasn't aware of File::Path::make_path at the
time or whether I had a reason to write my own. It doesn't really matter
- the function is so simple that writing it doesn't take much more
time than reading the docs.
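For illustration, a minimal sketch of such a function. It is deliberately
naive: it ignores EEXIST even if the existing entry is a plain file;
File::Path::make_path is more careful about errors and edge cases.

    use strict;
    use warnings;

    # Create $path and any missing parent directories, like mkdir -p.
    sub mkdir_p {
        my ($path) = @_;
        my $so_far = $path =~ m{^/} ? '/' : '';
        for my $part (grep { length } split m{/}, $path) {
            $so_far .= $part;
            mkdir $so_far or $!{EEXIST} or die "mkdir $so_far: $!";
            $so_far .= '/';
        }
    }

    mkdir_p('a/b/c');   # creates a, a/b, a/b/c as needed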