James Kanze
Yes, it's completely "fair" to compare a C++ program which
doesn't understand the concept of a smart pointer to lisp,
where "smart pointers" (in the sense that memory is
garbage-collected) are inherent.
It looks to me like he's comparing to programs which do the same
thing. Probably written by people competent in the language
being used.
I'm familiar with this quote myself. I think it is somewhat
dated; it is comparing C++ as it was written some years ago with
Lisp as it was implemented some years ago. In practice, C++ has
evolved, both with regards to the language, and with regards to
the way competent programmers use it. I don't know about Lisp,
but I do know that garbage collectors have also evolved, and
that modern garbage collection is considerably faster than most
implementations of malloc (which haven't evolved). In many use
patterns, anyway.
Yes: It requires more work to make that work in C++ with smart
pointers than the same thing in lisp (although making a
copy-on-write version of std::vector is not all that hard).
Making one that is multithread safe, and still has the desired
performance, is far from trivial. The experts who wrote the g++
library tried for std::string, and failed. (I'm obviously
ignoring the complexity issues; you can't use copy on write for
a fully conformant implementation of std::vector, because
something like operator[] could no longer be implemented in
constant time.)
However, that doesn't mean that the C++ version cannot be as
efficient as the lisp version. It only means the C++
programmers were incompetent.
Bullshit.
I still wouldn't trade modules (the most crucial part of OOP)
for GC, if I had to make an excluding choice.
I agree, in C++, and if we had any chance of getting full
support for modules in C++, in this version, I'd be for it. But
there is no existing practice to base it on, which means that
modules are a lot more work, and involve considerable risk.
»[A]llocation in modern JVMs is far faster than the best
performing malloc implementations. The common code path
for new Object() in HotSpot 1.4.2 and later is
approximately 10 machine instructions (data provided by
Sun; see Resources), whereas the best performing malloc
implementations in C require on average between 60 and 100
instructions per call (Detlefs, et al.; see Resources).
I like how this misses one of the main reasons why memory
allocation can be slow: Cache behavior.
True. For good cache behavior, you need a copying garbage
collector, and I don't think that that's in the cards for C++.
From experience I would estimate that the instructions
executed when allocating/deallocating account for maybe 10-20%
of the total allocation cost, and the remaining 80-90% depends
on how cache-friendly the allocation system is. Cache misses
are enormously expensive.
How much of a role they play depends largely on the application.
And while a copying collector can definitely have considerably
better locality than any manual allocator, I'm not sure that it
makes a difference in very many programs.
Good GC engines do indeed have the advantage that they can
defragment memory and make allocations more cache-friendly.
This is harder (but not impossible) to do in C/C++.
There has been some research on using a "mostly copying"
collector with C and C++. If I were developing a compiler,
that's the route I'd go. But it requires considerable
collaboration from the compiler.
On the other hand many "high-level languages" offer
abstractions at the expense of memory usage efficiency. There
are "high-level languages" where it's prohibitively difficult
to create very memory-efficient programs (which handle
enormous amounts of data) while still maintaining a fair
amount of abstraction and maintainability.
This seems to have been the trend during the past 10-20 years:
Completely disregard memory usage efficiency in favor of
higher level abstractions. After all, everybody has a
supercomputer on their desk and everything they do with it is
play tic-tac-toe. You don't need memory usage efficiency,
right?
Yes and no. In my case, most of the software I write runs on a
single machine, or on just a couple of machines; I'm not writing
shrink-wrapped software. And from a cost point of view, it's
several orders of magnitude cheaper to configure them with
sufficient memory than it is for me to optimize the memory
footprint down to the last byte. I've known exceptions: one
case where three programmers spent two weeks just to gain some
twenty bytes of code, so that the program fit into a single
ROM instead of requiring two. There were seven million examples
of the product built, and six man-weeks was less expensive than
seven million ROMs. But for most people, a couple of KB, or
even a couple of MB more or less aren't going to affect
anything.