Garbage collection in C++

J

James Kanze

Yes, it's completely "fair" to compare a C++ program which
doesn't understand the concept of a smart pointer to lisp,
where "smart pointers" (in the sense that memory is
garbage-collected) are inherent.

It looks to me like he's comparing to programs which do the same
thing. Probably written by people competent in the language
being used.

I'm familiar with this quote myself. I think it is somewhat
dated; it is comparing C++ as it was written some years ago with
Lisp as it was implemented some years ago. In practice, C++ has
evolved, both with regards to the language, and with regards to
the way competent programmers use it. I don't know about Lisp,
but I do know that garbage collectors have also evolved, and
that modern garbage collection is considerably faster than most
implementations of malloc (which haven't evolved). In many use
patterns, anyway.
Yes: It requires more work to make that work in C++ with smart
pointers than the same thing in lisp (although making a
copy-on-write version of std::vector is not all that hard).

Making one that is multithread safe, and still has the desired
performance, is far from trivial. The experts who wrote the g++
library tried for std::string, and failed. (I'm obviously
ignoring the complexity issues; you can't use copy on write for
a fully conformant implementation of std::vector, because
something like operator[] could no longer be implemented in
constant time.)
However, that doesn't mean that the C++ version cannot be as
efficient as the lisp version. It only means the C++
programmers were incompetent.
Bullshit.
I still wouldn't trade modules (the most crucial part of OOP)
for GC, if I had to make an excluding choice.

I agree, in C++, and if we had any change of getting full
support for modules in C++, in this version, I'd be for it. But
there is no existing practice to base it on, which means that
modules are a lot more work, and involve considerable risk.
      »[A]llocation in modern JVMs is far faster than the best
      performing malloc implementations. The common code path
      for new Object() in HotSpot 1.4.2 and later is
      approximately 10 machine instructions (data provided by
      Sun; see Resources), whereas the best performing malloc
      implementations in C require on average between 60 and 100
      instructions per call (Detlefs, et. al.; see Resources).
I like how this misses one of the main reasons why memory
allocation can be slow: Cache behavior.

True. For good cache behavior, you need a copying garbage
collector, and I don't think that that's in the cards for C++.
From experience I would estimate that the number of
instructions executed when allocating/deallocating amounts to
maybe 10-20% of the total speed, and the remaining 80-90%
depends on how cache-friendly the allocation system is. Cache
misses are enormously expensive.

How much of a role they play depends largely on the application.
And while a copying collector can definitely have considerably
better locality than any manual allocator, I'm not sure that it
makes a difference in very many programs.
Good GC engines do indeed have the advantage that they can
defragment memory and make allocations more cache-friendly.
This is harder (but not impossible) to do in C/C++.

There has been some research on using a "mostly copying"
collector with C and C++. If I were developing a compiler,
that's the route I'd go. But it requires considerable
collaboration from the compiler.
On the other hand many "high-level languages" offer
abstractions at the expense of memory usage efficiency. There
are "high-level languages" where it's prohibitively difficult
to create very memory-efficient programs (which handle
enormous amounts of data) while still maintaining a fair
amount of abstraction and maintainability.
This seems to have been the trend during the past 10-20 years:
Completely disregard memory usage efficiency in favor of
higher level abstractions. After all, everybody has a
supercomputer on their desk and everything they do with it is
play tic-tac-toe. You don't need memory usage efficiency,
right?

Yes and no. In my case, most of the software I write runs on a
single machine, or on just a couple of machines; I'm not writing
shrink-wrapped software. And from a cost point of view, it's
several orders of magnitudes cheaper to configure them with
sufficient memory than it is for me to optimize the memory
footprint down to the last byte. I've known exceptions. One
case where three programmers spend two weaks just to gain 20
some bytes of code. So the program fit into a single ROM,
instead of requiring two. There were seven million examples of
the product built, and six man-weeks was less expensive than
seven million ROM's. But for most people, a couple of KB, or
even a couple of MB more or less aren't going to affect
anything.
 
J

James Kanze

Becomes the majority of programmers hired out there are
incompetent?

Because the majority of companies don't have good development
processes. I don't believe that most programmers are
incompetent.
A different programming language is not going to help that
problem.

Certainly not. There are no silver bullets. And a bad process
will produce bad programs, even with the best of tools.
 
J

James Kanze

[...]>http://www.joelonsoftware.com/articles/APIWar.html

     »[A]llocation in modern JVMs is far faster than the best
     performing malloc implementations.

This is hardcore MAJOR BULLSHI%! Who ever wrote that crap is
__TOTALLY__ ignorant; WOW!

More likely, he's done some actual measurements, rather than
just basing his opinion on prejudices. That corresponds more or
less with the actual measurements I've made at different times.
For programs with a lot of allocation of short lived objects,
garbage collection beats malloc/free hands-down, performance
wise.
 
J

James Kanze

It might be a fact in *your* case. I disagree on it being a
fact for everybody.

There are objective facts, which do hold for everyone, whether
you like it or not. And a simple tool like garbage collection
is simply not capable of "encouraging" (or "discouraging")
anything. You're committing anthromorphism, and giving garbage
collection a power that it just doesn't have.
 
J

James Kanze

  OO is "just a tool", too.

OO is more than just that.
  Doesn't it encourage specific ways to program?
  (Note that I do not necessarily defend Juha's position.
  I just find yours a very weak argument.)

Yes and no. It's certain that the programming language we use
does condition us to some degree. But garbage collection isn't
a programming language, nor a paradigm. It's just a very low
level tool. As such, it isn't capable of such an effect.
(Perhap Juha is confusing this with the effects of some
programming languages that happen to use garbage collection.
But C++ with garbage collection won't become Java, nor anything
like it. It will remain very much C++.)
 
M

Matthias Buelow

Juha said:
Becomes the majority of programmers hired out there are incompetent?

Well, you have identified the core problem, all right.
A different programming language is not going to help that problem.

A different memory management mechanism is not going to make much of a
difference here but it can assist the good programmers.
 
C

Chris M. Thomasson

[...]>http://www.joelonsoftware.com/articles/APIWar.html
»[A]llocation in modern JVMs is far faster than the best
performing malloc implementations.
[...]
This is hardcore MAJOR BULLSHI%! Who ever wrote that crap is
__TOTALLY__ ignorant; WOW!
More likely, he's done some actual measurements, rather than
just basing his opinion on prejudices. That corresponds more or
less with the actual measurements I've made at different times.
For programs with a lot of allocation of short lived objects,
garbage collection beats malloc/free hands-down, performance
wise.

Your acting as if GC totally devastates malloc/free; this is simply not
true. BTW, what makes you think I would actually allocate/deallocate single
objects with malloc/free? Here is pointer to simple region allocator which
is based on malloc/free:

http://groups.google.com/group/comp.lang.c++/browse_frm/thread/0419bf7caebf53ac



Here is a crude implementation of a scaleable general-purpose multi-threaded
region allocator which overloads global operator new/delete:

http://webpages.charter.net/appcore/vzoom/malloc/sergey_vzmem_thread.html



Last time I checked, JVM does not totally beat region allocation based on
malloc/free.




Perhaps I should code up a simple test application for this group. However,
I am not sure it would actually be on topic. What do you think?
 
C

Chris M. Thomasson

Chris M. Thomasson said:
[...]>http://www.joelonsoftware.com/articles/APIWar.html

»[A]llocation in modern JVMs is far faster than the best
performing malloc implementations.
[...]
This is hardcore MAJOR BULLSHI%! Who ever wrote that crap is
__TOTALLY__ ignorant; WOW!
More likely, he's done some actual measurements, rather than
just basing his opinion on prejudices. That corresponds more or
less with the actual measurements I've made at different times.
For programs with a lot of allocation of short lived objects,
garbage collection beats malloc/free hands-down, performance
wise.

Your acting as if GC totally devastates malloc/free; this is simply not
true. BTW, what makes you think I would actually allocate/deallocate
single objects with malloc/free? Here is pointer to simple region
allocator which is based on malloc/free:

http://groups.google.com/group/comp.lang.c++/browse_frm/thread/0419bf7caebf53ac



Here is a crude implementation of a scaleable general-purpose
multi-threaded region allocator which overloads global operator
new/delete:

http://webpages.charter.net/appcore/vzoom/malloc/sergey_vzmem_thread.html
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Sadly, this only complies with MSVC 8 or higher.

;^(...


Sorry about that.
 
C

Chris M. Thomasson

Chris M. Thomasson said:
Juha Nieminen said:
Chris said:
»[A]llocation in modern JVMs is far faster than the best
performing malloc implementations.
[...]

This is hardcore MAJOR BULLSHI%! Who ever wrote that crap is __TOTALLY__
ignorant; WOW!

Have you actually measured for yourself, or are you basing your
opinion on assumptions?

I have measured for myself. I have created multi-threaded allocators that
are just as fast, or faster, than recent JVM's.

The claim was that 10 or less machine instructions equal an allocation. My
question is WHAT instructions? Perhaps membars and/or atomic RMW? Wow, this
is omitted. Please elaborate!!?!?!?!
 
C

Chris M. Thomasson

Chris M. Thomasson said:
Juha Nieminen said:
Chris said:
»[A]llocation in modern JVMs is far faster than the best
performing malloc implementations.
[...]

This is hardcore MAJOR BULLSHI%! Who ever wrote that crap is __TOTALLY__
ignorant; WOW!

Have you actually measured for yourself, or are you basing your
opinion on assumptions?

I have measured for myself. I have created multi-threaded allocators that
are just as fast, or faster, than recent JVM's.

The claim was that 10 or less machine instructions equal an allocation. My
question is WHAT instructions? Perhaps membars and/or atomic RMW? Wow, this
is omitted. Please elaborate!!?!?!?!
 
K

Keith H Duggar

Woah. If the error is leaked memory, garbage collection may
correct it. Or it may not, depending on whether there is still
a pointer floating around to the memory. (The Java bugs data
base has more than a few cases of memory leaks in it.)

That's not what garbage collection is for. Garbage collection
isn't designed to make an incorrect program correct---I don't
think any tool can guarantee that.

And I never claimed GC does or was designed to correct errors.
One can see from the attached context it was you who posited
"mistakes will occasionally creap in" not I. If you did not
mean memory leaks what did you mean?
Garbage collection (like all
of the other tools I know) is designed to make it easier to
write a correct program. It also makes the effects of some
errors (dangling pointers) less critical.

And it's exactly by lessening the effects of some errors that
GC can actually /hide/ those errors. Perhaps I'm missing some
crucial point. Let me put a toy example together:

C++

Foo * x = new Foo() ;
//in a code far far away a reference is squirreled away
Foo * y = getX() ;
//time passes, we want x to never be used again
delete x ;
//in a code far far away the squirreled digs up his nut
y->activate()

Java

Foo x = new Foo() ;
//in a code far far away a reference is squirreled away
Foo y = getX() ;
//time passes, we want x to never be used again so what do
//you put here to indicate this? Roll your own "zombify"?
//in a code far far away the squirreled digs up his nut
y.activate()

In the C++ version, Purify (or similar) will catch the dangling
pointer or if it sneaks by (as you say "mistakes will creep in")
you have at least some a chance that the code cores and reveal
the error. In Java (and in GC in general?) you will never know.
What am I missing?
How should I respond to some wild and erroneous claim? In fact,
garbage collection helps to detect bugs; it is necessary in
order to effectively detect precisely the bug you describe.

Really? Please explain how GC helps rather than hinders in the
toy scenario I gave above.
Without garbage collection, it's undefined behavior; with
garbagea collection, it's a testable condition.

What does GC help you test exactly? Zombie access?
They also require additional work.

And they can have additional benefits.
Different "resources" have different constraints. Garbage
collection is fine for memory. RAII often (usually?) works well
for locks and such. You need explicit, programmer controlled
management for resources such as open files, where "release" can
fail. One size doesn't fit all.

That's why there is a large toolbox of deterministic tools.
As an aside, it seems that the simple constructor/destructor
paradigm has proven to be extremely flexible in implementing
a variety of resource management solutions. Do you agree?
You need to understand many different types of resource
management (and often transaction management in general---the
problem isn't just resources) if you want to write correct code.
Garbage collection doesn't dumb down the language, so that
idiots can use it. It just means that an intelligent programmer
has less lines of code to write. No more, no less.

Yes it does not "dumb down the language" but it is also not
free. Of course you must know that GC comes with various costs
so it's not worth going into them (yet again).
Exactly. You've mentionned a number of very useful tools.
Garbage collection is just one more to add to the list.
Sometimes, it will mean that you need to write less code.

Well, I agree with you! It is "just one more" tool and sometimes
it means you write less code. That said it does not come without
cost and those who require it less and value it less than you
are not stupid.
That's an oxymoron. If it was a memory management problem, then
garbage collection might have solved it (or it might not---if
you leave unused entries in an std::map, garbage collection
isn't going to remove them for you). If it was a flaw in logic
or design, then it's not a memory management problem.

The meaning there was: every problem revealed by memory related
diagnostics (leaks, faults, etc) was also ...
And if it
isn't detected by your development process (code review, tests,
etc.), with or without garbage collection, then you have a
serious problem in your development process.

Resources are finite and we are human and as you claimed yourself
"mistakes creep in". So what else is new?

Thanks for the continued discussion and your expertise!

KHD
 
K

Keith H Duggar

Apparently, then, you don't even know what garbage collection
does. Garbage collection manages memory. It does NOT manage
object lifetime. The two are different things (and the C++
standard very decisively keeps them separate).

I provided both the subthread and the keyword "zombie" so that
you could review some of the practical isses I'm referring to.
Did you review that thread? Or follow it at the time?
It's not a recent topic, and Andrei and I have discussed it in
newsgroups in the past, and unless his position has radically
changed, we more or less agree.

It's also not a resolved topic which you implied with your
emphatic "Attention!" statement.
But I'm not sure what your question is. It's obvious that in
this case, garbage collection is necessary to protect against
and detect using dangling pointers. The idiom works because
with garbage collection, memory management is decoupled from
object lifetime; because even though the object has ceased to
exist, the memory behind it cannot be reused as long as anyone
has a pointer to it.

And what does "ceased to exist" mean? If the object does not
"exist" then what is the pointer pointing to? How about we give
the "it" (even though "it" does not "exist") that the pointer
points to a name. Hmm ... I know, let's call it a "zombie". And
there begins an entire /unresolved/ debate that you can review
in the link I provided.

Do you know of a GC system/semantics that resolves the issues
raised in that thread? Can you elaborate on its semantic model?

KHD
 
C

Chris M. Thomasson

In other words, extra work. That's exactly what Matthias was
arguing against.
[...]


You apparently seemed to overlook my master point:




You can create a node cache in GC environment, but it requires one to
explicitly traverse the tree and put nodes into the cache. This can indeed
reduce the number of calls to new, simply because they can be replaced with
cache hits. Efficient programming in a GC world has "some" direct
similarities to the manual world indeed.




A node cache is extra work in a GC world, and a manual world; indeed its
beneficial in both, IMVHO at least! Reduce calls to new! Cool...




Any thoughts?

:^)
 
C

Chris M. Thomasson

[...]
So they seek for a security blanket called "garbage
collection", so they don't have to worry about it, and can
proceed to churn out their spaghetti code without worry,
just like they do in Java.
That is, of course, the most obvious bullshit I've ever
seen. There are cases where garbage collection isn't
appropriate, and cases where it is necessary.
Humm.. Can you give an example or two of a scenario in which a
GC is absolutely required/necessary?
Off hand, no, but I think Hans Boehm had one.

[...]



REALLY!????? Please educate me!




AFAICT, Hans Boehm never had any concrete example which proves that GC is
required/necessary simply because said example does _not_, dare I say
CANNOT, currently exist... I have indeed engaged in fairly limited
correspondence with Hans and the *famous Paul E. McKenney on cpp-threads
standardization list about some fairly advanced non-blocking algorihtms. He
never said anything about GC being a requirement for any algorithm. Please,
try and remember where Hans mentioned anything about GC being required for
something. Please, I always want to learn new things!

:^)






* -

Thing SCO + RCU patents! Too funny!
 
J

James Kanze

And I never claimed GC does or was designed to correct errors.
One can see from the attached context it was you who posited
"mistakes will occasionally creap in" not I. If you did not
mean memory leaks what did you mean?

Nothing in particular. Just that regardless of the technique
used, code written by human beings will contain errors; your
development process should be designed to detect and remove them
as far upstream as possible. (This in response to your claim
that garbage collection masks errors, where as in fact, it makes
the detection of some errors, like dangling pointers, possible.)
And it's exactly by lessening the effects of some errors that
GC can actually /hide/ those errors.

No. It is exactly by lessening the effects of those errors that
it makes their detection possible. (And also prevents them from
being used as a security hole---very important if your
connecting to the web.)
Perhaps I'm missing some crucial point. Let me put a toy
example together:

Foo * x = new Foo() ;
//in a code far far away a reference is squirreled away
Foo * y = getX() ;
//time passes, we want x to never be used again
delete x ;
//in a code far far away the squirreled digs up his nut
y->activate()

Foo x = new Foo() ;
//in a code far far away a reference is squirreled away
Foo y = getX() ;
//time passes, we want x to never be used again so what do
//you put here to indicate this? Roll your own "zombify"?
//in a code far far away the squirreled digs up his nut
y.activate()
In the C++ version, Purify (or similar) will catch the
dangling pointer or if it sneaks by (as you say "mistakes will
creep in") you have at least some a chance that the code cores
and reveal the error. In Java (and in GC in general?) you will
never know. What am I missing?

Purify will catch the error, but delivered code doesn't run
under Purify, so if the error doesn't show up in your test
cases, you're hosed without garbage collection; you have
undefined behavior, and while it might core dump. It might also
do anything else. Including (as has actually happened in one
case) allowing someone connected to your server to break into
your machine (and if the server is running as root, to do pretty
much anything it wants with root privileges). With garbage
collection, of course, there is no undefined behavior; you set
whatever bits you need to identify the error in the
deconstructed object, and you test them with each use of the
object, handling the detected error however you think best. (I
like assert for this, so I know I get the core dump.)

The problem is that in C++, when you deconstruct an object, you
also free the memory, and that memory can be reused for another
object, so you can't guarantee any state which would identify it
as having been deconstructed. When you deconstruct an object
and are using garbage collection, you can scribble all over the
object, overwriting it with values that can't possibly be legal
vptr's, and you can be moderately sure that those values won't
be overwritten as long as the ex-object is still accessible.

This is a case where garbage collection is necessary for maximum
robustness. But it obviously doesn't solve everything. You
can still dangle pointers to local objects, and a rogue pointer
can still overwrite anything. In the end, the real question is
how much undefined behavior can you accept; in my experience,
undefined behavior is a sure recepe for reduced robustness. And
garbage collection removes one (and regretfully only one)
potential source of undefined behavior.
Really? Please explain how GC helps rather than hinders in the
toy scenario I gave above.

Just did. It replaces undefined behavior with defined behavior.
Which you can define to do whatever is appropriate.
What does GC help you test exactly? Zombie access?

For zombies resulting from deconstruction. (Most of the time, I
think, zombie state is used to describe objects which you
weren't able to correctly construct to begin with. For those,
of course, exceptions provide the solution.)
And they can have additional benefits.

In certain cases, certainly. When they have enough additional
benefits to offest the additional cost, fine; the presence of
garbage collection doesn't prevent their use. Most of the time,
this isn't the case, however.
That's why there is a large toolbox of deterministic tools.

And that's why adding an additional tool fits into the model so
well.
As an aside, it seems that the simple constructor/destructor
paradigm has proven to be extremely flexible in implementing a
variety of resource management solutions. Do you agree?

More or less. The destructor paradigm certainly rates as one of
C++'s successes, and IMHO, beats finally hands down. Which
doesn't mean that finally wouldn't be nice as well. Nothing
wrong with having a choice. (I'd actually like to see a way of
creating "destructors" ad hoc. Something along the lines of:
cleanup { code } ;
, which would basically create an anonymous variable whose
destructor executes the code. Finally is nice when defining a
full blown class would be overly verbose, but I like the idea of
being able to write the finally code near the code which
provoked the need for it.)
Yes it does not "dumb down the language" but it is also not
free. Of course you must know that GC comes with various costs
so it's not worth going into them (yet again).

Well, I'm not sure what costs you're refering to. Most of the
argument against garbage collection seems to be that it will
turn programmers into idiots, which I don't buy. It obviously
means more work for the implementors, but in that respect, it is
nothing compared to two phase lookup for templates. And for the
user, it fits perfectly into the C++ requirement of "you don't
pay for it if you don't use it". While not really required, in
the strictest sense, I'd say that for the implementations I use,
you should count on doubling your heap usage; if your program is
at the limits, then it's something you can't afford to pay for,
regardless of any other advantages, but otherwise... (The issue
is actually more complex than that. The implementation I use
has provisions for allocating non-garbage collected memory as
well, so if you have a program which allocates a couple of very
large arrays of simple types, you can allocate them separately.
And the heap use that doubles is that of objects which are being
allocated and freed; if you allocate a couple of mega in objects
that are never freed, you don't have to double that.)
Well, I agree with you! It is "just one more" tool and
sometimes it means you write less code. That said it does not
come without cost and those who require it less and value it
less than you are not stupid.

There's a difference. Those who decide in a particular
application that it isn't appropriate aren't stupid. Those who
refuse to consider it, on the other hand, are certainly showing
unreasonable prejudice. As a professional, I have a
responsibility to my clients to provide the best service
possible at the lowest possible cost. Not using a tool which
would result in a more robust program at a lower price would be
a serious violation of professional ontology. And I can't know
whether the tool would result in a more robust program at a
lower price in any particular case unless I consider it with an
open mind.

[...]
Thanks for the continued discussion and your expertise!

I was about to say the same thing. (And don't take any harsh
statements I may have made at the beginning of the discussion
too literally. I like to exercise rhetoric litote, exagerating
a statement to bring a point home. I never mean it personally,
and I certainly don't think that everyone who doesn't see an
immediate need for garbage collection is stupid.)
 
J

James Kanze

A node cache is extra work in a GC world, and a manual world;
indeed its beneficial in both, IMVHO at least! Reduce calls to
new! Cool...
Any thoughts?

That it's an orthogonal problem. Implementing a node cache is
extra work (independantly of the memory management scheme used
otherwise). Additional cost, in other words. If the benefits
for a particular application justify the cost, you do it. If
they don't, you don't.
 
J

James Kanze

Well, you have identified the core problem, all right.

I disagree, and I would take exception to such a qualification.
I've working in computerprocessing for over thirty years, most
of them as a consultant, and I've seen and worked with literally
hundreds of programmers. In all that time, I've seen only one
who could be qualified as incompetent. One of the things I've
seen in this time is that people tend to respond according to
what is expected of them. If management considers the
programmers incompetent, they will produce work that corresponds
to how management considers them. If they are treated as
intelligent and responsible persons, it is very rare that they
don't respond as such.

Also, of course, competence isn't really a boolean quantity to
begin with; there are different degrees of competence, and also
different domains (the best designer may write horrible code).
And the real problem (often classified as "programmer
incompetence") is caused by management misusing the competence
they have. (Starting with the Peter principle. If someone is a
good programmer, the first thing that happens is that he gets
promoted to a position where he will never program again.)
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
474,163
Messages
2,570,897
Members
47,434
Latest member
TobiasLoan

Latest Threads

Top