Garbage collection in C++


James Kanze

"James Kanze" <[email protected]> wrote in message
On Nov 18, 8:23 pm, "Chris M. Thomasson" <[email protected]>
wrote:
[...]
http://www.joelonsoftware.com/articles/APIWar.html
»[A]llocation in modern JVMs is far faster than the best
performing malloc implementations.
[...]
This is hardcore MAJOR BULLSHI%! Whoever wrote that crap is
__TOTALLY__ ignorant; WOW!
More likely, he's done some actual measurements, rather than
just basing his opinion on prejudices. That corresponds
more or less with the actual measurements I've made at
different times. For programs with a lot of allocation of
short lived objects, garbage collection beats malloc/free
hands-down, performance wise.
You're acting as if GC totally devastates malloc/free; this is
simply not true.

No. I'm simply reporting actual measurements.
BTW, what makes you think I would actually allocate/deallocate
single objects with malloc/free?

And how is that relevant? Of course, you can optimize your code
in such a way as to work around the problem. The same thing
holds for garbage collection, of course, if you find yourself in
a configuration where it doesn't perform adequately. But that's
not really the point.
 

James Kanze

"James Kanze" <[email protected]> wrote in message
[...]
So they seek for a security blanket called "garbage
collection", so they don't have to worry about it, and can
proceed to churn out their spaghetti code without worry,
just like they do in Java.
That is, of course, the most obvious bullshit I've ever
seen. There are cases where garbage collection isn't
appropriate, and cases where it is necessary.
Humm.. Can you give an example or two of a scenario in which a
GC is absolutely required/necessary?
Off hand, no, but I think Hans Boehm had one.

REALLY!????? Please educate me!
AFAICT, Hans Boehm never had any concrete example which proves
that GC is required/necessary simply because said example does
_not_, dare I say CANNOT, currently exist...

Actually, in discussions elsethread, I myself mentioned one
case. Robustly protecting yourself against dangling pointer
errors.
I have indeed engaged in fairly limited correspondence with
Hans and the famous Paul E. McKenney on the cpp-threads
standardization list about some fairly advanced non-blocking
algorithms. He never said anything about GC being a
requirement for any algorithm. Please, try and remember where
Hans mentioned anything about GC being required for something.
Please, I always want to learn new things!

I forget the details, since it didn't concern anything I was
involved in. I think it had something to do with his rope
class, in cases where the memory was partially cached, or
something like that, but I'm far from certain, and I may have
misunderstood him.
 

James Kanze

I provided both the subthread and the keyword "zombie" so that
you could review some of the practical issues I'm referring to.
Did you review that thread? Or follow it at the time?

I might have followed it at the time; I do remember some
significant threads on this subject (in which Andrei was a
contributor). I've not had time to read the complete thread
now, however. But unless it presents some radically new
information, I'm fairly aware of the issues.
It's also not a resolved topic, though you implied it was with
your emphatic "Attention!" statement.

I think it is, actually. At least within the context of C++;
the standard makes a very clear distinction. I'm also aware of
people raising similar issues with regards to other languages,
but I don't know how widespread the issues have been discussed
or resolved within those languages; I've had some very concrete
discussions about this concerning C#, but the person involved
(Herb Sutter) comes from a C++ background, which may have
colored his understanding.

The notion of "lifetime of an object", as described in the C++
standard, is, IMHO, fundamental to good software engineering.
Regardless of whether it is embodied in the language or not.
(As far as I know, C++ is the only language which really makes
this distinction. Although perhaps Ada... offhand, it seems
very difficult to define a constructor and even more so a
destructor without it.) One important point with regards to
garbage collection is that it doesn't affect all objects; most
objects don't need an "active" destructor. The other important
point is that some objects do, and that doing anything with such
an object after its destructor has been called (or whatever
function is used for this; in some Java circles, at least,
dispose() seems to be the preferred name) is a programming
error.
And what does "ceased to exist" mean?

That the object has been "deconstructed". That it no longer
exists as an object of type T, in the sense that it is no longer
capable of behaving as a T should, and that using it as a T is a
programming error. In classical C++, the destructor has been
called, but of course, in other languages, some other convention
may be usual.
If the object does not "exist" then what is the pointer
pointing to?

That is a good question. Nothing really. Or raw memory.
Formally (i.e. according to the C++ language definition), it
doesn't matter, since dereferencing the pointer would result in
undefined behavior. (There is a special exception here for
reading the object as an array of bytes. I don't think it's
really relevant; an array of bytes is not the object, but just
the raw memory.) In practice, of course, as we all know, if a
pointer to something exists, even if that something no longer
exists, it will be used, sooner or later.
How about we give the "it" (even though "it" does not "exist")
that the pointer points to a name. Hmm ... I know, let's call
it a "zombie". And there begins an entire /unresolved/ debate
that you can review in the link I provided.

Let's call it raw memory. That's what the C++ standard does.
And regardless of what is in that discussion (I'll try and find
time to read it, but it looks very, very long), the issue is
resolved and closed as far as C++ is concerned. After the
destructor has been called, but before the memory has been
freed, you do not have an object. You have raw memory. (Since
you're accessing it via a T*, it is sometimes convenient to call
it a T object with a special, zombie state, but this is really
misleading.)
Do you know of a GC system/semantics that resolves the issues
raised in that thread? Can you elaborate on its semantic
model?

Well, I'll have to read the thread to be sure, but the point of
garbage collection here is that with garbage collection, the raw
memory left after the deconstruction of an object will not
(normally) change its state as long as a pointer to that memory
exists anywhere in the program. Thus, if the object was
polymorphic, the destructor can "stomp" on the memory which
contained the vptr, and any attempt to call a virtual function
is guaranteed to result in a core dump. If the memory has
already been freed, however, it may have been reallocated, and
an object may have already been constructed in it, with,
perhaps, the vptr of the new object in exactly the same place.

My personal recommendation for robustness would be to clearly
analyse, at the design phase, which objects needed
deconstruction, and which didn't, without regards to memory
management. For the latter, once design has determined that
the destructor is trivial, i.e. that the object can be logically
used as long as it is accessible, just leave it to garbage
collection. This is probably the most frequent case, but most
of the objects so designated won't ever be allocated
dynamically, either. But there will be some. For those objects
which must be deconstructed, which have a determinate lifespan,
the deconstructing function must be called at a deterministic
moment. In C++, I do this by means of the delete operator,
using the destructor as the deconstructing function; when
garbage collection is present, I replace the global operator
delete by one which overwrites the memory with some easily
identifiable pattern, which (probably) can't be a pointer to
anything valid---0xDEADBEEF is a good choice for 32 bit
machines, I think. In Java, in such cases, I simply add a
boolean variable to the base class type, which the constructor
sets true, the destructor---sorry, the dispose() function---sets
false, and I test it at the top of each function. (You'll
notice that the C++ solution is a lot less work:).)
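To make that concrete, here is a minimal sketch of such a
replacement operator delete. It assumes the Boehm collector's
gc.h (GC_MALLOC and GC_size are its real entry points), omits
the array and nothrow forms, and is an illustration rather than
anyone's exact production code:

#include <gc.h>       // Boehm collector: GC_MALLOC, GC_size
#include <cstddef>
#include <cstring>
#include <new>

// Allocate from the collector, so that "deleted" memory is never
// reused while any pointer to it remains reachable.
void* operator new(std::size_t n)
{
    void* p = GC_MALLOC(n);
    if (!p) throw std::bad_alloc();
    return p;
}

// Instead of freeing, overwrite the block with a pattern which
// cannot be a valid pointer; a stomped vptr then guarantees a
// crash on any later virtual call through a dangling pointer.
void operator delete(void* p) throw()
{
    if (p) {
        static unsigned char const pattern[4]
            = { 0xDE, 0xAD, 0xBE, 0xEF };
        std::size_t const n = GC_size(p);
        for (std::size_t i = 0; i + 4 <= n; i += 4) {
            std::memcpy(static_cast<char*>(p) + i, pattern, 4);
        }
        // The block itself is left for the collector to reclaim.
    }
}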
 

Matthias Buelow

Pete said:
I think the Java folks have come to the conclusion that finalization is
only useful for detecting failures to clean up non-memory resources. So
a class's finalizer throws an exception if the resources that the object
manages haven't been properly disposed of.

This is an interesting result; do you have a reference for it, or is
it from observation?
 

Noah Roberts

Stefan said:
This reminds me of:

»Like Perl, C++ is a swiss army chainsaw of a programming
language. Unlike Perl, it's got all the blades
simultaneously and permanently cast in a fixed half-open
position. Don't turn it on.«
Whatever.

http://fare.livejournal.com/135078.html

My usual set of quotations regarding garbage collection
might be known to some readers:

»There were two versions of it, one in Lisp and one in
C++. The display subsystem of the Lisp version was faster.
There were various reasons, but an important one was GC:
the C++ code copied a lot of buffers because they got
passed around in fairly complex ways, so it could be quite
difficult to know when one could be deallocated. To avoid
that problem, the C++ programmers just copied. The Lisp
was GCed, so the Lisp programmers never had to worry about
it; they just passed the buffers around, which reduced
both memory use and CPU cycles spent copying.«

"Best of A is better than the worst of B...which just proves B is not as
good as A."
<[email protected]>

»A lot of us thought in the 1990s that the big battle would
be between procedural and object oriented programming, and
we thought that object oriented programming would provide
a big boost in programmer productivity. I thought that,
too. Some people still think that. It turns out we were
wrong. Object oriented programming is handy dandy, but
it's not really the productivity booster that was
promised. The real significant productivity advance we've
had in programming has been from languages which manage
memory for you automatically.«

I just don't buy it. When everyone was playing around with malloc'ed
arrays by hand all over the place, that may have been a valid argument.
But that has been shown to be a lack of imagination on the part of
those running into the problem: the introduction of RAII and other
idioms such as scope guarding destroys this argument (see the sketch
below).
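For readers who haven't met the idioms, a minimal sketch; the
FileCloser guard is illustrative, not a standard component:

#include <cstdio>
#include <vector>

// Illustrative scope guard: the destructor releases the resource
// on every exit path, normal or exceptional.
struct FileCloser {
    std::FILE* f;
    ~FileCloser() { if (f) std::fclose(f); }
};

void process(char const* path)
{
    std::vector<double> samples(1024);   // RAII: freed on scope exit
    FileCloser guard = { std::fopen(path, "rb") };
    if (!guard.f) return;                // no leak on early return
    // ... read into samples; both resources are released
    // automatically whether we return normally or throw.
}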
http://www.joelonsoftware.com/articles/APIWar.html

»[A]llocation in modern JVMs is far faster than the best
performing malloc implementations. The common code path
for new Object() in HotSpot 1.4.2 and later is
approximately 10 machine instructions (data provided by
Sun; see Resources), whereas the best performing malloc
implementations in C require on average between 60 and 100
instructions per call (Detlefs, et. al.; see Resources).
And allocation performance is not a trivial component of
overall performance -- benchmarks show that many
real-world C and C++ programs, such as Perl and
Ghostscript, spend 20 to 30 percent of their total
execution time in malloc and free -- far more than the
allocation and garbage collection overhead of a healthy
Java application (Zorn; see Resources).«

That's nice. Luckily for us, we don't have to limit ourselves to
malloc and free. Check out the book Modern C++ Design to see an
implementation of a faster allocation pool for exactly the task
claimed to be Java's advantage above (a simplified sketch follows).
The reason Java would perform better here is quite clear: it's got
its own memory-management heap and can optimize requests to the
OS... just like anyone else can, when needed, in C++.
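A minimal sketch of the idea (not Loki's actual code): a fixed-size
free list makes allocation and deallocation a handful of
instructions each, comparable to the JVM fast path quoted above.

#include <cstddef>
#include <new>

// Fixed-size pool: a free list threaded through the unused blocks.
class Pool {
    union Block { Block* next; unsigned char storage[32]; };
    enum { N = 1024 };
    Block blocks_[N];
    Block* free_;
public:
    Pool() : free_(blocks_)
    {
        for (int i = 0; i + 1 < N; ++i)   // thread the free list
            blocks_[i].next = &blocks_[i + 1];
        blocks_[N - 1].next = 0;
    }
    void* allocate()                      // pop the head: O(1)
    {
        if (!free_) throw std::bad_alloc();
        Block* b = free_;
        free_ = b->next;
        return b;
    }
    void deallocate(void* p)              // push the block back: O(1)
    {
        Block* b = static_cast<Block*>(p);
        b->next = free_;
        free_ = b;
    }
};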
http://www-128.ibm.com/developerworks/java/library/j-jtp09275.html?ca=dgr-jw22JavaUrbanLegends

»Perhaps the most important realisation I had while developing
this critique is that high level languages are more important
to programming than object-orientation. That is, languages
which have the attribute that they remove the burden of
bookkeeping from the programmer to enhance maintainability and
flexibility are more significant than languages which just
add object-oriented features. While C++ adds object-orientation
to C, it fails in the more important attribute of being high
level. This greatly diminishes any benefits of the
object-oriented paradigm.«

People who see C++ simply as an OO language are missing the entire
point. C++ is a multi-paradigm language that can be used as low or as
high as you can reach. No, it isn't a scripting language, but
personally I don't think that's a bad thing. Having to query your
types all the time is NOT an advantage.
 

Juha Nieminen

Pete said:
I think the Java folks have come to the conclusion that finalization is
only useful for detecting failures to clean up non-memory resources. So
a class's finalizer throws an exception if the resources that the object
manages haven't been properly disposed of.

Personally I believe that this is caused by the limitations of Java
rather than being a good thing Java strives for.

If I'm not mistaken, C# offers a tool to alleviate that problem: The
'using' block, where destructors/finalizers can actually be very useful.

(Of course 'using' blocks can only be used locally and don't help when
the same resource is shared among different modules...)
 

Juha Nieminen

James said:
Yes and no. In my case, most of the software I write runs on a
single machine, or on just a couple of machines; I'm not writing
shrink-wrapped software. And from a cost point of view, it's
several orders of magnitudes cheaper to configure them with
sufficient memory than it is for me to optimize the memory
footprint down to the last byte.

I have worked in a project where the amount of calculations performed
by a set of programs was directly limited by the amount of available
memory. (If the program started to swap, it was just hopeless to try to
wait for it to finish.)

Reducing the memory footprint of the program to half meant that
simulations twice as large could be performed. Given that these programs
were used by several people on their own computers, I would say that the
total benefit of squeezing out the last unused bits was worth it.

And of course it also means that (after the optimization) if someone
doubled the amount of RAM in their computer, they could run simulations
four times larger than before, rather than just two. That is an enormous
benefit.
 

Juha Nieminen

James said:
There are objective facts, which do hold for everyone, whether
you like it or not. And a simple tool like garbage collection
is simply not capable of "encouraging" (or "discouraging")
anything. You're committing anthropomorphism, and giving garbage
collection a power that it just doesn't have.

Would you say that lisp encourages people to write programs using a
functional style? Would you say that C encourages people to write with
an imperative style?
 

Dilip

  Would you say that lisp encourages people to write programs using a
functional style? Would you say that C encourages people to write with
an imperative style?


That's like asking if a butcher's knife can be used to stab someone.
An object can have several intrinsic qualities. What you want to do
with it is your problem. C++ is a multi-paradigm language, as has been
pointed out a million different times. How you want to approach a
software engineering problem depends on what tools you use within
the umbrella of the language to solve it. STL and RAII each offer a
certain way to approach a problem. Garbage collection simply adds
another dimension to it. James has been explaining this several
different ways and it just doesn't seem to click. Instead we get into
absolutes like: "Only morons use GC" or "GC encourages anti-modular
programming" or "GC refused to make coffee for me this morning...".

I am not appealing to authority here, but instead of talking in the
abstract like "GC will lead to this and that..." could we at least
recognize the fact that James has actually used Boehm's collector in
his projects and found real benefits? Can we at least admit that
practice trumps theory?
 

Dilip

  Personally I believe that this is caused by the limitations of Java
rather than being a good thing Java strives for.

At least in the .NET world one is heavily discouraged from writing
finalizers. As was pointed out, it is necessary only when a class has
to dispose of its "unmanaged" resources in a timely manner. Of
course, that still depends on the clients of the class
remembering to use that dreaded 'using' block. Also, throwing an
exception from a finalizer is useless because the runtime (CLR) will
simply ignore it.
  If I'm not mistaken, C# offers a tool to alleviate that problem: The
'using' block, where destructors/finalizers can actually be very useful.

It's a little more complicated than that. C# doesn't have a concept
of destructors; however, C++/CLI (architected after Sutter came
onboard) retains dtors as they work in the regular C++ world, while
also adding mechanisms to implement finalizers (implemented by using
the '!' mark just as we use '~' to represent dtors).
  (Of course 'using' blocks can only be used locally and don't help when
the same resource is shared among different modules...)

Not sure what this means, but we are straying OT... so let's forget it.
 

Matthias Buelow

Juha said:
I have worked in a project where the amount of calculations performed
by a set of programs was directly limited by the amount of available
memory. (If the program started to swap, it was just hopeless to try to
wait for it to finish.)

You seem to think that a program that uses GC uses more memory, which is
just false in the general case. Why should it use more? The amount of
memory a program is using depends on the program logic and its memory
trace and not on how the memory is managed, which is, as I said, an
uninteresting implementation detail.
 

Jean-Marc Bourguet

Matthias Buelow said:
You seem to think that a program that uses GC uses more memory, which is
just false in the general case. Why should it use more? The amount of
memory a program is using depends on the program logic and its memory
trace and not on how the memory is managed, which is, as I said, an
uninteresting implementation detail.


GC tends to trade space for speed. (I'm probably not up to date on GC
algorithms, but ISTR that the cost of a GC scan is proportional to the
amount of live memory, so systems like generational GC scan less memory
at the risk of keeping some unreachable memory around longer.)

GC uses reachability as a safe approximation to liveness, but that is
only an approximation.

GC users tend not to NULL unneeded references (which would help keep
the approximation good) when there is no risk of the equivalent of a
memory leak: an accumulation of reachable memory which will never be
used. Some even seem to think that the latter possibility doesn't
exist, as "the GC will take care of the dead memory"; wrong, it takes
care only of the unreachable memory. OTOH, I've seen Lisp programs
really go out of their way to "cons" less; yes, Lisp programs which
managed the memory manually (at least in critical loops).
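A trivial illustration of that last point (the names are made up):
the collector sees everything the vector holds as live, even though
the program will never read any of it again.

#include <vector>

// Every result ever produced stays reachable, so even a perfect
// collector can never reclaim it: reachable, but dead. This is
// the "leak by accumulation" described above.
std::vector<int*> g_results;

void record(int* result)
{
    g_results.push_back(result);    // grows without bound
}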

Yours,
 

Chris M. Thomasson

Matthias Buelow said:
You seem to think that a program that uses GC uses more memory, which is
just false in the general case.

Well, memory only gets reclaimed when the GC "decides" to run a scan. So, if
the program is making frequent allocations, and the GC does not run enough
scans to keep up, it has no choice but to keep expanding its internal heaps.


// GC world
struct foo {
    void do_something();
};

void thread() {
    foo* f;
    for (;;) {
        f = new foo();
        f->do_something();
    }
}

int main() {
    // create 32 threads
    return 0;
}


// Manual world
struct foo {
    void do_something();
};

void thread() {
    foo* f;
    for (;;) {
        f = new foo();
        f->do_something();
        delete f;
    }
}

int main() {
    // create 32 threads
    return 0;
}



Why should it use more? The amount of
memory a program is using depends on the program logic and its memory
trace and not on how the memory is managed, which is, as I said, an
uninteresting implementation detail.

Which program is going to use more memory? In the Manual world, there
can only ever be up to 32 foo objects alive at a time. In the GC world,
well, there can potentially be hundreds, or thousands, in between GC
scan intervals...
 

Chris M. Thomasson

Chris M. Thomasson said:
Well, memory only gets reclaimed when the GC "decides" to run a scan. So,
if the program is making frequent allocations, and the GC does not run
enough scans to keep up, it has no choice but to keep expanding its internal
heaps.
[...]

Here is a pseudo-code example that should really make a GC environment hog
memory:



struct node {
    node* m_next;

    void do_something_read_only() const;
};

static mutex g_lock;
static node* g_nodes = NULL;

void writer_thread() {
    for (unsigned i = 1 ;; ++i) {
        if (i % 10000) {
            node* n = new node();
            mutex::guard lock(g_lock);
            n->m_next = g_nodes;
            // membar #StoreStore
            g_nodes = n;
        } else {
            g_nodes = NULL;
        }
    }
}

void reader_thread() {
    for (;;) {
        node* n = g_nodes;
        // data-dependent load barrier
        while (n) {
            n->do_something_read_only();
            n = n->m_next;
            // data-dependent load barrier
        }
    }
}

int main() {
    // create 10 writer threads
    // create 32 reader threads
    return 0;
}
 

Dilip

Well, memory only gets reclaimed when the GC "decides" to run a scan. So, if
the program is making frequent allocations, and the GC does not run enough
scans to keep up, it has no choice but to keep expanding its internal heaps.

You make it sound like it's a random occurrence? GC runs when the
"managed" heap is full. Even then, the heap is split into generations,
and a *complete* reclamation isn't done all the time. If you make
frequent allocations, sooner or later you will run out of allocatable
memory and GC will then kick in. One common theme throughout this
thread is from people like Juha who claim that this will simply
encourage programmers to allocate willy-nilly w/o bothering about the
performance of the application. Well, if you can find that kind of
cowboy programmer, chances are he/she is going to be equally idiotic
about some other aspect of the language. I just don't understand why
or how GC is breaking new ground here.
// GC world
struct foo {
  void do_something();

};

void thread() {
  foo* f;
  for (;;) {
    f = new foo();
    f->do_something();
  }

}

int main() {
  // create 32 threads
  return 0;

}

// Manual world
struct foo {
  void do_something();

};

void thread() {
  foo* f;
  for (;;) {
    f = new foo();
    f->do_something();
    delete f;
  }

}

int main() {
  // create 32 threads
  return 0;

}

Which program is going to use more memory? In the Manual world, there
can only ever be up to 32 foo objects alive at a time. In the GC world,
well, there can potentially be hundreds, or thousands, in between GC
scan intervals...

This is like arguing that templates provide compile-time
polymorphism. Of course they do!! That's what they are for. In the
manual world memory management is the programmer's responsibility; in
the GC world you transfer it to the tool. The quantity of memory your
application needs is your application's own business. GC simply
_manages_ how much you create. If you create more, it's more work for
the GC; if you create less, it's less work for the GC. I am left
wondering if you are committing a non-sequitur or a tautology.
 

Keith H Duggar

And I never claimed GC does or was designed to correct errors.
One can see from the attached context that it was you who posited
"mistakes will occasionally creep in", not I. If you did not
mean memory leaks, what did you mean?

Nothing in particular. Just that regardless of the technique
used, code written by human beings will contain errors; your
development process should be designed to detect and remove them
as far upstream as possible.
Agreed.

([GC] also prevents them from being used as a security hole
--- very important if you're connecting to the web.)

True. Point taken.
(This in response to your claim
that garbage collection masks errors, whereas in fact it makes
the detection of some errors, like dangling pointers, possible.)
And it's exactly by lessening the effects of some errors that
GC can actually /hide/ those errors.

No. It is exactly by lessening the effects of those errors that
it makes their detection possible.
[snip]

Purify will catch the error, but delivered code doesn't run
under Purify, so if the error doesn't show up in your test
cases, you're hosed without garbage collection; you have
undefined behavior, and while it might core dump, it might also
do anything else, including (as has actually happened in one
case) allowing someone connected to your server to break into
your machine (and if the server is running as root, to do pretty
much anything it wants with root privileges). With garbage
collection, of course, there is no undefined behavior; you set
whatever bits you need to identify the error in the
deconstructed object, and you test them with each use of the
object, handling the detected error however you think best. (I
like assert for this, so I know I get the core dump.)

The problem is that in C++, when you deconstruct an object, you
also free the memory, and that memory can be reused for another
object, so you can't guarantee any state which would identify it
as having been deconstructed. When you deconstruct an object
and are using garbage collection, you can scribble all over the
object, overwriting it with values that can't possibly be legal
vptr's, and you can be moderately sure that those values won't
be overwritten as long as the ex-object is still accessible.

This is a case where garbage collection is necessary for maximum
robustness. But it obviously doesn't solve everything. You
can still dangle pointers to local objects, and a rogue pointer
can still overwrite anything. In the end, the real question is
how much undefined behavior can you accept; in my experience,
undefined behavior is a sure recipe for reduced robustness. And
garbage collection removes one (and regretfully only one)
potential source of undefined behavior.

Thank you. I understand your point much more clearly now. And
as far as I can see you are right. GC does enable more advanced
error detection.

That said, "[setting] whatever bits you need" and "testing them
with each use of the object" was discussed with respect to
zombies in the referenced thread. The concern is that such bits
and checks cost both space and time (at least for non-virtual
functions and member variable access). Do you agree this is
a real cost and a legitimate concern? Or is there a clever
way around those costs?
More or less. The destructor paradigm certainly rates as one of
C++'s successes, and IMHO, beats finally hands down. Which
doesn't mean that finally wouldn't be nice as well. Nothing
wrong with having a choice. (I'd actually like to see a way of
creating "destructors" ad hoc. Something along the lines of:
cleanup { code } ;
, which would basically create an anonymous variable whose
destructor executes the code.

Indeed. Anonymous destructors would provide for cleaner syntax.
However, since they are already supported with more verbosity

void foo ( ) {
    struct anon {
        ~anon ( ) {
            std::cout << "(zombie) : brains! brains!\n" ;
        }
    } a ;
}

perhaps many would consider them to be only "sugar". On the
other hand, lambda expressions are in this sense also "sugar"
and they were accepted into C++0x.
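For what it's worth, here is a sketch in C++0x terms of how a
lambda could approximate the proposed cleanup construct; Cleanup
is a made-up name, not a standard or proposed component.

#include <cstdio>

// Guard whose destructor runs an arbitrary lambda: roughly the
// `cleanup { code } ;` construct, with the closure providing
// access to the local variables.
template <typename F>
class Cleanup {
    F f_;
public:
    explicit Cleanup(F f) : f_(f) {}
    ~Cleanup() { f_(); }
};

void demo()
{
    std::FILE* f = std::fopen("log.txt", "a");
    auto close_log = [f] { if (f) std::fclose(f); };
    Cleanup<decltype(close_log)> guard(close_log);
    // ... work with f; the cleanup runs on every exit path.
}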
There's a difference. Those who decide in a particular
application that it isn't appropriate aren't stupid. Those who
refuse to consider it, on the other hand, are certainly showing
unreasonable prejudice. As a professional, I have a
responsibility to my clients to provide the best service
possible at the lowest possible cost. Not using a tool which
would result in a more robust program at a lower price would be
a serious violation of professional deontology. And I can't know
whether the tool would result in a more robust program at a
lower price in any particular case unless I consider it with an
open mind.

Agreed on all points. Well said.
I was about to say the same thing. (And don't take any harsh
statements I may have made at the beginning of the discussion
too literally. I like to exercise rhetorical litotes, exaggerating
a statement to bring a point home. I never mean it personally,
and I certainly don't think that everyone who doesn't see an
immediate need for garbage collection is stupid.)

I know very well that you are highly skilled and informed and was
thus sure you had something to say worth hearing; and I wanted to
hear it! I wasn't about to let harsh rhetoric distract me from
that goal ;-)

KHD
 

James Kanze

On Nov 19, 5:31 am, James Kanze <[email protected]> wrote:

[...]
Thank you. I understand your point much more clearly now. And
as far as I can see you are right. GC does enable more
advanced error detection.
That said, "[setting] whatever bits you need" and "testing
them with each use of the object" was discussed with respect
to zombies in the referenced thread. The concern is that such
bits and checks cost both space and time (at least for
non-virtual functions and member variable access). Do you
agree this is a real cost and a legitimate concern? Or is
there a clever way around those costs?

As they say, there's no such thing as a free lunch. In most
specific cases, I think you will be able to come up with some
solution which doesn't entail any extra space associated with
the object; classes with no redundancy and no extra bits are
rare. And often (but certainly not always) you should be able
to trick hardware into making the check (set a pointer to an
invalid value, for example). But even in these cases, you have
the extra space overhead associated with garbage collection, and
you have created extra work for the programmer. If the compiler
were to take charge of this, the extra space associated with the
object would be systematic, but it could presumably somehow be
integrated into the overall overhead of memory management. (In
all of the manual memory management schemes I've seen, you have
at least one extra pointer per allocated block. Which must be a
multiple of 4 or 8, depending on the machine, so you have 2 or 3
free bits to play with, if you're the compiler or the library
implementer; you can't reasonably get at them from user code.)
On the other hand, unless the compiler was very, very smart (and
I'm not aware of any research having been done on ways to
optimize this), you'd have a systematic runtime overhead.
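As a concrete, if made-up, illustration of those free bits: blocks
aligned on 8 bytes have the low 3 bits of their address zero, so an
allocator or compiler could store a "deconstructed" flag there. The
helper names here are invented for the example.

#include <cstdint>

// Low-bit tagging: valid block addresses are 8-byte aligned, so
// bit 0 is free to mark an object as already deconstructed.
inline void* mark_dead(void* p)
{
    return reinterpret_cast<void*>(
        reinterpret_cast<std::uintptr_t>(p) | 1u);
}

inline bool is_dead(void* p)
{
    return (reinterpret_cast<std::uintptr_t>(p) & 1u) != 0;
}

inline void* strip_tag(void* p)     // recover the real address
{
    return reinterpret_cast<void*>(
        reinterpret_cast<std::uintptr_t>(p) & ~std::uintptr_t(1));
}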

In practice, in all of the cases I'm aware of where a dangling
pointer was used to breech security, the technique used was to
cause the vptr to be overwritten in a way which caused the
malicious code to be executed when a virtual function was called
through the dangling pointer. I'm not aware of a security
problem which doesn't involve pointers to functions somewhere,
so just zapping all of the memory with values that would be
invalid as pointers (and ensuring that it doesn't get reused as
long as there is a pointer to it) would be sufficient. If the
goal is larger; to detect as many possible programming errors as
possible, as soon as possible, then more effort would be
involved.
Indeed. Anonymous destructors would provide for cleaner
syntax. However, since they are already supported with more
verbosity
void foo ( ) {
    struct anon {
        ~anon ( ) {
            std::cout << "(zombie) : brains! brains!\n" ;
        }
    } a ;
}
perhaps many would consider them to be only "sugar". On the
other hand, lambda expressions are in this sense also "sugar"
and they were accepted into C++0x.

Exactly. The current support becomes a lot more awkward if the
cleanup code needs to access local variables.

Time constraints have meant that I haven't been active in the
lambda proposal that was adopted. I regret this, because I had
definite ideas about it---IMHO, at the base, what we need is
anonymous classes (or lambda classes, the name isn't important);
constructs which, conceptually, implicitly define a class with
protected references to all of the accessible local variables
and create an instance of it with all of the references
initialized. (Obviously, I expect the compiler to "optimize"
the references out in the generated code.) A lambda function
would be nothing more than a functional object which derived
from such a class; a lambda expression would wrap the expression
in a lambda function, then generate the object. Cleanup, above,
would be derived from the class as well, except that the code
would define the destructor, rather than an operator()(). And
I'd expect the class itself to be available, so that the
programmer could derive from it and add additional state, if he
wanted.
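A sketch of what such an implicitly defined class might look like
for a single captured local; the names are illustrative only.

// What the compiler might generate: an anonymous class holding
// references to the accessible locals, with the body in
// operator()() (or, for "cleanup", in the destructor instead).
struct anon_closure {
    int& count;                          // reference to a local
    explicit anon_closure(int& c) : count(c) {}
    void operator()() const { ++count; } // the "lambda body"
};

void demo()
{
    int count = 0;
    anon_closure increment(count);       // the "closure object"
    increment();                         // count is now 1
}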

I've not studied the lambda proposition in detail, so I don't
know how much of the above it might incorporate. A quick
glance, however, does show that it involves a "closure object",
which sounds very much like the instance of my anonymous class
(although described in somewhat different language). So it
shouldn't be too hard to add "cleanup" as an extension, or in a
later version of the standard, if someone can find time to write
up a proposal for it.
 

James Kanze

James said:
C++
Foo * x = new Foo() ;
//in a code far far away a reference is squirreled away
Foo * y = getX() ;
//time passes, we want x to never be used again
delete x ;
//in a code far far away the squirrel digs up his nut
y->activate() ;
Java
Foo x = new Foo() ;
//in a code far far away a reference is squirreled away
Foo y = getX() ;
//time passes, we want x to never be used again so what do
//you put here to indicate this? Roll your own "zombify"?
//in a code far far away the squirrel digs up his nut
y.activate() ;
In the C++ version, Purify (or similar) will catch the
dangling pointer, or if it sneaks by (as you say, "mistakes
will creep in") you have at least some chance that the
code cores and reveals the error. In Java (and in GC in
general?) you will never know. What am I missing?
Purify will catch the error, but delivered code doesn't run
under Purify, so if the error doesn't show up in your test
cases, you're hosed without garbage collection; [...]
I don't think this can be discussed that generally. It
might just be that accessing the object at this time does
something blatantly stupid, and with GC allowing it instead
of the app core dumping, the result might be much worse.

The problem is that in real life, the application didn't core
dump. The memory was reallocated as a buffer, where user input
was written. And the user designed his input so that it
corresponded to a vptr which pointed to malicious code, and
breached security when the dangling pointer was used.

With garbage collection, the "destructor" sets the vptr to an
invalid pointer. And since the memory can't be reallocated as
long as it is reachable, the invalid pointer stays set, and the
crash is guaranteed (which is what you want).

What it comes down to is that we're replacing undefined behavior
with defined. You may not like what the defined behavior is,
out of the box, but you can intervene to make it whatever you
want. Whereas undefined behavior is, well, undefined.

[...]
OTOH, there is the argument that GC only deals with one
resource (although admittedly the one that's probably most
common), but doesn't do anything to help you with all the
others.

I'll admit that I don't understand this argument. Obviously,
garbage collection deals with only one resource. But you need
different solutions for different resources; what makes garbage
collection useful is that it deals transparently with the only
resource nine tenths of your classes are concerned with. So you
have less work to do.
 

James Kanze

Personally I believe that this is caused by the limitations
of Java rather than being a good thing Java strives for.
If I'm not mistaken, C# offers a tool to alleviate that
problem: The 'using' block, where destructors/finalizers can
actually be very useful.
Well, sure, if you change the context, the answer changes. We
were talking about finalizers and garbage collection, not
about finalizers and not garbage collection.

There's a general problem of vocabulary, I think. In the
somewhat distant past, I've heard people assimilate finalizers
and C++ destructors; more recent articles do stress the
differences. If I understand what Juha is describing (and what
Herb Sutter has described to me in the past), what C# is
offering is still a third possibility, somewhat closer to a
finally clause than to anything else. Calling it finalization
certainly lends itself to confusion, but it isn't the first time that
we've had different concepts hiding under the same name.
 

James Kanze

James said:
Juha Nieminen wrote:
Because the majority of programmers hired out there are incompetent?
Well, you have identified the core problem, all right.
I disagree, and I would take exception to such a
qualification. I've been working in computer processing for over
thirty years, most of them as a consultant, and I've seen
and worked with literally hundreds of programmers. In all
that time, I've seen only one who could be qualified as
incompetent. [...]
That just means you've mostly been assigned to one kind of
shop. Good for you. Go visit some I have seen.

See my comments else-thread. Very, very few programmers are
really incompetent. Very, very few are gifted enough to be able
to overcome mismanagement, however, and quite a few are
mismanaged in ways that prevent their competence from being
used, or even seen.

Also, there are vastly different competences. I consider myself
a competent programmer, at least in procedural and OO languages
(C++ and Java, but I'm sure that I would have no problem with C#
or even Smalltalk), and quite good at many aspects of low-level
design, but I think you'd be disappointed if you asked me to do
system architecture or write a user manual.
 
