[ ... ]
Jerry
Can I invite you to take a look at this post[1] back from a few years
ago where a MSFT employee laid out his thoughts on why they went the
GC way w/o bothering about refcounting?
Sure. It'll take a while to read the whole thing, so I can't comment on
it in any detail just yet, but my immediate reaction based on glancing
it over is that it appears that their starting point was primarily the
semantics of VB6.
If that's accurate, then it appears there's not really a huge amount to
say: VB-6 is quite a lot different from C++. Specifically, C++ allows
one to execute arbitrary code in a destructor, meaning that the
destruction of an object can (and often does) have implications far
above and beyond that of simply destroying the object itself. In many
cases, correct operation of the program depends completely upon
execution of that code at precisely the correct time.
In VB-6 (or earlier) I don't believe that's the case. At least offhand,
I can't think of any observable semantics associated with destroying
anything (though I've never used VB much, so that may easily be wrong).
If that's the case, changing the timing of destroying objects can't make
an observable difference either. That makes almost any sort of garbage
collection comparatively trivial to incorporate.
In a situation like that, your choice of garbage collection techniques
comes down primarily to trade-offs between memory usage and speed, with
(quite) a bit of tuning to support the specific patterns of object usage
you see. By the time MS was doing this work, people had been working on
garbage collectors for years, and writing something that worked at least
reasonably well was a matter of combining well known elements in
relatively well known ways to get what were probably about the expected
results.
Also shortly after the post
was written, MSFT funded a project to try to add ref-counting to their
Rotor (their version of open sourced CLR) codebase[2] as a kind of
feasibility study and it failed miserably. The latter project, I have
the details only as a word document and have uploaded it here[2]. Let
me know what you think.
[1]
http://discuss.develop.com/archives/wa.exe?A2=ind0010A&L=DOTNET&D=0&P=39459
[2]
http://cid-ff220db24954ce1d.skydrive.live.com/browse.aspx/RefcountingToRotor
This second paper is a bit harder to comment on. They openly admit that
they 1) made no attempt at optimization, and 2) didn't profile the code
(tried to, but failed).
They make the case (correct, as a rule, I think) that deterministic
finalization allows one to simplify client code. The fact that their
code was slow in the absence of optimization, and that they don't (or at
that time didn't) know exactly why or how it was so slow, only points to
what we don't really know.
At least at first glance, their implementation sounds pretty slow,
though some points aren't entirely clear. They mention calling AddRef
and Release, but it's not clear whether they mean they're literally
calling member functions, or just using those names to make the idea
clear to people accustomed to COM.
If they were really using COM-style member functions, each increment or
decrement would involve loading the object's vtable pointer, then
looking up the function in the vtable, then calling that function to
increment or decrement. We're up to something like three memory
references, of which only one (the vtable pointer) is at all likely to
be in the cache already.
Then they talk about having to store the count for each object at the
end of the object, with a potentially different offset for each type of
object. That tends to indicate that they'd have to store the proper
offset and look it up for each object. If, heaven forbid, they followed
standard COM practice, they'd have to call _another_ function out of the
vtable to get that offset. If they access the data directly, we're
looking at a couple more memory references, and if they called a COM-
style member function we'd be looking at another three memory references
or so, of which (again) only one would be at all likely to be in the
cache already.
All in all, what would normally be a single reference to memory that
would almost always be in the cache would turn into around four to six
references to memory, only one or two of which would be in the cache on
more than rare ocassion.
Reference counting requires only simple operations (incrementing or
decrementing) but it does those quite frequently. You only ever get away
with it at all because those are normally quite fast. When you add
something like 4 references to memory that's not likely to be cached to
every one of those operations, dismal performance sounds like the best
you'd expect.
I should also point out that at no point have I attempted to advocate
reference counting as a general purpose garbage collection solution, or
anything on that order. Quite the contrary, I think if somebody were to
attempt to implement Java, Lisp, Smalltalk, Self, OCaml, etc., using
reference counting as it's only method of garbage collection, they'd be
making an extremely unwise decision at best.
My advocacy in this matter is purely for at least a reasonable degree of
accuracy. My assumption is that people considering various forms of
garbage collection should already know something about their memory
usage patterns. Matching that up with the characteristics of various
forms of garbage collection will allow them to make an intelligent
choice.
I have no problem at all if that choice happens to be something other
than reference counting -- in fact, I'd go so far as to say that
reference counting is only a good choice in relatively limited
circumstances. At the same time, within those limited circumstances,
reference counting will typically be a substantially better choice than
most alternatives.
If there was _never_ a situation in which reference counting was useful,
I wouldn't worry much if advice against it was based on inaccurate
reasoning -- even if inaccurate, it wouldn't be particularly misleading.
That's not the situation here. Jon Harrop's statements are both wildly
inaccurate, AND grossly misleading. IMO, that's unacceptable, and his
false statements _need_ to be corrected -- not in any particular hope
that anybody in particular will choose to use reference counting, but
only in the hope that they can make an intelligent, well-informed
decision based on facts, not blatant falsehoods.