Tom Anderson said:
Yes, more or less.
Do you mean what python uses for cyclic garbage? If so, i hadn't realised
that.

Yes, gc (a standard library module) gives you access to the mechanism
(to some reasonable extent).

There are algorithms for extending refcounting to cyclic structures
(i forget the details, but you sort of go round and experimentally
decrement an object's count and see if it ends up with a negative count or
something), so i assumed python used one of those. Mind you, those are
probably more complex than mark-and-sweep!
Not sure about that, when you consider the "generational twists", but
maybe.
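
Just to make "access to the mechanism" concrete, a tiny illustrative
sketch using nothing beyond the standard gc module (the numbers printed
will of course vary from run to run):

    import gc

    # CPython's cycle collector is generational: three generations, each
    # with a tunable collection threshold.
    print(gc.get_threshold())   # e.g. (700, 10, 10)
    print(gc.get_count())       # current counts for the three generations

    # A collection can also be forced by hand; the return value is the
    # number of unreachable objects the collector found.
    print(gc.collect())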
Ah. That would be why all those java, .net, LISP, smalltalk and assorted
other VMs out there, with decades of development, hojillions of dollars
and the serried ranks of some of the greatest figures in computer science
behind them all use reference counting rather than garbage collection,
then.
No, wait ...
Not everybody agrees that "practicality beats purity", which is one of
Python's principles. A strategy based on PURE reference counting just
cannot deal with cyclic garbage -- you'd also need the kind of kludges
you refer to above, or a twin-barreled system like Python's. A strategy
based on PURE mark-and-sweep *CAN* be complete and correct... at the
cost of horrid delays, of course, but what's such a practical
consideration to a real purist?-)
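
To see the cyclic-garbage point in the concrete, a minimal sketch (the
Node class and the weakref "probe" are just illustrative scaffolding):

    import gc, weakref

    class Node(object):
        pass

    gc.disable()                  # keep the demo deterministic

    a, b = Node(), Node()
    a.partner, b.partner = b, a   # a <-> b: a reference cycle
    probe = weakref.ref(a)        # lets us observe when `a' really dies

    del a, b                      # no outside references remain...
    print(probe() is None)        # False: refcounts alone never reach zero

    gc.collect()                  # the cycle detector steps in
    print(probe() is None)        # True: the cyclic garbage is reclaimed
    gc.enable()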
In practice, more has probably been written about garbage collection
implementations than about almost any other issue in CS (apart from sorting
and searching;-). Good techniques need to be "incremental" -- the need
to "stop the world" for unbounded amounts of time (particularly in a
paged virtual memory world...), typical of pure m&s (even with
generational twists), is simply unacceptable in all but the most "batch"
type of computations, which occupy a steadily narrowing niche.
Reference counting is intrinsically "reasonably incremental"; the
worst-case of very long singly-linked lists (such that a dec-to-0 at the
head causes a cascade of N dec-to-0's all along) is as rare in Python as
it is frequent in LISP (and other languages that go crazy with such
lists -- Haskell, which defines *strings* as singly-linked lists of
characters, being a particularly egregious example) [[admittedly, the
techniques for amortizing the cost of such worst-cases are well known in
any case, though CPython has not implemented them]].
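
If you want to see that worst-case cascade with your own eyes, here is a
sketch (the Cell class, the list's depth and the timing are purely
illustrative, and the exact figures will depend on your machine):

    import time

    class Cell(object):
        def __init__(self, value, next):
            self.value = value
            self.next = next

    # Build a long singly-linked list: each cell is kept alive only by
    # the reference held in its predecessor.
    head = None
    for i in range(200000):
        head = Cell(i, head)

    # Dropping the last reference to the head drives its count to zero,
    # which decrefs the next cell to zero, and so on down the chain: the
    # whole O(N) cascade of deallocations happens right here.
    t0 = time.time()
    head = None
    print(time.time() - t0)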
In any case, if you like Python (which is a LANGUAGE, after all) and
don't like one implementation of it, why not use a different
implementation, which uses a different virtual machine? Jython, for the
JVM, and IronPython, for MSCLR (presumably what you call ".net"), are
quite usable; project pypy is producing others (an implementation based
on Common LISP was one of the first practical results, over a year ago);
not to mention Parrot, and other projects yet...
Reliability is a red herring - in the absence of ill-behaved native
extensions, and with correct implementations, both refcounting and GC are
perfectly reliable. And you can rely on the implementation being correct,
since any incorrectness will be detected very quickly!
Not necessarily: tiny memory leaks in supposedly "stable" versions of
the JVM, for example, which get magnified in servers operating for
extremely long times and on very large scales, keep turning up. So, you
can't count on subtle and complicated implementations of garbage
collection algorithms being correct, any more than you can count on that
for (for example) subtle and complicated optimizations -- corner cases
can be hidden everywhere.
There are two ways to try to make a software system reliable: make it so
simple that it obviously has no bugs, or make it so complicated that it
has no obvious bugs. RC is definitely tilted towards the first of the
two options (and so would be mark-and-sweep in the pure form, the one
where you may need to stop everything for a LONG time once in a while),
while more sophisticated GC schemes get more and more complicated.
BTW, RC _IS_ a form of GC, just like, say, M&S is.
Lucky those existing C libraries were written to use python's refcounting!
Oh, you have to write a wrapper round the library to interface with the
automatic memory management? Well, as it happens, the stuff you need to do
is more or less identical for refcounting and GC - the extension has to
tell the VM which of the VM's objects it holds references to, so that the
VM knows that they aren't garbage.
Ah, but there is an obvious difference, when we're comparing reference
counting with mark and sweep in similarly simple incarnations: reference
counting has no "sweep" phase! M&S relies on any memory area not
otherwise accounted for being collectable during the "sweep" part, while
RC will intrinsically and happily leave alone any memory area it does
not know about. Adding sophistication to M&S often makes things even
more ticklish, if there are "random" pieces of memory which must be
hands-off -- the existing C library you're interfacing may give you no
idea, on an API-accessible level, where such internal "random" pieces
might be at any time. E.g., said existing libraries might be returning
to you "opaque handles" -- say you know they're pointers (already you're
having to breach the encapsulation and abstraction of the library you're
interfacing...), but pointers to WHAT? To structures which may
internally hold other pointers yet -- and what do THOSE point to...?
By ``handwaving'' about "the VM's objects" you imply a distinction
between such "objects" and other generic "areas of memory" that may not
be easy to maintain. In RC, no problem: the reference-counting
operations intrinsically discriminate (you don't addref or decref to
anything but such an "object"). In M&S, the problem is definitely there;
your VM's allocator needs to be able to control the "sweep", which may
require quite a lot of extra overhead if it can't just assume it "owns"
all of the memory.
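
For a deliberately toy-sized illustration of how such a wrapper mediates
under RC -- simulating the library's "opaque handle" with libc's
malloc/free via ctypes (POSIX assumed; the Handle class and every name in
it are purely hypothetical):

    import ctypes

    libc = ctypes.CDLL(None)       # POSIX: symbols of the current process
    libc.malloc.restype = ctypes.c_void_p
    libc.malloc.argtypes = [ctypes.c_size_t]
    libc.free.argtypes = [ctypes.c_void_p]

    class Handle(object):
        # Wraps an opaque pointer Python's memory management knows nothing
        # about: RC intrinsically leaves that memory alone, so the wrapper
        # must say explicitly when to hand it back.
        def __init__(self, size):
            self._ptr = libc.malloc(size)
        def __del__(self):
            # Runs when the wrapper's own refcount drops to zero.
            libc.free(self._ptr)

    h = Handle(1024)
    del h      # wrapper reclaimed by RC; __del__ frees the C-side memory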
Doesn't look like it, does it?
Apparently not. Most of my "production-level" implementations of
garbage collection schemes hark back to the late '70s (as part of my
thesis) and early '80s (working at Texas Instruments to architect a
general purpose CPU with some kind of GC support in hardware); after
leaving the field, when I got back to it, years later, I essentially
found out that the universality of paged virtual memory had changed
every single parameter in the game. I did some work on
pointer-swizzling "incremental sort-of-compacting" collectors, and
conservative M&S a la Boehm, but by that time I was more interested in
real-world applications and none of those efforts ever yielded anything
practical and production-quality -- so, I either used existing libraries
and VMs (and more often than not cursed at them -- and, generally, the
FFIs whose gyrations I had to go through to work with them), or, when I
was implementing GC in my applications, relied on simple and solid
techniques such as variants on reference-counting, arenas, etc etc.
The crusher was a prototype application based on Microsoft's CLR (or
".net", as you call it) which needed to use MSCLR's ``advanced, modern,
sophisticated'' GC for many new parts, and "unmanaged" mode for a lot of
existing "legacy" libraries and subsystems. I won't say that it wasted
a year of my life, because I was leading that effort at about
half-time... so, it only wasted HALF a year of my life!-) Of course,
that was in the bad dark ages of about 5 years ago -- I'm sure that by
now everything is perfect and flawless and the experience of the
previous 25 years is thereby nullified, right?-)
From your tone I assume that your experience in implementing and
cooperating with modern, "advanced" GC techniques is much fresher and
more successful than mine. Personally, I'm just happy that other Python
developers must clearly have scars similar to mine in these respects, so
that Python's implementation is solid and conservative, one whose
correctness you CAN essentially count on.
A very, very sound principle. If you have the aforementioned decades,
hojillions and serried ranks, an all-tricks-turned-up-to-eleven strategy
can be made to work.
78% of software projects fail -- and I believe the rate of failures in
large IT departments and software houses is higher than average.
If you're a relatively small non-profit outfit like
the python dev team, minimising tricks buys you reliability and agility,
which is, really, what we all want.
And if you're a relatively large (and tumultuously growing),
pretty-good-profit outfit, minimizing tricks builds you scalability and
solidity. Funny enough, I've found that my attitude that "clarity,
solidity and prudence are THE prime criteria of good implementations",
born of thirty years' worth of scars and "arrows in my back", made me an
"instant cultural fit" for Google (which I joined as Uber Technical Lead
just over six months ago, and where I'm currently happily prospering)...
Alex