Lawrence D'Oliveiro said:
And suddenly you’ve doubled the memory requirements. And on top of that,
since you’re moving the valid objects into different memory, you’re forcing
cache misses on all of them as well.
A minimal naive implementation indeed doubles the memory requirements,
but from a Python perspective where every integer takes something like
24 bytes already, even that doesn't seem so terrible. More
sophisticated implementations use multiple small heaps or other tricks.
It still costs something in memory footprint, but less than the minimal
implementation's 2x cost.
The new heap is filled sequentially so accesses to it will have good
locality. You do have to jump around within the old heap, but again,
with generational schemes, in the more frequent collections, the old
heap fits entirely in cache. For example, GHC's minor heap size is
256kB. For major collections, GHC switches (or used to) from copying to
a mark/compact scheme once the amount of live data in the heap goes over
some amount, giving the best of both worlds.
It's also the case that programs with very large memory consumption tend
to use most of the memory for large arrays that don't contain pointers
(think of a database server with a huge cache). That means the gc
doesn't really have to think about all that much of the memory.
This is the continuing problem with garbage collection: all the attempts to
make it cheaper just end up moving the costs somewhere else.
Same thing with manual allocation. That moves the costs off the
computer and onto the programmer. Not good, most of the time.
Really, I'm no gc expert, but the stuff you're saying about gc is quite
ill-informed. You might want to check out some current literature.