J
jacob navia
Abstract
--------
Garbage collection is a method of managing memory by using a "collector"
library. Periodically, or triggered by an allocation request, the
collector looks for unused memory chunks and recycles them.
This memory allocation strategy has been adapted to C (and C++) by the
library written by Hans J Boehm and Alan J Demers.
Why a Garbage Collector?
-----------------------
Standard C knows only the malloc/calloc/free functions. The programmer
must manage each block of memory it allocates, never forgetting to call
the standard function free() for each block. Any error is immediately
fatal, but helas, not with immediate consequences. Many errors like
freeing a block twice (or more) or forgetting to free an allocated
block will be discovered much later (if at all). This type of bugs are
very difficult to find and a whole industry of software packages
exists just to find this type of bugs.
The garbage collector presents a viable alternative to the traditional
malloc/free "manual" allocation strategies. The allocator of Boehm
tries to find unused memory when either an allocation request is
done, or when explicitely invoked by the programmer.
The main advantage of a garbage collector is that the programmer is
freed from the responsability of allocating/deallocating memory. The
programmer requests memory to the GC, and then the rest is *automatic*.
Limitations of the GC.
---------------------
The GC needs to see all pointers in a program. Since it scans
periodically memory, it will assume that any block in its block list is
free to reuse when it can't find any pointers to it. This means that the
programmer can't store pointers in the disk, or in the "windows extra
bytes", as it was customary to do under older windows versions, or
elsewhere.
This is actually not a limitation since most programs do not write
pointers to disk, and expect them to be valid later...
Obviously, there is an infinite way to hide pointers (by XORing them
with some constant for instance) to hide them from the collector.
This is of no practical significance. Pointers aren't XORed in normal
programs, and if you stay within the normal alignment requirements
of the processor, everything works without any problems.
Performance considerations
--------------------------
In modern workstations, the time needed to make a complete sweep in
mid-size projects is very small, measured in some milliseconds. In
programs that are not real time the GC time is completely undetectable.
I have used Boehm's GC in the IDE of lcc-win32, specially in the
debugger. Each string I show in the "automatic" window is allocated
using the GC. In slow machines you can sometimes see a pause of
less than a second, completely undetectable unless you know that is
there and try to find it.
It must be said too that the malloc/free system is slow too, since at
each allocation request malloc must go through the list of free blocks
trying to find a free one. Memory must be consolidated too, to avoid
fragmentation, and a malloc call can become very expensive, depending
on the implementation and the allocation pattern done by the program.
Portability
-----------
Boehm's GC runs under most standard PC and UNIX/Linux platforms. The
collector should work on Linux, *BSD, recent Windows versions, MacOS X,
HP/UX, Solaris, Tru64, Irix and a few other operating systems. Some
ports are more polished than others. There are instructions for porting
the collector to a new platform. Kenjiro Taura, Toshio Endo, and Akinori
Yonezawa have made available a parallel collector.
Conclusions
-----------
The GC is a good alternative to traditional allocation strategies for C
(and C++). The main weakness of the malloc/free system is that it
doesn't scale. It is impossible to be good at doing a mind numbing task
without any error 100% of the time. You can be good at it, you can be
bad at it, but you can NEVER be perfect. It is human nature.
The GC frees you from those problems, and allows you to conecntrate in
the problems that really matter, and where you can show your strength
as software designer. It frees you from the boring task of keeping track
of each memory block you allocate.
jacob
--------
Garbage collection is a method of managing memory by using a "collector"
library. Periodically, or triggered by an allocation request, the
collector looks for unused memory chunks and recycles them.
This memory allocation strategy has been adapted to C (and C++) by the
library written by Hans J Boehm and Alan J Demers.
Why a Garbage Collector?
-----------------------
Standard C knows only the malloc/calloc/free functions. The programmer
must manage each block of memory it allocates, never forgetting to call
the standard function free() for each block. Any error is immediately
fatal, but helas, not with immediate consequences. Many errors like
freeing a block twice (or more) or forgetting to free an allocated
block will be discovered much later (if at all). This type of bugs are
very difficult to find and a whole industry of software packages
exists just to find this type of bugs.
The garbage collector presents a viable alternative to the traditional
malloc/free "manual" allocation strategies. The allocator of Boehm
tries to find unused memory when either an allocation request is
done, or when explicitely invoked by the programmer.
The main advantage of a garbage collector is that the programmer is
freed from the responsability of allocating/deallocating memory. The
programmer requests memory to the GC, and then the rest is *automatic*.
Limitations of the GC.
---------------------
The GC needs to see all pointers in a program. Since it scans
periodically memory, it will assume that any block in its block list is
free to reuse when it can't find any pointers to it. This means that the
programmer can't store pointers in the disk, or in the "windows extra
bytes", as it was customary to do under older windows versions, or
elsewhere.
This is actually not a limitation since most programs do not write
pointers to disk, and expect them to be valid later...
Obviously, there is an infinite way to hide pointers (by XORing them
with some constant for instance) to hide them from the collector.
This is of no practical significance. Pointers aren't XORed in normal
programs, and if you stay within the normal alignment requirements
of the processor, everything works without any problems.
Performance considerations
--------------------------
In modern workstations, the time needed to make a complete sweep in
mid-size projects is very small, measured in some milliseconds. In
programs that are not real time the GC time is completely undetectable.
I have used Boehm's GC in the IDE of lcc-win32, specially in the
debugger. Each string I show in the "automatic" window is allocated
using the GC. In slow machines you can sometimes see a pause of
less than a second, completely undetectable unless you know that is
there and try to find it.
It must be said too that the malloc/free system is slow too, since at
each allocation request malloc must go through the list of free blocks
trying to find a free one. Memory must be consolidated too, to avoid
fragmentation, and a malloc call can become very expensive, depending
on the implementation and the allocation pattern done by the program.
Portability
-----------
Boehm's GC runs under most standard PC and UNIX/Linux platforms. The
collector should work on Linux, *BSD, recent Windows versions, MacOS X,
HP/UX, Solaris, Tru64, Irix and a few other operating systems. Some
ports are more polished than others. There are instructions for porting
the collector to a new platform. Kenjiro Taura, Toshio Endo, and Akinori
Yonezawa have made available a parallel collector.
Conclusions
-----------
The GC is a good alternative to traditional allocation strategies for C
(and C++). The main weakness of the malloc/free system is that it
doesn't scale. It is impossible to be good at doing a mind numbing task
without any error 100% of the time. You can be good at it, you can be
bad at it, but you can NEVER be perfect. It is human nature.
The GC frees you from those problems, and allows you to conecntrate in
the problems that really matter, and where you can show your strength
as software designer. It frees you from the boring task of keeping track
of each memory block you allocate.
jacob