Chris F Clark said:
Have you ever tried writing a non-trivial compiler with the OO
approach? I have and maintain one on a daily basis. Now, I won't
dispute many of your claims, except that the OO approach is
unsuitable. You don't get too many auxillary functions scattered over
too classes unless your design is poor.
I've written a compiler using generic functions
(
http://sourceforge.net/cvs/?group_id=110425). This means that
dispatched functions are defined separately from types they dispatch
on, and their specializations are defined separately from generic
functions and from the types (I tend to write them near the
introductions of generic functions though).
There are several phases in the compiler. Each phase examines the
result of the previous phase and prepares data for the next phase.
Between most pairs of phases the module is represented using a family
of types for different kinds of tree nodes. Each time it's a different
family of types. Families of types usually form a two-level hiearchy,
for example "lifted" code has abstract supertypes for expressions,
patterns, definitions and meanings, and each of them is further split
into concrete types.
Below are the counts of generic functions dispatched on nodes of
each kind of intermediate code, for each module (I didn't count
other generic functions, e.g. dispatched on the types of literals).
A traditional OO approach would force to put all functions dispatched
on the same type together. For example pretty printing is quite
independent from other processing and it's used only for debugging
except the final phase, thus I defined it in separate modules, so it
doesn't get in the way when I'm looking at more important code.
(An OO approach would perhaps use fewer function names, because for
clarity I use different generic functions for domain subsets which are
never mixed and dispatched dynamically. For example I have separate
functions LiftExpr, Lift1Pat, Lift2Pat, Lift1Def, Lift2Def. An OO
approach would probably spell them just Lift, Lift1 and Lift2,
especially as they would be put in separate namespaces. So the real
function counts in an OO approach might be smaller, but the amount
of forced demodularization would remain the same.)
Source code:
- SourcePrinter: 1 (pretty printing)
- Expander: 17 (transforming source code to abstract code)
Abstract code:
- Abstract: 2 (type definitions, auxiliary functions)
- AbstractPrinter: 6 (pretty printing)
- Interfaces: 1 (generating interface files)
- Expander: 5 (some minor functions)
- Lifter: 17 (transforming abstract code to lifted code)
Lifted code:
- Lifted: 5 (type definitions, auxiliary functions)
- LiftedPrinter: 4 (pretty printing)
- Lifter: 3 (some minor functions)
- Planner: 16 (transforming lifted code to sequential code)
Sequential code:
- Sequential: 4 (type definitions, auxiliary functions)
- SequentialPrinter: 4 (pretty printing)
- Planner: 2 (some minor functions)
- CCoder: 12 (transforming sequential code to C code)
C code:
- CCode: 1 (type definitions, auxiliary functions)
- CCodePrinter: 8 (pretty printing)
- CCoder: 2 (some minor functions)
With a traditional OO approach the same design would be doable, at the
cost of forcing to put together type definitions of a phase, auxiliary
functions working on these types, pretty printing, and the main work
of transforming these values to the next representation. I prefer
to have freedom to arrange modules according to my taste rather than
being forced to put together functions just because they are dispatched
on the same family of types.
Currently the representations are independent from one another, and
a module which implements a transformation phase depends only on the
representation it examines and the representation it produces. The OO
approach would significantly increase the depth of the dependency
graph: each family of type definitions would depend on the following
phases (because it includes code which produces the next phase).
Technically this might not mean anything, it only hurts my taste.
In this language modules are used as units of name visibility. For
example each module which implements a transformation phase exports
only one name, a function transforming the whole module being
compiled. If visibility boundaries were tied to classes, I would have
to either make more names public, or introduce lots of "friendships"
(or whatever the language provides for extending visibility). I prefer
to be able to design visibility domains independently from grouping
functions for the purpose of dispatching.