... I'm sure you've looked at other stuff like:
GCC's RTL
TenDRA's ANDF
LCC's DAGs
Necula's CIL
So far, the only "C" compiler I've found that uses both a yacc and lex
grammar is Lutz Hamel's "C Subset Compiler" WCC for VMS.
In reverse order: lex does not do grammars (it only does regular
expressions), and you would want flex rather than lex, or a hand-built
lexer, as original lex's C code was, um, "not so great".
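To make the lexer/grammar split concrete, here is a minimal sketch of a hand-built lexer in C. The names (next_token, struct token, the TOK_* kinds) are made up for this example and do not come from GCC, LCC, or any other compiler mentioned here. It assumes a NUL-terminated input string and handles only identifiers, decimal integers, and single-character punctuation.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token kinds for illustration; not from any real compiler. */
enum tok_kind { TOK_EOF, TOK_IDENT, TOK_NUMBER, TOK_PUNCT };

struct token {
    enum tok_kind kind;
    const char *start;  /* first character of the lexeme */
    size_t len;         /* length of the lexeme in bytes */
};

/*
 * Scan one token starting at *pp and advance *pp past it.
 * Each token class is a regular expression, so a plain loop with no
 * stack is enough; matching nested parentheses is the parser's job.
 */
struct token next_token(const char **pp)
{
    const char *p;
    struct token t;

    assert(pp != NULL && *pp != NULL);
    p = *pp;
    while (isspace((unsigned char)*p))      /* skip white space */
        p++;

    t.start = p;
    if (*p == '\0') {
        t.kind = TOK_EOF;
    } else if (isalpha((unsigned char)*p) || *p == '_') {
        /* [A-Za-z_][A-Za-z0-9_]* */
        t.kind = TOK_IDENT;
        while (isalnum((unsigned char)*p) || *p == '_')
            p++;
    } else if (isdigit((unsigned char)*p)) {
        /* [0-9]+ */
        t.kind = TOK_NUMBER;
        while (isdigit((unsigned char)*p))
            p++;
    } else {
        /* any other single character */
        t.kind = TOK_PUNCT;
        p++;
    }
    t.len = (size_t)(p - t.start);
    *pp = p;
    return t;
}
```

Calling next_token() repeatedly on "x1 = (42);" yields an identifier, "=", "(", a number, ")", ";", and then TOK_EOF. Note that the lexer happily returns "(" and ")" without checking that they balance: that check needs a grammar, which is what yacc (or bison, or a hand-written parser) is for.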
GCC uses a grammar written in bison, which is essentially the same
as yacc (with of course the inevitable GNU extensions). As of many
years ago it was possible to feed this into byacc, Bob Corbett's
"Berkeley YACC", at least with a few bells added (I added them, at
BSDI).
I have never looked at either CIL or LCC.
ANDF came out of the Open Software Foundation's attempt to come up
with an "Architecture Neutral Distribution Format" (hence the
acronym). The goal was to be able to distribute the equivalent of
".o" (compiled object) files that were not compiled for any
*particular* machine, but rather would be fed into a "re-compiler"
that would turn them into appropriate machine-specific code and
link them together. Libraries would be either native (i.e., already
compiled for the target machine) or a collection of ANDF objects.
"Architecture neutral" intermediate file formats were not a new
idea, but previous attempts had always been abandoned or surpassed
(or both), due to what I think are pretty obvious reasons. Note
that ANDF was supposed to be "source-language neutral" as well as
"target-machine neutral", and there really is no such thing: while
it is possible to wedge a number of different source languages into
one intermediate language -- GCC does this with its RTL -- one
invariably winds up with a "union of all source constructs"
intermediate system.
GCC's RTL ("register transfer language") is an outgrowth of research
originally done at the University of Arizona (Chris Fraser and J.
Davidson, in the 1980s). GCC's RTL was rather Lispified, not
surprising since the original versions of GCC itself were done by
Richard Stallman. While doing some quick background checks in
order to write this paragraph, I learned that GCC 3 and/or 4 added
some new layers between the original syntax trees built by its
front-ends and the RTL system. (I have not done anything major
inside GCC since 2.95.x, so this is all new to me.) GCC now does
a number of optimizations using the Static Single Assignment model
(which has obvious advantages over attempting them in RTL; see
<http://gcc.gnu.org/onlinedocs/gccint/SSA.html> et seq.).
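
For anyone who has not met SSA before, the core idea is that each variable is assigned exactly once, so every assignment creates a fresh "version" of its target. The following is a minimal sketch of that renaming step for straight-line code only. It is not how GCC implements it; the names (struct stmt, ssa_rename, NVARS) are invented for this example, variables are represented as small integer indices, and control flow (and hence phi functions) is left out entirely.

```c
#include <assert.h>
#include <stddef.h>

#define NVARS 26   /* hypothetical limit: variables are indices 0..25 */

/* One three-address statement: dst = src1 op src2 (the op is omitted). */
struct stmt {
    int dst, src1, src2;              /* variable indices before renaming */
    int dst_ver, src1_ver, src2_ver;  /* SSA version numbers, filled in */
};

/*
 * Rename a straight-line block into SSA form: every assignment creates
 * a new version of its destination, and every use refers to the most
 * recent version.  Version 0 means "value on entry to the block".
 * There is no control flow here, so no phi functions are needed.
 */
void ssa_rename(struct stmt *code, size_t n)
{
    int cur[NVARS] = {0};       /* current version of each variable */
    size_t i;

    assert(code != NULL || n == 0);
    for (i = 0; i < n; i++) {
        struct stmt *s = &code[i];

        assert(s->dst >= 0 && s->dst < NVARS);
        assert(s->src1 >= 0 && s->src1 < NVARS);
        assert(s->src2 >= 0 && s->src2 < NVARS);

        /* uses are renamed first, so "a = a + b" reads the old a */
        s->src1_ver = cur[s->src1];
        s->src2_ver = cur[s->src2];
        s->dst_ver = ++cur[s->dst];
    }
}
```

With a, b, c as indices 0, 1, 2, the sequence "a = b + c; b = a + a; a = a + b" comes out as "a1 = b0 + c0; b1 = a1 + a1; a2 = a1 + b1". Once every name has a single definition, questions like "which assignment does this use refer to?" are answered by the name itself, which is a large part of why these optimizations are easier in SSA form.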