give me some tips

R

Rod Pemberton

Michael Mair said:
Essentially, whatever you can learn from the Dragon Book or
Muchnick's book.
Found them. Thanks again.

Dragon Book:
"Compilers: Principles, Techniques and Tools" Alfred V. Aho, Ravi Sethi,
Jeffrey D. Ullman
"Advanced Compiler Design and Implementation" by Steven S. Muchnick

RP
 
M

Michael Mair

Rod said:
Found them. Thanks again.

Dragon Book:
"Compilers: Principles, Techniques and Tools" Alfred V. Aho, Ravi Sethi,
Jeffrey D. Ullman
"Advanced Compiler Design and Implementation" by Steven S. Muchnick

Note that the Dragon Book nowadays is quite outdated; IIRC,
there is a "21st Century Compilers" (or similar) book under way
or even out by much the same authors.

Cheers
Michael
 
M

Michael Mair

Rod said:
Hmm, graph theory. I'll _definately_ have a look at this. It'll probably
be a big time saver for me since my spatial relationship abilities are
(supposedly) off the charts.

I am not quite sure whether I understand this remark.
We are both talking about
http://en.wikipedia.org/wiki/Graph_theory ,
aren't we?
Well, you don't have to talk about that. I'm sure you've looked at other
stuff like:

GCC's RTL
TenDRA's ANDF

This one's new to me.
LCC's DAGs
Necula's CIL

So far, the only "C" compiler I've found that uses both a yacc and lex
grammar is Lutz Hamel's "C Subset Compiler" WCC for VMS.

Hmmm, as resources go, you probably know
http://www.lysator.liu.se/c/ANSI-C-grammar-y.html

Cheers
Michael
 
R

Rod Pemberton

Michael Mair said:
I am not quite sure whether I understand this remark.
We are both talking about
http://en.wikipedia.org/wiki/Graph_theory ,
aren't we?

You were. I wasn't... But, I just went through a number of graph theory
pages. I'm not sure exactly where I can use it in a compiler (yet). In
other words, binary trees, stacks, and reverse polish notation seem to do
everything I'm thinking of at the moment... Maybe I'll get it in the
future. Since I was just skimming, I didn't get to heavily into the
mathematics of graph theory.
This one's new to me.

Darn. I couldn't get a version for my OS, and was hoping someone had some
familiarity with ANDF...
http://www.info.uni-karlsruhe.de/~andf/index.htm

The TenDRA C and C++ compiler has two branches:
http://www.ten15.org/
http://www.tendra.org/

Yes, I've updated for myself, both his flex and bison grammar's to C99.

Are you aware of Edward Willink's C++ grammars:
http://www.computing.surrey.ac.uk/research/dsrg/fog/

Unfortunately, some of his links are incorrect, extra /v/. These are
corrected:
CxxGrammar.y
http://www.computing.surrey.ac.uk/research/dsrg/fog/CxxGrammar.y
CxxTester.y http://www.computing.surrey.ac.uk/research/dsrg/fog/CxxTester.y
FogGrammar.y
http://www.computing.surrey.ac.uk/research/dsrg/fog/FogGrammar.y
FogTester.y http://www.computing.surrey.ac.uk/research/dsrg/fog/FogTester.y


Rod Pemberton
 
M

Michael Mair

Rod said:
You were. I wasn't... But, I just went through a number of graph theory
pages. I'm not sure exactly where I can use it in a compiler (yet). In
other words, binary trees, stacks, and reverse polish notation seem to do
everything I'm thinking of at the moment... Maybe I'll get it in the
future. Since I was just skimming, I didn't get to heavily into the
mathematics of graph theory.

Okay, let us keep this basic.
If you introduce an intermediate representation, it is often
advantageous to introduce a graph with certain properties,
e.g. a tree; if you compile on a translation unit by translation
unit base, then you could make an abstract translation unit
the base of your tree and everything stored in file scope the
first level nodes. Then you descend, into functions, declarations,
statements, operations...
For intraprocedural analysis, you can build a hypergraph on top
of this tree representing your control flow, e.g. you can use
an SSA form for constant propagation, constant folding,
control flow optimisation, copy propagation, scope reduction, ...
The tree gives you the chance to navigate scopes (just stay on
one level).
Recognition of certain patterns as detection of certain kinds
of induced subgraphs can be much easier.
One key to success is an appropriate intermediate representation;
here is one point where intellectual property can rest.
Darn. I couldn't get a version for my OS, and was hoping someone had some
familiarity with ANDF...
http://www.info.uni-karlsruhe.de/~andf/index.htm

The TenDRA C and C++ compiler has two branches:
http://www.ten15.org/
http://www.tendra.org/

Anyway, thank you for the information :)
Yes, I've updated for myself, both his flex and bison grammar's to C99.

Would you put that up on the clc wiki?
BTW: We made it back on topic ;-)

I will check these out, too.


Cheers
Michael
 
C

Chris Torek

... I'm sure you've looked at other stuff like:

GCC's RTL
TenDRA's ANDF
LCC's DAGs
Necula's CIL

So far, the only "C" compiler I've found that uses both a yacc and lex
grammar is Lutz Hamel's "C Subset Compiler" WCC for VMS.

In reverse order: lex does not do grammars (it only does regular
expressions), and you would want flex rather than lex, or a hand-built
lexer, as original lex's C code was, um, "not so great". :)

GCC uses a grammar written in bison, which is essentially the same
as yacc (with of course the inevitable GNU extensions). As of many
years ago it was possible to feed this into byacc, Bob Corbett's
"Berkeley YACC", at least with a few bells added (I added them, at
BSDI).

I have never looked at CIL nor LCC.

ANDF came out of the Open Software Foundation's attempt to come up
with an "Architecture Neutral Distribution Format" (hence the
acronym). The goal was to be able to distribute the equivalent of
".o" (compiled object) files that were not compiled for any
*particular* machine, but rather would be fed into a "re-compiler"
that would turn them into appropriate machine-specific code and
link them together. Libraries would be either native (i.e., already
compiled for the target machine) or a collection of ANDF objects.
"Architecture neutral" intermediate file formats were not a new
idea, but previous attempts had always been abandoned or surpassed
(or both), due to what I think are pretty obvious reasons. Note
that ANDF was supposed to be "source-language neutral" as well as
"target-machine neutral", and there really is no such thing: while
it is possible to wedge a number of different source languages into
one intermediate language -- GCC does this with its RTL -- one
invariably winds up with a "union of all source constructs"
intermediate system

GCC's RTL ("register transfer language") is an outgrowth of research
originally done at the University of Arizona (Chris Fraser and J.
Davidson, in the 1980s). GCC's RTL was rather Lispified, not
surprising since the original versions of GCC itself were done by
Richard Stallman. While doing some quick background checks in
order to write this paragraph, I learned that GCC 3 and/or 4 added
some new layers between the original syntax trees built by its
front-ends and the RTL system. (I have not done anything major
inside GCC since 2.95.x, so this is all new to me.) GCC now does
a number of optimizations using the Static Single Assignment model
(which has obvious advantages over attempting them in RTL; see
<http://gcc.gnu.org/onlinedocs/gccint/SSA.html> et seq.).
 
B

Ben Pfaff

Chris Torek said:
GCC uses a grammar written in bison, which is essentially the same
as yacc (with of course the inevitable GNU extensions).

This was true for a long time, but no longer. GCC 4.1.0,
released on 2006-02-28, replaces GCC's C parser by a "new, faster
hand-written recursive-descent parser", as you can see at:
http://gcc.gnu.org/gcc-4.1/changes.html
 
C

Chris Torek

This was true for a long time, but no longer. GCC 4.1.0,
released on 2006-02-28, replaces GCC's C parser by a "new, faster
hand-written recursive-descent parser", as you can see at:
http://gcc.gnu.org/gcc-4.1/changes.html

Well, at least there is still a C parser. :)

And (on-topic, more or less) I see at the bottom of that page:

Some built-in functions have been fortified to protect applications
from stack-smashing attacks. The protection is realized by
buffer overflow detection and reordering of stack variables
to avoid pointer corruption.

This could cause "interesting" surprises with existing code that
makes use of undefined behavior. The web page gives no additional
details, though.

I also see that x86-64 uses multiple memory models, and attempts
to bypass Standard C by using "out-of-segment" memory references
may behave in ways that surprise programmers. Everything old is
new again. :)
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
474,176
Messages
2,570,950
Members
47,500
Latest member
ArianneJsb

Latest Threads

Top