How is a compiler written

G

Greg Comeau

I know it is still OT but since you made a point I wondered what you
would suggest instead of lex & yacc. I mean, hey, these are powerful
tools. :)

My point wasn't that lex and yacc should not be used, just
to counter the claim that they were the best tools. I don't
actually know the current offerings; I usually prefer to hand
write compilers.
 
E

Erik de Castro Lopo

Greg said:
My point wasn't that lex and yacc should not be used, just
to counter the claim that they were the best tools. I don't
actually know the current offerings; I usually prefer to hand
write compilers.

I know its a little off topic, but would you be able to
explain briefly how you do this with code generators
like lex/yacc?

I feel pretty comfortable with lex/yacc so I'm really
just interested in how the other side of the fence
looks.

Erik
--
+-----------------------------------------------------------+
Erik de Castro Lopo
+-----------------------------------------------------------+
"A violent gang rapist should have been given a lesser sentence partly
because he was a 'cultural time bomb' whose attacks were inevitable, as
he had emigrated from a country with traditional views of women"
http://www.smh.com.au/news/national...ble-says-lawyer/2005/10/11/1128796528939.html
 
C

Charles Richmond

Greg said:
My point wasn't that lex and yacc should not be used, just
to counter the claim that they were the best tools. I don't
actually know the current offerings; I usually prefer to hand
write compilers.

ISTM that you fellas are ignoring the code generation and optomization
sections of the compiler. Without those, what you will end up with is a
language recognizer.
 
T

the_cacc

thats it keef, keep pissing in the wind.,

Keith said:
As a matter of fact, I do some suggestions. Just wait around until
you see someone offering sound advice, then barge in and throw a few
insults around. Pointedly ignoring the offered advice at the same
time might be a nice touch.

But I'm sure you don't need any suggestions from me on the topic.
 
M

Mike Wahler

thats it keef, keep pissing in the wind.,

Analysis of 'the_cacc':

1. No constructive contributions here. Ever.

2. Attacks someone whose participation here
has long been proven quite valuable to all
who are interested in C.

3. Top posts.

4. Is rude and vulgar.

*PLONK*

-Mike
 
T

tuncay tekle

Erik said:
I know its a little off topic, but would you be able to
explain briefly how you do this with code generators
like lex/yacc?

Well, I suppose you want to learn how to implement a compiler with
lex/yacc. First notice that every program you write in a programming
language is according to a grammar. (Here is the C grammar in
Backus-Naur Form:
http://lists.canonical.org/pipermail/kragen-hacks/1999-October/000201.html).
So with lex/yacc, you scan the program and start parsing it. After you
are done with parsing, you probably come up with some data structures
containing the semantics of the program, and then you may want to go
over these data structures for a couple of times for reasons like
optimization (which is completely non-trivial) and then you are done.

This is a very basic sketch of what is going on. If you want to go on,
you can find the next episodes in a bookstore. Say "It is I, Arthur,
son of Uther Pendragon, from the castle of Camelot. King of the
Britons, defeater of the Saxons, Sovereign of all England! Give me the
Dragon Book".

Or a broader sketch exists at
http://cs.wwc.edu/~aabyan/Linux/compiler.pdf
 
E

Erik de Castro Lopo

tuncay said:
Well, I suppose you want to learn how to implement a compiler with
lex/yacc.


Sorry, I missed a very crucial "out" in the sentence you
quoted. It should have been :
I know its a little off topic, but would you be able to
explain briefly how you do this **without** code generators
like lex/yacc?

In the paragraph after the one you quoted I stated that
i was already very comfortable with lex/yacc. So confortable
that I could image a more powerful code generator than lex/yacc,
but not doing it without something like those programs.

Erik
--
+-----------------------------------------------------------+
Erik de Castro Lopo
+-----------------------------------------------------------+
Turks embrace novelist's war on EU
http://www.iht.com/articles/2005/10/12/news/novel.php
 
S

Skarmander

tuncay tekle wrote:
This is a very basic sketch of what is going on. If you want to go on,
you can find the next episodes in a bookstore. Say "It is I, Arthur,
son of Uther Pendragon, from the castle of Camelot. King of the
Britons, defeater of the Saxons, Sovereign of all England! Give me the
Dragon Book".

At which point the mystified store owner will probably reply "you're a
loony" or, if more versed in the Matter of Britain, "you're using coconuts".

S.
 
T

tuncay tekle

In the paragraph after the one you quoted I stated that
i was already very comfortable with lex/yacc. So confortable
that I could image a more powerful code generator than lex/yacc,
but not doing it without something like those programs.

Erik


Again you could look it up in the Dragon Book. But it would be a tough
challenge to deal with, say "shift-reduce conflict". I mean, of course,
you can do it, but *why* would be "the" question.
 
S

Skarmander

Michael said:
YACC can't have come first, if its name is accurate. Third, at best.

It wasn't the first one to be written. But it came first, and still does.

S.
 
W

Walter Bright

Greg Comeau said:
The Dragon Book is a good one suggestion, but suggesting the
best way to write a compiler is with lex and yacc is not.

I agree. For one, the lexer and parser are often the simplest and most easy
parts of a compiler to write (*). The hard stuff comes later, for which lex
and yacc are quite useless. The annoying thing about most compiler books is
all the attention they give to lexing and parsing, and the short shrift for
the hard parts (semantic analysis, optimization, code generation).

Me, I've usually found that lex and yacc take longer to write than building
by hand, the error handling is poor, and the generated lexer/parser is
inefficient.

(*) The exception to this rule is C/C++, for which writing a lexer is very
hard, and writing a parser (for C++) is even worse. Unfortunately, this
complexity also renders lex and yacc impractical for those languages as well
<g>.

If you want to see how simple lexers and parsers can be, download the D
compiler (from www.digitalmars.com/d/) which comes with source for the
compiler front end. There is also the lexer/parser for ECMAscript (aka
javascript) downloadable from www.digitalmars.com/dscript. These are
professional grade lexer/parsers (GPL license).

-Walter Bright
Digital Mars C, C++, D programming language compilers
 
W

Walter Bright

Erik de Castro Lopo said:
Lexing and parsing is a very small portion of what a compiler
does these days hence there is little to be gained in speed
by using hand written code in the lexer and parser when the
lex/yacc solution is already pretty damn good.

In my extensive profiling and tuning of compiler performance over the years,
the lex speed is critical, especially for C/C++ (due the massive nature of
the aggregated #include files). A hand-tuned lexer can yield big performance
gains.

-Walter Bright
www.digitalmars.com C, C++, D programming language compilers
 
K

Keith Thompson

Walter Bright said:
I agree. For one, the lexer and parser are often the simplest and most easy
parts of a compiler to write (*). The hard stuff comes later, for which lex
and yacc are quite useless. The annoying thing about most compiler books is
all the attention they give to lexing and parsing, and the short shrift for
the hard parts (semantic analysis, optimization, code generation).

Me, I've usually found that lex and yacc take longer to write than building
by hand, the error handling is poor, and the generated lexer/parser is
inefficient.

(*) The exception to this rule is C/C++, for which writing a lexer is very
hard, and writing a parser (for C++) is even worse. Unfortunately, this
complexity also renders lex and yacc impractical for those languages as well
<g>.

I know that parsing C is difficult because of the treatment of typedef
names (are there other serious problems?), but what makes writing a
lexer so difficult? Are you referring to the preprocessing phase? I
suppose you really need two lexers, one that's part of the
preprocessor and another that works on the preprocessor's output, but
I don't see that either of them would be overly complex.

(To keep this topical, I'm looking for an answer in terms of something
about the language definition that makes it difficult.)
 
S

Skarmander

Jordan said:
I believe he meant the fact that it is "Yet another" one.

I know. But "you're right, makes my choice of words look slightly goofy,
doesn't it" doesn't make for a very interesting post.

Ah, fugeddaboudid.

S.
 
W

Walter Bright

Keith Thompson said:
I know that parsing C is difficult because of the treatment of typedef
names (are there other serious problems?), but what makes writing a
lexer so difficult? Are you referring to the preprocessing phase? I
suppose you really need two lexers, one that's part of the
preprocessor and another that works on the preprocessor's output, but
I don't see that either of them would be overly complex.

(To keep this topical, I'm looking for an answer in terms of something
about the language definition that makes it difficult.)

The preprocessor throws a monkey wrench into things. The preprocessor has a
different idea of what a token is than C. There are the phases of
translation. There are things like supporting multiple source character
sets. Getting the preprocessor to be standards compliant is a fiendishly
difficult task, to my knowledge, the Digital Mars preprocessor is one the
very few(*) that is (and this is what, 15 years after the standard was
standardized?). There are wierdities like trigraphs, digraphs, backslash
line splicing, \u identifier characters, token concatenation, stringizing,
#line, varying context-dependent meanings of newlines, etc.

And overriding all this, because C compilers normally need to chew on
enormous quantities of #include files (the aggregate often being over a
million lines), is the need for speed. Any data structures used need to be
scalable over a very wide range.

So, writing a compliant and *useful* C lexer is a pretty challenging task.
Writing a lexer for D, or javascript, comparatively speaking, can be done
over dessert <g>.

-Walter Bright
www.digitalmars.com C, C++, D programming language compilers

(*) Caveat: I haven't done exhaustive testing of other compiler
preprocessors, but one example that often fails on otherwise excellent
implementations is the example in the Standard itself. It's very perverse,
and nobody would expect to see such cases in real code, so it isn't
important.
 
K

Keith Thompson

Walter Bright said:
(*) Caveat: I haven't done exhaustive testing of other compiler
preprocessors, but one example that often fails on otherwise excellent
implementations is the example in the Standard itself. It's very perverse,
and nobody would expect to see such cases in real code, so it isn't
important.

Which example are you referring to?
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
474,170
Messages
2,570,927
Members
47,469
Latest member
benny001

Latest Threads

Top