Mark said:
My personal solution to this is to use Coco/R, an LL(1) scanner/parser generator.
I haven't used Coco, but I'd second Mark's recommendation to stay with
an LL(1) parser if possible. If not, then LL(n) à la ANTLR, but not LALR
à la yacc. The LL parsers are easy to write manually using recursive
descent, which I've done a few times.
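To show how little machinery recursive descent needs, here is a minimal sketch. The grammar, the Calc class and its method names are all made up for illustration, not taken from Coco/R or ANTLR. Each grammar rule becomes one method, and a single token of lookahead (peek) is enough to choose what to do next, which is what makes it LL(1).

```ruby
# Hypothetical grammar (invented for this example):
#   expr   := term   (('+' | '-') term)*
#   term   := factor (('*' | '/') factor)*
#   factor := NUMBER | '(' expr ')'
class Calc
  def initialize(src)
    # Crude tokenizer: whitespace (and anything else unrecognised) is skipped.
    @tokens = src.scan(%r{\d+|[-+*/()]})
    @pos = 0
  end

  def parse
    value = expr
    raise "unexpected #{peek.inspect}" if peek
    value
  end

  private

  # The single token of lookahead.
  def peek
    @tokens[@pos]
  end

  def advance
    tok = @tokens[@pos]
    @pos += 1
    tok
  end

  def expr
    value = term
    while ['+', '-'].include?(peek)
      op = advance
      rhs = term
      value = op == '+' ? value + rhs : value - rhs
    end
    value
  end

  def term
    value = factor
    while ['*', '/'].include?(peek)
      op = advance
      rhs = factor
      value = op == '*' ? value * rhs : value / rhs # integer division
    end
    value
  end

  def factor
    tok = advance
    if tok == '('
      value = expr
      raise "expected ')'" unless advance == ')'
      value
    elsif tok&.match?(/\A\d+\z/)
      tok.to_i
    else
      raise "unexpected #{tok.inspect}"
    end
  end
end
```

Usage would be something like Calc.new("2 + 3 * (4 - 1)").parse. This version evaluates as it goes; returning nested arrays instead of numbers would give an AST.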
[1] I find that thinking in the manner of a shift/reduce parser is
particularly unnatural to me. ... Maybe there is something I can read which
will turn the problem around, so it becomes easy to handle?
Shift just means "delay a decision about what I've just seen". Reduce is
the operation you do when you do decide. If you explore the ambiguity
in your grammar rules, these start to make more sense.
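A toy illustration of the two operations, assuming a one-rule grammar invented for this example (sum := sum '+' NUM | NUM). It is hand-rolled rather than table-driven as yacc's output would be, and the name shift_reduce is hypothetical. The trace it returns shows tokens being pushed (decision delayed) until the top of the stack matches a rule body (decision made).

```ruby
# Hypothetical grammar: sum := sum '+' NUM | NUM
# Tokens are Integers and the string '+'. A reduced sum is [:sum, value].
def shift_reduce(tokens)
  input = tokens.dup
  stack = []
  trace = []
  loop do
    if stack.length >= 3 && stack[-3].is_a?(Array) &&
       stack[-2] == '+' && stack[-1].is_a?(Integer)
      # Top of stack is "sum + NUM": decide, and replace it with a sum.
      rhs = stack.pop
      stack.pop
      lhs = stack.pop
      stack.push([:sum, lhs[1] + rhs])
      trace << 'reduce sum := sum + NUM'
    elsif stack.length == 1 && stack[0].is_a?(Integer)
      # A lone NUM at the bottom of the stack starts the sum.
      stack.push([:sum, stack.pop])
      trace << 'reduce sum := NUM'
    elsif !input.empty?
      # Nothing to decide yet: push the next token and keep going.
      stack.push(input.shift)
      trace << 'shift'
    else
      break
    end
  end
  unless stack.length == 1 && stack[0].is_a?(Array)
    raise "parse error, stack: #{stack.inspect}"
  end
  [stack[0][1], trace]
end

value, trace = shift_reduce([1, '+', 2, '+', 3])
```

Printing trace for that input gives shift, reduce sum := NUM, shift, shift, reduce sum := sum + NUM, and so on: two shifts pile up before each reduce by the longer rule.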
The primary disadvantage of Coco/R is the LL(1) part.
ANTLR does LL(n) for arbitrary n I believe - though you should avoid
n > 3 or humans start to have trouble parsing your language.
It's a shame that the ANTLR folk at Purdue went Java-only when they
dropped their old C-based implementation. A multi-lingual ANTLR would
be super-cool, especially if it would generate Ruby.
As an example, Ruby cannot, as far as I have tried, be converted into an
LL(1) grammar, though C can.
Not without a tie-in to the lexical analyser to help recognise goto
labels, which require LL(2). Such a tie-in is commonly used however.
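The label problem can be sketched as follows. At the start of a statement an identifier may begin either a label or an expression, and only the second token settles it. The function name, the token representation (plain strings) and the deliberately partial keyword list are assumptions made for this example.

```ruby
# Partial keyword list, just enough for the example.
KEYWORDS = %w[if else while for return goto break continue case default].freeze

# Classify a statement by looking at most two tokens ahead.
def statement_kind(tokens)
  first, second = tokens
  ident = first.is_a?(String) &&
          first.match?(/\A[A-Za-z_]\w*\z/) &&
          !KEYWORDS.include?(first)
  if ident && second == ':'
    :labeled_statement      # needs the second token to be sure
  elsif ident
    :expression_statement
  else
    :other                  # keywords, braces, etc. are decided on one token
  end
end
```

The lexer tie-in does the same peek one level down, handing the parser a distinct label token so that the grammar itself stays LL(1).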
A simple example of the ruby grammar
Good example, thanks Mark.
I should point out that the major reason for the success of XML
(contrary to most of the hyped claims about it) is that it allows
people to create languages without having to create parsers. Or
rather, they use an XML parser which yields a DOM, and can process
the AST at will.
If you can live with the ugliness of XML and the size and speed of REXML,
you should consider it.
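A small sketch of that approach, assuming a made-up arithmetic mini-language expressed as XML elements (the element names add, mul and num are invented here). REXML does the parsing, and the program simply walks the resulting tree. On recent Rubies REXML is a bundled gem rather than part of the core library, so it may need to be installed.

```ruby
require 'rexml/document'

# Hypothetical mini-language: arithmetic written as XML elements.
SOURCE = <<~XML
  <add>
    <num value="2"/>
    <mul>
      <num value="3"/>
      <num value="4"/>
    </mul>
  </add>
XML

# Evaluate every child element of a node and collect the results.
def child_values(node)
  values = []
  node.each_element { |child| values << evaluate(child) }
  values
end

# Walk the tree REXML built; no hand-written parser involved.
def evaluate(node)
  case node.name
  when 'num' then node.attributes['value'].to_i
  when 'add' then child_values(node).sum
  when 'mul' then child_values(node).inject(1, :*)
  else raise "unknown element <#{node.name}>"
  end
end

result = evaluate(REXML::Document.new(SOURCE).root)
```

The cost is visible in SOURCE: seven lines of markup for what would be 2 + 3 * 4 in a purpose-built syntax.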
There's no good reason why a language like Ruby shouldn't have
grammar rules as first-class objects (as Regexps are), yielding
Ruby objects that reflect the AST, allowing attribute-grammar
parsers to be written and integrated directly within a program.
Such a tool, integrated into the Ruby interpreter itself, would
allow extension modules to define *Ruby syntax extensions*, so
that the language itself becomes plastic.
I haven't thought much about what these last two features would
look like in Ruby's case.
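As a starting point only, the first idea (grammar rules as ordinary objects) can be approximated today with a few combinators. Everything below is hypothetical: the Rule class, Rule.token, and the >> and | operators are invented for illustration, nothing like them ships with Ruby, and it says nothing about the syntax-extension side.

```ruby
# Hypothetical Rule class: a grammar rule as a first-class object.
class Rule
  def initialize(&matcher)
    @matcher = matcher
  end

  # Returns [ast, remaining_tokens], or nil when the rule does not match.
  def call(tokens)
    @matcher.call(tokens)
  end

  # Sequence: this rule followed by another.
  def >>(other)
    Rule.new do |tokens|
      first = call(tokens)
      next nil unless first
      second = other.call(first[1])
      next nil unless second
      [[first[0], second[0]], second[1]]
    end
  end

  # Ordered choice: try this rule, then fall back to the other.
  def |(other)
    Rule.new { |tokens| call(tokens) || other.call(tokens) }
  end

  # A rule matching one token against a String or Regexp (via ===).
  def self.token(pattern)
    new do |tokens|
      head = tokens.first
      head && pattern === head ? [head, tokens.drop(1)] : nil
    end
  end
end

number = Rule.token(/\A\d+\z/)
plus   = Rule.token('+')
sum    = (number >> plus >> number) | number
```

Here sum.call(%w[1 + 2]) yields a nested array standing in for the AST plus the unconsumed tokens, and rules can be stored, passed around and combined like any other object.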
Clifford Heath.