perl6 grammar rules in ruby

  • Thread starter Charles Comstock

Charles Comstock

Has anyone taken a look at the idea of having embedded grammars in ruby
like perl6 is intending to add? For instance, the example they give:

grammar Letter {
rule text { <greet> <body> <close> }

rule greet :w { [Hi|Hey|Yo] $to:=(\S+?) , $$}

rule body { <line>+ }

rule close :w { Later dude, $from:=(.+) }

# etc.
}

grammar FormalLetter is Letter {

rule greet :w { Dear $to:=(\S+?) , $$}

rule close :w { Yours sincerely, $from:=(.+) }

}

* Note that this is taken from the perl6 description of grammars, so the
syntax is perl6; obviously we would want a more ruby-esque syntax.

Embedded grammars mean not only that you can create grammars to match
what you're parsing in the language itself, as opposed to in a third-party
module, but also that you can derive one grammar from another to add
functionality for a more specific class of text to match. Now, I don't know
if it would be easier in ruby just to define a few classes to get
functionality like this, as opposed to having it at the language level,
but I think it is a good idea. I think the lack of subroutine calls
from the ruby regex engine would make it difficult to program outside of
the interpreter. Clearly the idea of including a bare regex system in
the language is very powerful, as we have already included it in the
language, but a grammar adds even more power and seems equally
important. Plus, if it is implemented correctly, it clearly extends to
allowing many meta functions across ruby if the ruby grammar itself is
specified internally.

I'm thinking about submitting an RCR on this, but thought I would ask
everyone what they thought?

Charles Comstock
 
Simon Strandgaard

On Mon, 01 Mar 2004 02:11:52 -0600, Charles Comstock wrote:
[snip]
I think the lack of subroutine calls from the ruby regex engine would
make it difficult to program outside of the interpreter.

I am working on a regexp engine, written entirely in Ruby.
Mark Sparshatt is working on a perl6 parser for the engine.
We plan to support ruby code inside regexp.

http://rubyforge.org/projects/aeditor/

What really drives me crazy is that I keep thinking that Ruby will
have perl6 regexps long before Perl gets them ;-)
Anyway, it's just syntactic sugar... there is also an XML flavor.

Clearly the idea of including a bare regex system in
the language is very powerful, as we have already included it in the
language, but a grammar adds even more power and seems equally
important. Plus if it is implemented correctly, it clearly extends to
allowing many meta functions across ruby if the ruby grammar is
specified internally.

At some point Ruby will switch its regexp engine from GNU to Oniguruma.
Then an engine which is GNU-compatible may come in handy.

At the moment I am struggling with nested quantifiers... (yes, I have been
busy with this task for more than 30 days). My goal is to pass more
than 95% of the regexp test cases in rubicon. Last time I checked, I was
using a conceptually wrong design, but managed to pass 92.5% of 1652
assertions.

I'm thinking about submitting an RCR on this, but thought I would ask
everyone what they thought?

Not sure if I understand the proposal?
 
Charles Comstock

Simon said:
On Mon, 01 Mar 2004 02:11:52 -0600, Charles Comstock wrote:
[snip]
I think the lack of subroutine calls from the ruby regex engine would
make it difficult to program outside of the interpreter.


I am working on a regexp engine, written entirely in Ruby.
Mark Sparshatt is working on a perl6 parser for the engine.
We plan to support ruby code inside regexp.

The reason I like the grammar usage is I don't mind embedding code
inside of a grammar, but I don't like the idea of embedding code inside
of a regex, because it just seems to go against the grain of what regex
is intended for.
http://rubyforge.org/projects/aeditor/

What really drives me crazy is that I keep thinking that Ruby will
have perl6 regexps long before Perl gets them ;-)
Anyway, it's just syntactic sugar... there is also an XML flavor.

I don't really want perl6 regex, I want perl6 grammars with embedded
regex. I guess that's just a semantics issue, but really I'm more
interested in embedded grammars than regex.

At some point Ruby will switch its regexp engine from GNU to Oniguruma.
Then an engine which is GNU-compatible may come in handy.

At the moment I am struggling with nested quantifiers... (yes, I have been
busy with this task for more than 30 days). My goal is to pass more
than 95% of the regexp test cases in rubicon. Last time I checked, I was
using a conceptually wrong design, but managed to pass 92.5% of 1652
assertions.





Not sure if I understand the proposal?

The proposal was that Rite/Ruby 2.0 should have an embedded grammar
system. It seems like an acceptable time to jump in and add it, or at
the very least add the requisite keywords so it could be added later,
thus allowing Rite to mostly freeze the syntax of the language.
 
Simon Strandgaard

Simon Strandgaard wrote: [snip]
Not sure if I understand the proposal?

The proposal was that Rite/Ruby 2.0 should have an embedded grammar
system. It seems like an acceptable time to jump in and add it, or at
the very least add the requisite keywords so it could be added later,
thus allowing Rite to mostly freeze the syntax of the language.

You want something à la REBOL's grammar?
http://www.rebol.com/docs/core23/rebolcore-15.html

Then I would like the same ;-)
 
Mauricio Fernández

Has anyone taken a look at the idea of having embedded grammars in ruby
like perl6 is intending to add? For instance, the example they give:

grammar Letter {
rule text { <greet> <body> <close> }

rule greet :w { [Hi|Hey|Yo] $to:=(\S+?) , $$}

rule body { <line>+ }

rule close :w { Later dude, $from:=(.+) }

# etc.
}

I believe this can be implemented easily in Ruby as it stands now,
without additional keywords. Many domain specific languages have been
defined this way.
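
As a rough illustration of that point, here is a minimal sketch of a
grammar-like DSL in plain Ruby. The `Grammar` class and its `rule` and
`to_regexp` methods are hypothetical names made up for this example, not an
existing library, and the rules are simply spliced into one ordinary Regexp,
so recursion and grammar inheritance are not covered:

```ruby
# Hypothetical mini-DSL: `rule` is an ordinary class method, not a keyword.
class Grammar
  # Per-class table of named rules.
  def self.rules
    @rules ||= {}
  end

  # Register a rule as a sequence of Regexps and/or names of other rules.
  def self.rule(name, *sequence)
    rules[name] = sequence
  end

  # Expand a rule into one Regexp, splicing in the rules it refers to.
  # Regexp#to_s keeps each part's own flags (e.g. /m) intact.
  def self.to_regexp(name)
    parts = rules.fetch(name).map do |part|
      part.is_a?(Symbol) ? to_regexp(part).to_s : part.to_s
    end
    Regexp.new(parts.join('\s*'))
  end
end

class Letter < Grammar
  rule :greet, /(?:Hi|Hey|Yo) (?<to>\S+?),/
  rule :body,  /(?<body>.+?)/m
  rule :close, /Later dude, (?<from>.+)/
  rule :text,  :greet, :body, :close
end

letter = "Hi Bob,\nSee you at the conference.\nLater dude, Alice"
match  = Letter.to_regexp(:text).match(letter)
puts match[:to]    # name captured by the greet rule
puts match[:from]  # name captured by the close rule
```

The design choice here is to keep every rule a plain Regexp, so the engine's
normal backtracking still applies across rule boundaries.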

 
Charles Comstock

Simon said:
Simon Strandgaard wrote:
[snip]
Not sure if I understand the proposal?

The proposal was that Rite/Ruby 2.0 should have an embedded grammar
system. It seems like an acceptable time to jump in and add it, or at
the very least add the requisite keywords so it could be added later,
thus allowing Rite to mostly freeze the syntax of the language.


You want something à la REBOL's grammar?
http://www.rebol.com/docs/core23/rebolcore-15.html

Then I would like the same ;-)

Yeah, that looks similar, but if you look at the perl6 grammar spec,
http://dev.perl.org/perl6/exegesis/E05.html and
http://dev.perl.org/perl6/apocalypse/A05.html, they are adding more than
just a straight-up grammar parser to perl; they are also adding the
ability to subclass parts of the grammar and various other features.
Seeing the REBOL spec tells me they definitely based it partially on
that, but went a lot further as well.

Charles Comstock
 
Charles Comstock

Mauricio said:
Has anyone taken a look at the idea of having embedded grammars in ruby
like perl6 is intending to add? For instance, the example they give:

grammar Letter {
rule text { <greet> <body> <close> }

rule greet :w { [Hi|Hey|Yo] $to:=(\S+?) , $$}

rule body { <line>+ }

rule close :w { Later dude, $from:=(.+) }

# etc.
}


I believe this can be implemented easily in Ruby as it stands now,
without additional keywords. Many domain specific languages have been
defined this way.

Two problems that I see with directly implementing this in Ruby:

1) That example didn't show it, but they are embedding grammars inside
of native regex, which is something we can't currently do and still get
the look-ahead, fail, fall-back and try-again behavior of the ruby regex.

2) The section immediately afterwards in my post, which you
deleted, where they derived a sub-grammar from this specification and
added more specific rules for that grammar. This seems easier to
overload in ruby, but I foresee problems here.

grammar FormalLetter is Letter {

rule greet :w { Dear $to:=(\S+?) , $$}

rule close :w { Yours sincerely, $from:=(.+) }

}

If you had a grammar block like this in a more ruby-esque syntax I would
think it would be more like this:

class Letter < Grammar
rule :text {greet; body; close}
...
end

class FormalLetter < Letter
rule :greet,:w { /Dear (\S+?)/ ...

Here we run into a bunch of problems with translation. I will start
thinking of a nice way to embed this in the ruby syntax, as I don't
really think that much of the Perl-style syntax. While the inheritance
portion MAY be possible to implement in the current ruby syntax, other
parts would definitely need a custom regex engine, which probably
sacrifices speed among other things.
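
To make the inheritance part concrete, here is a minimal sketch that works
in current ruby syntax, assuming each rule is written as an ordinary method
returning a Regexp. The method names `text` and `parse` and the sample
letter are made up for illustration, and this covers only rule overriding,
not the parts that would need a custom engine:

```ruby
# Hypothetical sketch: each rule is a method returning a Regexp, so
# overriding a rule in a subclass works like any other method override.
class Letter
  def greet
    /(?:Hi|Hey|Yo) (?<to>\S+?),/
  end

  def body
    /(?<body>.+?)/m
  end

  def close
    /Later dude, (?<from>.+)/
  end

  # The top-level rule is assembled at call time, so rules overridden
  # in a subclass are picked up automatically.
  def text
    /\A#{greet}\s*#{body}\s*#{close}\z/
  end

  def parse(input)
    text.match(input)
  end
end

class FormalLetter < Letter
  def greet
    /Dear (?<to>\S+?),/
  end

  def close
    /Yours sincerely, (?<from>.+)/
  end
end

formal = "Dear Bob,\nPlease find the report attached.\nYours sincerely, Alice"
result = FormalLetter.new.parse(formal)
puts result[:to]                        # recipient from the overridden greet rule
puts Letter.new.parse(formal).inspect   # the informal grammar does not match
```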

Charles Comstock
 
Ara.T.Howard

On Mon, 1 Mar 2004, Charles Comstock wrote:

Here we run into a bunch of problems with translation. I will start
thinking of a nice way to embed this in the ruby syntax, as I don't really
think that much of the Perl style syntax. While the inheritance
portion MAY be possible to implement in the current ruby syntax, other parts
would definitely need a custom regex engine, which probably sacrifices speed
among other things.

sounds like an o.k. idea, but what advantage would it have over racc, which is
already distributed with ruby? i must say that dynamically creating
parsers seems like it would not be too heavily used - just making a good
parser is hard enough and i would think that any grammars simple enough to
generate parsers for on the fly would be simple enough to parse by hand.
considering that, i prefer the approach of racc which will generate a static
parser using a very ruby-esque syntax.

-a
 
Charles Comstock

Ara.T.Howard said:
On Mon, 1 Mar 2004, Charles Comstock wrote:




sounds like an o.k. idea, but what advantage would it have over racc, which is
already distributed with ruby? i must say that dynamically creating
parsers seems like it would not be too heavily used - just making a good
parser is hard enough and i would think that any grammars simple enough to
generate parsers for on the fly would be simple enough to parse by hand.
considering that, i prefer the approach of racc which will generate a static
parser using a very ruby-esque syntax.

-a

Well, take for instance parsing some html. It is possible to
parse the whole file with REXML (assuming it's valid xhtml, which is
unlikely), or to use ruby-htmltools, which I guess tries to fix some
html-isms that break rexml, but both require a switch in context. If
you had an embedded grammar, you could construct a regex to look for the
region you wanted, and then construct a simple grammar to extract the
meaningful data from there. Yes, you could technically write a grammar
with racc, but that is geared more towards parsing the whole file.
It's a question of programmer speed and ease of use. When I code in
other languages which do not have embedded regex, or which have regex
objects with a more cumbersome syntax, as a programmer I don't depend on
them quite as much. I'm slightly more likely to use index or substr or
whatever, which may be less suited to the task. I do this because it is
the easiest path in that language. It feels easier even though in the
long run it may not be.
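
The two-step workflow described above (a regex to find the region, then a
small grammar over just that region) can be approximated today with the
standard `strscan` library. This is only a sketch of the idea: the sample
html, the `parse_items` helper, and the tiny item grammar are all made up
for illustration:

```ruby
require 'strscan'

html = <<~HTML
  <p>intro</p>
  <ul id="langs"><li>Ruby</li><li>Perl</li><li>REBOL</li></ul>
  <p>outro</p>
HTML

# Step 1: a plain regex locates the region of interest.
region = html[/<ul id="langs">(.*?)<\/ul>/m, 1]

# Step 2: a tiny hand-written grammar is applied to that region only:
#   items := item*
#   item  := "<li>" text "</li>"
def parse_items(source)
  scanner = StringScanner.new(source)
  items = []
  until scanner.eos?
    scanner.scan(/\s*<li>/) or raise ArgumentError, "expected <li> at offset #{scanner.pos}"
    items << scanner.scan(/[^<]*/)
    scanner.scan(/<\/li>\s*/) or raise ArgumentError, "expected </li> at offset #{scanner.pos}"
  end
  items
end

p parse_items(region)
```

The context switch is still visible here (a separate scanner object and a
hand-rolled loop), which is the cost an embedded grammar would aim to remove.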

Languages like awk and perl pushed forward this idea of embedded regex,
which allowed the programmer to use that functionality without in
essence switching contexts, letting regex become the easiest path. If a
grammar was equally available, if a grammar was the easiest path, then
it would become as useful a tool as the regex. I think it's hard to
think of all the places it would be easier to use a grammar, because we
aren't used to thinking that way as programmers. Making powerful tools
the path of least resistance can do nothing but benefit the programmer.

Charles Comstock
 
anon luker

Ara.T.Howard said:
On Mon, 1 Mar 2004, Charles Comstock wrote:



sounds like an o.k. idea, but what advantage would it have over racc, which is
already distributed with ruby? i must say that dynamically creating
parsers seems like it would not be too heavily used - just making a good
parser is hard enough and i would think that any grammars simple enough to
generate parsers for on the fly would be simple enough to parse by hand.
considering that, i prefer the approach of racc which will generate a static
parser using a very ruby-esque syntax.

-a

FWIW, I am having a love affair w/ boost::spirit. I use it _all_ the
time. Planning a proper grammar for even trivial things like
command-line arguments and then concisely parsing them w/ spirit gives
me a warm fuzzy feeling. I would never ever ever use the traditional
parser generators (or their clones) for such tasks, though (never mind
that I prefer writing right-recursive grammars). Spirit's style is
very much tied to the language for which it was written, but I still
think that it is a good role-model.
 
Robert Feldt

anon said:
FWIW, I am having a love affair w/ boost::spirit. I use it _all_ the
time. Planning a proper grammar for even trivial things like
command-line arguments and then concisely parsing them w/ spirit gives
me a warm fuzzy feeling. I would never ever ever use the traditional
parser generators (or their clones) for such tasks, though (never mind
that I prefer writing right-recursive grammars). Spirit's style is
very much tied to the language for which it was written, but I still
think that it is a good role-model.

Is the important thing the integration with the language or the fact that
you can create parsers dynamically? I'm aiming for the latter with Ruby, but
the grammars can either be written in a string (in the Rockit grammar
format) or constructed directly with the Grammar classes. I haven't
checked out the Perl6 stuff, but my impression is that the original
poster thinks the integration into the host language is the most
important thing. Can someone clarify?

/Robert
 
Charles Comstock

Robert said:
Is the importance the integration with the language or the fact that you
can create parsers dynamically? I'm aiming for the latter with Ruby but
the grammars can either be written in a string (in the Rockit grammar
format) or constructed directly with the Grammar classes. I haven't
checked out the Perl6 stuff but my impression is that the original
poster thinks the integration into the host language is the most
important thing. Can someone clarify?

/Robert

The advantage of the embedded system is that you can call out to a grammar
inline from a builtin regex, something you can't really do in ruby at
the moment. Basically, with the perl6 grammars you would be able to say
text =~ /__START__<ruby>__END__/ where ruby would be the grammar for
ruby, with associated callbacks for each grammar node to, say, create an
AST. The fact that it can be embedded in a regex gives the nice touch that
a) you can switch quickly back and forth between the grammar and lexer level
b) it just makes it a lot easier to use a grammar from within the language.

It might help the discussion if a few more people take a look at how the
grammars are designed to work in perl6. If the functionality they are
aiming for is possible without extending the language then go for it, but I
am inclined to believe it is not possible without a separate regex
engine at the moment. In addition, the ability to inherit grammars would
probably be easier if it was embedded in the language, but it is
certainly possible to work around that constraint.
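
For readers on later Ruby versions: the Oniguruma engine (and its successor
Onigmo), used since Ruby 1.9, supports named groups and subexpression calls
written `\g<name>`, which covers part of the "call a rule from inside a
regex" idea, though not callbacks or grammar inheritance. A minimal sketch,
with the rule names `atom` and `list` chosen purely for illustration:

```ruby
# Named groups act like grammar rules; \g<name> calls a rule, and a rule
# may call itself recursively. The {0} quantifier defines a rule without
# matching anything at the point of definition.
NESTED = /
  (?<atom> [^()\s]+ ){0}
  (?<list> \( \s* (?: (?: \g<atom> | \g<list> ) \s* )* \) ){0}
  \A \g<list> \z
/x

puts NESTED.match?("(a (b c) (d (e)))")  # balanced, nested lists
puts NESTED.match?("(a (b c)")           # missing a closing parenthesis
```

Because `list` refers to itself, this pattern accepts arbitrarily nested
parentheses, which a classical regular expression cannot do.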

The pertinent perl6 grammar references:
http://dev.perl.org/perl6/exegesis/E05.html
http://dev.perl.org/perl6/apocalypse/A05.html

Charles Comstock
 
Robert Feldt

Charles said:
The advantage of the embedded system is you can call out to a grammar
inline from a builtin regex, something you can't really do in ruby at
the moment. Basically with the perl6 grammars you would be able to say
text =~ /__START__<ruby>__END__/ where ruby would be the grammar for
ruby with associated callbacks for each grammar node to say create an
AST. The fact it can be embedded in a regex gives the nice touch that
a) you can switch quickly back and forth between the grammar and lexer
level
b) it just makes it a lot easier to use a grammar from within the
language.

Sounds useful and might mean people start using grammars more. I see
your point.

It might help the discussion if a few more people take a look at how
the grammars are designed to work in perl6. If the functionality they
are trying is possible without extending the language then go for it,
but I am inclined to believe it is not possible without a separate
regex engine at the moment. In addition the ability to inherit
grammars would probably be easier if it was embedded in the language,
but it is certainly possible to work around that constraint.

I guess the sensible first step is to try it in a new extension,
NRegexp for example. If it is very useful then making NRegexp the Regexp
of Ruby proper can be discussed. I think it's the wrong order to start
by discussing incorporation in the language.

Grammar inheritance is useful (I'm not sure in what sense it's used here,
but I'll do my homework), but it has nothing to do with the
integration/embedding really, right?
I'll take a look some day, but I find Perl so hard to read that I might
not understand it... ;)

/Robert
 
Robert Feldt

Charles said:
The advantage of the embedded system is you can call out to a grammar
inline from a builtin regex, something you can't really do in ruby at
the moment. Basically with the perl6 grammars you would be able to say
text =~ /__START__<ruby>__END__/ where ruby would be the grammar for
ruby with associated callbacks for each grammar node to say create an
AST. The fact it can be embedded in a regex gives the nice touch that
a) you can switch quickly back and forth between the grammar and lexer
level
b) it just makes it a lot easier to use a grammar from within the
language.

It might help the discussion if a few more people take a look at how
the grammars are designed to work in perl6. If the functionality they
are trying is possible without extending the language then go for it,
but I am inclined to believe it is not possible without a separate
regex engine at the moment. In addition the ability to inherit
grammars would probably be easier if it was embedded in the language,
but it is certainly possible to work around that constraint.

The pertinent perl6 grammar references:
http://dev.perl.org/perl6/exegesis/E05.html
http://dev.perl.org/perl6/apocalypse/A05.html

I've taken a look and I agree it looks powerful. The embedding in the
language is crucial to what they are trying to do: making
grammars/parsing a first-class entity in the language that plays nicely
with the other parts. The thing that is a bit hard to understand is how
powerful a parser this can be used to express. All context-free
grammars? An even larger class? I guess the latter might be the case
since they can have code intermingled.

The downside is the complexity of it, both in understanding and
implementing. Do they plan to support this in Parrot so that other
languages on top of Parrot can reuse the same implementation? Seems like
this is quite a bit away into the future. Thanks for bringing it up.

/Robert
 
Charles Comstock

I've taken a look and I agree it looks powerful. The embedding in the
language is crucial to what they are trying to do: making
grammars/parsing a first-class entity in the language that plays nicely
with the other parts. The thing that is a bit hard to understand is how
powerful a parser this can be used to express. All context-free
grammars? An even larger class? I guess the latter might be the case
since they can have code intermingled.

The downside is the complexity of it, both in understanding and
implementing. Do they plan to support this in Parrot so that other
languages on top of Parrot can reuse the same implementation? Seems like
this is quite a bit away into the future. Thanks for bringing it up.

/Robert

I don't think it's built into parrot, but perhaps I am mistaken; mostly
it seems like more of a syntax issue than an interpreter issue. I think
they may have some sort of ad hoc equivalent as a module for perl5. I'm not
sure how they are getting calls from within the regex, but they may be
doing some funny crap with perl C extensions. In classic perl manner,
the syntax they describe for it is definitely pretty pregnant with
semantics, and very overloaded. However, it seems as if the gist could
be streamlined and made into a more ruby-esque form without loss of
generality.

Maybe if the next regex engine for ruby had some sort of embeddable code
block, then something like this could be implemented as a module,
ensuring it was true to existing syntax and semantics.

Unfortunately, I don't really like embeddable code blocks in a regex
unless it is through a grammar syntax, as it seems like a more difficult
concept to grasp than the .. and ... operators. Definitely something
that needs more discussion though, I believe.

Charles Comstock
 
anon luker

Could you please explain your take on the relationship between regular
expressions and dynamic parsers? Or regular expressions and grammar
specifications? Your argument that Ruby needs language-level support
for dynamic parsers seems to be based on one of these opaque
relationships.
 
Charles Comstock

anon said:
Could you please explain your take on the relationship between regular
expressions and dynamic parsers? Or regular expressions and grammar
specifications? Your argument that Ruby needs language-level support
for dynamic parsers seems to be based on one of these opaque
relationships.

A major part of all programming language use is parsing text in some
fashion or another, and it's basically a solved problem thanks to grammars
and regex. It seems logical that if one includes regex support at the
language level, it makes equal sense to provide all the parsing tools
and allow easy interaction between grammar and embedded regex. It makes
it available to the programmer, it makes it easy, and it makes it less
of a creative stretch. I.e. it allows the programmer to think of certain
programming tasks from a grammar perspective, in the same way one thinks
of some tasks from a regex perspective. It gives the programmer a
larger vocabulary, without unnecessarily complicating things for them.

That is why I am trying to push for this, but I think I'm not quite
following your question. Could you be a little more specific? Then
perhaps I could answer better ;)

Charles Comstock
 
anon luker

Charles Comstock said:
Well take for instance parsing some html

XML is maybe a better example, since you often need to dynamically
build your parser around some external spec (like a dtd).
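
A very small sketch of that idea, assuming the external spec has already
been reduced to a list of allowed element names (the names, the sample
document, and the flat non-nested format are all made up for illustration;
this is not a DTD parser):

```ruby
# Hypothetical list of allowed elements, e.g. as read from a spec at run time.
allowed = %w[title author year]

# Build the matcher dynamically; Regexp.union escapes each name safely.
names   = Regexp.union(allowed)
element = /<(?<name>#{names})>(?<value>[^<]*)<\/\k<name>>/

doc    = "<title>Programming Ruby</title><year>2004</year><isbn>n/a</isbn>"
fields = doc.scan(element).to_h
p fields   # only elements named in the spec are picked up
```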
 
gabriele renzi

On Fri, 5 Mar 2004 06:47:59 +0900, Robert Feldt wrote:

The downside is the complexity of it, both in understanding and
implementing. Do they plan to support this in Parrot so that other
languages on top of Parrot can reuse the same implementation? Seems like
this is quite a bit away into the future. Thanks for bringing it up.

I may be wrong, but in a talk Larry Wall said that there is a tool to
translate perl6 regexen into perl5 ones, so there should be no native
support in Parrot.
 
