Gavin Kistner
I want to write my own wiki markup language. Pure regexps fail me,
because I need a proper parser to keep track of state.
I thought I'd give Syntax a try, but I'm a little confused about some
of the specifics.
1) What is a 'region', and how do I use the start_region method? It
isn't documented in the API docs or in the source. (I think this is what
I want for nesting tags.)
2) Do I have to call close_group and close_region, or are they invoked
automatically under certain circumstances? (Does starting one group
close the previous one? Do repeated calls to open the same group get
aggregated together? Is that how text accumulates in :normal groups?)
3) How do I keep track of state across successive calls to #step? I
tried an instance variable, but it doesn't seem to persist across calls.
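To make the three questions above concrete, here is a toy tokenizer in plain Ruby, using only the standard strscan library. It is a hypothetical sketch, not the Syntax gem's implementation; its method names (start_group, start_region, end_region, step) just mirror the ones in the question. It models one plausible behaviour: consecutive chunks in the same group are merged into a single token, a region is emitted as an empty open/close marker token, and "inside bold" state lives in an instance variable, which stays on the object between #step calls.

```ruby
require 'strscan'

# Hypothetical toy tokenizer, NOT the Syntax gem's code. Method names
# mirror the question only to illustrate groups, regions and state.
class ToyTokenizer
  Token = Struct.new(:text, :group, :instruction)

  def tokenize(text)
    @scanner = StringScanner.new(text)
    @tokens  = []
    @chunk   = +""
    @group   = nil
    @in_bold = false      # state kept on the instance between #step calls
    step until @scanner.eos?
    flush_chunk
    @tokens
  end

  private

  # Same group as before: just append to the pending chunk.
  # Different group: emit the pending chunk as one token first.
  def start_group(group, data)
    flush_chunk unless group == @group
    @group = group
    @chunk << data
  end

  # Regions are emitted as empty marker tokens that open or close a span.
  def start_region(group)
    flush_chunk
    @tokens << Token.new("", group, :region_open)
  end

  def end_region(group)
    flush_chunk
    @tokens << Token.new("", group, :region_close)
  end

  def flush_chunk
    @tokens << Token.new(@chunk, @group, :none) unless @chunk.empty?
    @chunk = +""
  end

  def step
    if @scanner.scan(/\*\*/)
      # "**" toggles bold; the flag survives until the next call
      @in_bold ? end_region(:bold) : start_region(:bold)
      @in_bold = !@in_bold
    else
      start_group(:normal, @scanner.getch)
    end
  end
end
```

For example, `ToyTokenizer.new.tokenize("a **b** c")` returns five tokens: "a " (:normal), an empty :bold :region_open marker, "b" (:normal), an empty :bold :region_close marker, and " c" (:normal). The two characters of "a " arrive in separate #step calls but come out as one token because the group did not change.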
Following is my terrible, broken attempt at the basics of what I'm
after. Am I totally misunderstanding how to use Syntax?
require 'rubygems'
require 'syntax'   # older RubyGems versions spelled this: require_gem 'syntax'

class OWLScribble < Syntax::Tokenizer
  def step
    if heading = scan( /^={1,6}/ )
      # Opening marker: its length is the heading level.
      start_region "heading level #{heading.length}".intern
      $heading_end = Regexp.new( heading + "\\s*" )
    elsif $heading_end && ( heading = scan( $heading_end ) )
      # Closing marker: strip the whitespace matched by \s* so the
      # level (and region name) matches the opening marker.
      end_region "heading level #{heading.strip.length}".intern
      $heading_end = nil
    elsif char = scan( /^[\r\n]/ )
      start_group :paragraph, char
    elsif scan( /\*\*/ )
      # "**" toggles bold on and off.
      if $inbold
        end_region :bold
        $inbold = nil
      else
        start_region :bold
        $inbold = true
      end
    elsif char = scan( /./ )
      start_group :normal, char
    else
      scan( /[\r\n]/ )
    end
  end
end

Syntax::SYNTAX[ 'owlscribble' ] = OWLScribble

str = <<END
Intro paragraph
= Heading 1 =
First **paragraph** under the heading.
== Second **Heading** = very yes ==
Another paragraph.
END

tokenizer = Syntax.load( "owlscribble" )
tokenizer.tokenize( str ) do |token|
  puts "#{token.group} (#{token.instruction}) #{token}"
end
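For comparison, here is a standalone sketch of just the heading rule from the code above, written in plain Ruby rather than on top of Syntax::Tokenizer. The HeadingScanner class and its [:open, level] / [:close, level] event format are hypothetical. It checks StringScanner#beginning_of_line? explicitly, records the opening level in @level (an instance variable that persists between #step calls on the same object), and reuses that level when the matching closer is found instead of re-measuring the closing marker.

```ruby
require 'strscan'

# Hypothetical standalone sketch, not the Syntax gem: recognises
# "= Title =" style headings and remembers the opening level in @level.
class HeadingScanner
  def scan_headings(text)
    @s = StringScanner.new(text)
    @level = nil     # nil means "not currently inside a heading"
    @events = []
    step until @s.eos?
    @events
  end

  private

  def step
    if @level.nil? && @s.beginning_of_line? && (marker = @s.scan(/={1,6}/))
      @level = marker.length
      @events << [:open, @level]
    elsif @level && @s.scan(/={#{@level}}[ \t]*$/)
      # Close with the remembered level; measuring the matched text
      # would also count any trailing spaces or tabs.
      @events << [:close, @level]
      @level = nil
    else
      @s.getch       # ordinary text: skip one character
    end
  end
end
```

Run against the same sample text as above, `scan_headings` returns `[[:open, 1], [:close, 1], [:open, 2], [:close, 2]]`; the lone "=" in "= very yes" is skipped because a level-2 heading only closes on "==".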