[ANN] Syntax 0.7.0

Jamis Buck

Syntax is a pure-Ruby framework for doing lexical analysis (and, in
particular, syntax highlighting) of text. It currently sports lexers
for Ruby, XML, and YAML, and an HTML convertor (for colorizing texts in
those languages to HTML).

Links:

Download: http://rubyforge.org/frs/?group_id=505
User Manual: http://docs.jamisbuck.org/read/book/4

This release is much improved in accuracy and robustness (at least, for
the Ruby lexer--the XML and YAML lexers were not changed). The Ruby
lexer now deals better with many ambiguous cases, and even supports
multiple heredocs on a single line. It accurately colorizes cgi.rb and
mkmf.rb from the standard lib, if that means anything at all to you.

The Syntax framework also supports "regions" now (thanks to flgr for
the suggestion) and sports many bug fixes (thanks to Carl Drinkwater
for discovering most of them). Syntax regions just allow one group to
span (and include) multiple groups--like a string that includes
interpolated expressions and escape sequences.

For a pretty example (mkmf.rb fully syntax highlighted) see
http://ruby.jamisbuck.org/mkmf.html.

The next release will include robustness fixes for the XML and YAML
lexers, as well as a lexer for C. Lexers for Perl, Python, Java, HTML,
and RHTML would be nice as well, if I can get to them. Community
submissions will be gladly accepted, as long as you are okay with your
contributed code being distributed under the BSD license.

Enjoy!

- Jamis
 
Florian Gross

Jamis said:
Syntax is a pure-Ruby framework for doing lexical analysis (and, in
particular, syntax highlighting) of text. It currently sports lexers for
Ruby, XML, and YAML, and an HTML convertor (for colorizing texts in
those languages to HTML).

And it is indeed a wonderful Ruby library. It's just so very cool to
have a library that marks up Ruby properly with <span> classes. It
allows you to do quite a lot with Ruby code.

Thanks a lot, Jamis, for this very nice library!

For a pretty example (mkmf.rb fully syntax highlighted) see
http://ruby.jamisbuck.org/mkmf.html.

Another one (lots of new CSS) can be seen here:

http://flgr.0x42.net/highlighting.png

I'll be using the Syntax library for dissecting the submissions of the
IORCC and it is a wonderful help.

If you recognize your own code in the above screenshot, then let me
tell you that IMHO you did a very nice job with your obfuscation.

The next release will include robustness fixes for the XML and YAML
lexers, as well as a lexer for C. Lexers for Perl, Python, Java, HTML,
and RHTML would be nice as well, if I can get to them. Community
submissions will be gladly accepted, as long as you are okay with your
contributed code being distributed under the BSD license.

Having a C lexer will be wonderful as that is exactly something that I'm
currently finding myself needing as well.

I think I'll be able to submit lexers for a few simple languages --
Befunge would be an easy one. But your framework seems to make lexing
more complex languages easy as well, so I might as well try that. Guess
we'll see. :)
 
Sam Roberts

Quoting (e-mail address removed), on Thu, Mar 24, 2005 at 02:54:20PM +0900:
Syntax is a pure-Ruby framework for doing lexical analysis (and, in
particular, syntax highlighting) of text. It currently sports lexers
for Ruby, XML, and YAML, and an HTML convertor (for colorizing texts in
those languages to HTML).

Would this be an appropriate tool for parsing ruby to generate ctags?

To write a tags file I need to know where I am in ruby's terms (in what
class, module), what was found (method, attribute, constant, class,
...), AND I need to generate a regex that will find this place in the
file. For repeated names this can mean knowing what the entire line
looks like, so that I can put leading whitespace into the regex.

Is Syntax something I should be looking at? It seems there are some
similarities... if you know enough to highlight, maybe you know enough
to generate a ctag?

I'm using rdoc right now, but it is a very large tool, and I would like
something smaller and more malleable, if possible.

Thanks,
Sam
 
Jamis Buck

Quoting (e-mail address removed), on Thu, Mar 24, 2005 at 02:54:20PM +0900:

Would this be an appropriate tool for parsing ruby to generate ctags?

Hmmm, maybe. Not in its current incarnation, though. One thing the
lexer doesn't give you right now is the location of each token in the
file. That would be a good addition, though. I'll see about adding that
to the next version.
To write a tags file I need to know where I am in ruby's terms (in what
class, module), what was found (method, attribute, constant, class,
...), AND I need to generate a regex that will find this place in the
file. For repeated names this can mean knowing what the entire line
looks like, so that I can put leading whitespace into the regex.

The lexers that come with Syntax are optimized for syntax highlighting.
You could conceivably write a different lexer module that was optimized
for tag extraction, using the Syntax framework. You'd probably do just
as well to use strscan directly, though.
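
For readers wondering what using strscan directly might look like, here
is a minimal sketch. It is not Syntax's lexer: the tokenize method, the
KEYWORDS list, and the token kinds are made up for illustration, and
columns are byte offsets within the line. It records the line and
column of every token, which is the location information discussed
above.

```ruby
require 'strscan'

# Hypothetical keyword list, just enough for the example.
KEYWORDS = %w[class module def end].freeze

# Returns an array of [kind, text, line, column] entries.
def tokenize(source)
  scanner = StringScanner.new(source)
  line = 1
  line_start = 0          # byte offset where the current line begins
  tokens = []
  until scanner.eos?
    pos = scanner.pos
    if scanner.scan(/\n/)
      line += 1
      line_start = scanner.pos
      next
    elsif scanner.scan(/[ \t]+/)
      next                # skip horizontal whitespace
    elsif (text = scanner.scan(/[A-Z]\w*/))
      kind = :constant
    elsif (text = scanner.scan(/[a-z_]\w*/))
      kind = KEYWORDS.include?(text) ? :keyword : :identifier
    else
      text = scanner.getch
      kind = :other
    end
    tokens << [kind, text, line, pos - line_start]
  end
  tokens
end

tokenize("module Foo\n  class Bar\n  end\nend\n").each do |kind, text, line, col|
  puts "#{line}:#{col} #{kind} #{text}"
end
```

A real tag extractor would also have to cope with strings, comments,
and heredocs, which this sketch ignores.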

- Jamis
 
gabriele renzi

Sam Roberts wrote:

I'm using rdoc right now, but it is a very large tool, and I would like
something smaller and more malleable, if possible.

Why not ParseTree or ripper?
 
Trans

Speaking of RDoc: did anyone take up the call for a new maintainer? I
would love to see syntax highlighting in RDoc.

T.
 
Sam Roberts

Quoting (e-mail address removed), on Fri, Mar 25, 2005 at 01:27:37AM +0900:
Hmmm, maybe. Not in its current incarnation, though. One thing the
lexer doesn't give you right now is the location of each token in the
file. That would be a good addition, though. I'll see about adding that
to the next version.

I don't need location in file, I just need the text of the line:

module Foo
class Bar
class Bar
end

The tag would be
Bar-> regex / class Bar/
Bar-> regex / class Bar/
Foo.Bar -> regex / class Bar/
Foo.Bar.Bar -> regex / class Bar/

I don't need line no.

For this
module Foo
end
class Foo::Bar
end

The tags would be different:
Bar -> /class Foo::Bar/

And for
class
Foo
end

Different again.
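
To make the idea concrete, here is a rough sketch of a tag extractor
that keeps the whole line, leading whitespace included, as the search
pattern and emits both short and qualified names. Everything in it is
an assumption for illustration: the extract_tags name, the
indentation heuristic (a class or module is treated as closed once a
later line is indented no deeper than its header), and the basic
three-field tags line (name, file, pattern, separated by tabs). It
handles the first two cases above but not a class name on its own
line.

```ruby
# Sketch only: a line-based heuristic, not a real Ruby parser.
def extract_tags(source, filename)
  scopes = []   # stack of [indent, name] for enclosing classes/modules
  tags = []
  source.each_line do |raw|
    line = raw.chomp
    next if line.strip.empty?
    indent = line[/\A[ \t]*/].size
    # Leave every scope whose header is indented at least this deep.
    scopes.pop while scopes.any? && scopes.last[0] >= indent
    match = line.match(/\A[ \t]*(?:class|module)[ \t]+([A-Z]\w*(?:::[A-Z]\w*)*)/)
    next unless match
    name = match[1]
    qualified = (scopes.map { |_, n| n } + [name]).join(".").gsub("::", ".")
    short = qualified.split(".").last
    # Keep the full line as the pattern; escape only backslash and slash.
    pattern = "/^" + line.gsub(/[\\\/]/) { |c| "\\" + c } + "$/"
    [short, qualified].uniq.each do |tag|
      tags << [tag, filename, pattern].join("\t")
    end
    scopes << [indent, name]
  end
  tags.sort
end

src = "module Foo\n  class Bar\n    class Bar\n    end\n  end\nend\n"
puts extract_tags(src, "example.rb")
```

Because the pattern carries the indentation, the two "class Bar" lines
get different patterns even though the short tag name is the same.
Heredocs or unindented code would defeat the heuristic.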

Quoting (e-mail address removed), on Fri, Mar 25, 2005 at 01:49:52AM +0900:
Sam Roberts wrote:

Why not ParseTree or ripper?

I have no idea what ripper does, but parse tree just gives symbols, it
doesn't have enough information for me to build a regex, as above, does
it?

Making tags is an odd problem. It involves semantic analysis: when you
see class Foo, you need to know if it is in module Bar or inside class
Joe. But to generate the tag you need access to the original text so
that you can build a regex, which is sensitive to HOW you wrote the
code, not just what the code means. Most tokenizers' goal in life is to
abstract you away from the text, so you just see a stream of syntactic
elements.

Rdoc is useful, because it does the analysis, but it also maintains
original text in a way it can (in some cases) be regenerated to form
regexes.

I think it's not a bad place to put it, since tags as another output
format is a reasonable extension of its model.

But... it's really slow (I think it's how much data it keeps in memory).
It also doesn't quite give me access to everything I want. I can hack
it, but I'm balking at the chore. Adding an output formatter was easy
and standalone. Hacking its internals... that's another story.

I'm totally open to suggestions. I NEED tags to read code effectively.

I'm faster writing in ruby than in C, but I read C code way, way, way
faster due to the tool support I have (vim+tags) (I debug C faster, too,
because I have a great debugger - gdb.) I'm not happy about this
situation.

Maybe I should suggest this as one of those ruby weekly challenges...
Document the tags format, the goals, and let people choose - rules are
that there are no rules, you can use any tool/library you want, even
non-ruby, and let the best code win. If it's non-ruby, well, that would
point out an area where ruby could use some work.


Btw, syntax highlighting with rdoc should be easy, since it tokenizes
the input.


Cheers,
Sam
 
Florian Gross

Sam said:
I have no idea what ripper does, but parse tree just gives symbols, it
doesn't have enough information for me to build a regex, as above, does
it?

Ripper basically is Ruby's integrated Ruby parser. It will invoke
callbacks for every kind of construct it encounters.

This code snippet ought to get you started with it:

irb(main):017:0> class MyParser < Ripper
irb(main):018:1>   def method_missing(name, *args)
irb(main):019:2>     puts "#{name}: #{args.inspect}"
irb(main):020:2>   end
irb(main):021:1> end
=> nil
irb(main):022:0> MyParser.new.parse("puts 'Hello World!' if true")
on__scan: ["puts"]
on__IDENTIFIER: ["puts"]
on__scan: [" "]
on__space: [" "]
on__scan: ["'"]
on__new_string: ["'"]
on__scan: ["Hello World!"]
on__add_string: [nil, "Hello World!"]
on__scan: ["'"]
on__string_end: [nil, "'"]
on__scan: [" "]
on__space: [" "]
on__scan: ["if"]
on__KEYWORD: ["if"]
on__argstart: ["Hello World!"]
on__fcall: [:puts, nil]
on__scan: [" "]
on__space: [" "]
on__scan: ["true"]
on__KEYWORD: ["true"]
on__varref: [:true]
on__if_mod: [nil, nil]
=> nil
 
Florian Gross

Florian said:
Ripper basically is Ruby's integrated Ruby parser. It will invoke
callbacks for every kind of construct it encounters.

This code snippet ought to get you started with it:

Oh, and you need to do require 'ripper' before you can use it, of course.
 
Guillaume Marcais

I'm totally open to suggestions. I NEED tags to read code effectively.

I'm faster writing in ruby than in C, but I read C code way, way, way
faster due to the tool support I have (vim+tags) (I debug C faster, too,
because I have a great debugger - gdb.) I'm not happy about this
situation.

Maybe I don't understand what you need exactly, but exuberant ctags
supports both ruby and vi:
$ ctags --version
Exuberant Ctags 5.5.4, Copyright (C) 1996-2003 Darren Hiebert
Compiled: May 12 2004, 14:32:50
Addresses: <[email protected]>, http://ctags.sourceforge.net
Optional compiled features: +wildcards, +regex

$ ctags --list-languages | grep -i ruby
Ruby

It works for me with emacs...

Tell me if I am completely off base.

Cheers,
Guillaume.
 
Sam Roberts

Quoting (e-mail address removed), on Fri, Mar 25, 2005 at 03:11:15AM +0900:
Maybe I don't understand what you need exactly, but exuberant ctags
supports both ruby and vi:
$ ctags --version
Exuberant Ctags 5.5.4, Copyright (C) 1996-2003 Darren Hiebert
Compiled: May 12 2004, 14:32:50
Addresses: <[email protected]>, http://ctags.sourceforge.net
Optional compiled features: +wildcards, +regex
$ ctags --list-languages | grep -i ruby
Ruby

"Supports" and "supports well" aren't the same thing.

It works for me with emacs...

Pico supports editing text, but it doesn't really compare to emacs, does
it? :)
Tell me if I am completely off base.

Half on, half off.

I think you've internalized the limitations, or don't realize how good
it could be.

It doesn't tag constants, and it doesn't support qualified tags.

Tags are downright useless (IMNSHO) if they aren't qualified in an OO
language. C only has one function per name (ignoring static functions).

Tag a large code-base, now jump to tag "new" (trick question, exctags
doesn't understand that "initialize" is called as "new").

Ok, now jump to "each". Is it the right tag? No way, you've got one for
almost every class, because it's the Ruby Way, and you've even more
definitions of #to_s. How much fun do you have walking them all to find
the one you wanted?

In well-supported languages, you would use --extra=+q, and get qualified
tags, so you could do:

<tag-cmd>Vc<TAB-complete name>ard.t<TAB-complete>o_s

And in about 5 keystrokes, you'd be at the definition of the method you
wanted, Vcard.to_s

It's also a cheap and fast class browser:

<tag-cmd>Vp<TAB><TAB>

would give you a list of all methods, classes, modules, and constants in
the module Vpim, and you can keep drilling down, exploring what's there.
Ah... heaven.

Doesn't work with exuberant ctags. I looked at adding it, but it's
awful. You need to maintain a stack of class/module names so you know
where you are. How bad can that be, you ask? Terrible. You don't know
where they end. An "end" means all kinds of things in ruby. Maybe I'll
give it another shot, but it looked hard to me.
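
A full parser sidesteps the "where does this end belong" problem,
because nesting is already resolved in the tree it builds. The sketch
below assumes the Ripper that ships in the standard library of Ruby
1.9 and later (not the standalone 2005 version discussed in this
thread) and relies on the array layout that Ripper.sexp happens to
produce, which may vary between Ruby versions. The helper names
const_names and walk are made up here, and only class, module, and
plain def nodes are handled.

```ruby
require 'ripper'

# Collect the constant names inside a class/module name node,
# e.g. ["Foo", "Bar"] for "class Foo::Bar".
def const_names(node)
  return [] unless node.is_a?(Array)
  return [node[1]] if node[0] == :@const
  node.flat_map { |child| const_names(child) }
end

# Walk the tree, carrying the stack of enclosing class/module names.
def walk(node, scope, out)
  return unless node.is_a?(Array)
  case node[0]
  when :module, :class
    inner = scope + [const_names(node[1]).join(".")]
    out << inner.join(".")
    node.drop(2).each { |child| walk(child, inner, out) }
  when :def
    # node[1] is the name token, e.g. [:@ident, "to_s", [line, column]]
    out << (scope + [node[1][1]]).join(".")
  else
    node.each { |child| walk(child, scope, out) }
  end
end

def qualified_names(source)
  out = []
  walk(Ripper.sexp(source), [], out)
  out
end

src = "module Foo\n  class Bar\n    def to_s\n    end\n  end\nend\n" \
      "class Foo::Baz\n  def each; end\nend\n"
puts qualified_names(src)
```

The position stored in each name token could then be used to fetch the
original source line and build the search pattern from it.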

It also has minor bugs, like it doesn't grok this:

class SomeModule::Foo
end


Maybe if you think exctags is OK, you've never felt the intoxicating
power of a fully operational Battle Star^w^w tagging system...

Cheers,
Sam
 
Sam Roberts

Quoting (e-mail address removed), on Fri, Mar 25, 2005 at 02:34:48AM +0900:
Oh, and you need to do require 'ripper' before you can use it, of course.

There are no files released, and the CVS version is not building for
me. I'll have to try later.

[ensemble] ~/p/ruby/rtags/other/ripper/ripper $ make
make: Entering directory `/Users/sam/p/ruby/rtags/other/ripper/ripper'
touch parse.y
/opt/local/bin/ruby tools/preproc.rb parse.y > ripper.y
bison -t -v -oripper.c ripper.y
gperf -p -j1 -i 1 -g -o -t -N rb_reserved_word -k'1,3,$' keywords > lex.c
/opt/local/bin/ruby tools/list-parse-event-ids.rb parse.y | /opt/local/bin/ruby tools/generate-eventids1.rb > eventids1.c
gcc -fno-common -O -pipe -I/opt/local/include -fno-common -pipe -fno-common -I. -I/opt/local/lib/ruby/1.8/powerpc-darwin6.8 -I/opt/local/lib/ruby/1.8/powerpc-darwin6.8 -I. -O -pipe -I/opt/local/include -DRIPPER -c ripper.c -o ripper.o
cc -dynamic -bundle -undefined suppress -flat_namespace -L"/opt/local/lib" -o ripper.bundle ripper.o -lruby -ldl -lobjc
ld: multiple definitions of symbol _rb_reserved_word
ripper.o definition of _rb_reserved_word in section (__TEXT,__text)
/opt/local/lib/libruby.dylib(parse.o) definition of _rb_reserved_word
make: *** [ripper.bundle] Error 1


Sam
 
Cameron McBride

Maybe if you think exctags is OK, you've never felt the intoxicating
power of a fully operational Battle Star^w^w tagging system...

oops. time overlap.

Obviously, I'm not a tags power user. So what is an example of a fully
operational tagging system?

Cameron
 
Guillaume Marcais

Half on, half off.

I think you've internalized the limitations, or don't realize how good
it could be.

Well, I think you have been addicted to tags way more than I have. I use
them on a couple of occasions, but because of their limited usefulness
in Ruby, I never thought too much of them. Now, what you told me about
the potential power of well-integrated tags does appeal to me a lot.
Consider me a beta tester if you get anything done on the subject.

Thanks for the explanation,
Guillaume.
 
Sam Roberts

Quoting (e-mail address removed), on Fri, Mar 25, 2005 at 02:34:48AM +0900:
Ripper basically is Ruby's integrated Ruby parser. It will invoke
callbacks for every kind of construct it encounters.

Hm, looks like it returns whitespace, and other non-syntactic elements.
Good. Does it return end-of-line, and is it just a lexer, or is it a
parser, too?

I.e., does

MyParser.new.parse("class Foo; Bar = 4; end;")

tell me that the Foo is a class name, and Bar is a constant name, or do
I have to deduce that?

If so, maybe I'll try.

The rubyforge page makes it look as if it may be written in C based on
ruby's parser, using lex&yacc. If so, that would be sweet, because it
might be fast.

Thanks,
Sam
 
Florian Gross

Sam said:
Quoting (e-mail address removed), on Fri, Mar 25, 2005 at 02:34:48AM +0900:


There are no files released, and the CVS version is not building for
me. I'll have to try later.

Odd, doesn't it come bundled with Ruby already?

Just doing require 'ripper' worked for me. (I'm on the win32 one-click
installer.)
 
Florian Gross

Sam said:
Hm, looks like it returns whitespace, and other non-syntactic elements.
Good. Does it return end-of-line, and is it just a lexer, or is it a
parser, too?

I.e., does

MyParser.new.parse("class Foo; Bar = 4; end;")

tell me that the Foo is a class name, and Bar is a constant name, or do
I have to deduce that?

If so, maybe I'll try.

The above produces quite a few events. Here are a few that seem to be
relevant to you:

on__ASSIGN: ["="]
on__assignable: [:Bar, nil]
[...]
on__assign: [nil, 4]
[...]
on__class: [:Foo, nil, nil]
on__set_line: [nil, 1]

It's probably best to try this out on your own.
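
For anyone trying this on a current Ruby, here is a small sketch using
Ripper.lex. It assumes the Ripper bundled with Ruby 1.9 and later,
whose event names (on_const, on_kw, and so on) differ from the 2005
output quoted in this thread. Each token comes back with its
[line, column] position, the event name, and the original text.

```ruby
require 'ripper'

source = "class Foo; Bar = 4; end;"

# Each entry starts with [[line, column], event, text]; newer Ruby
# versions append a lexer state as a fourth element.
tokens = Ripper.lex(source)

# Keep only the constant tokens and report where each one sits.
constants = tokens.select { |tok| tok[1] == :on_const }
constants.each do |tok|
  pos, _event, text = tok
  puts "#{text} at line #{pos[0]}, column #{pos[1]}"
end
```

This is only the lexer's view: it says Foo and Bar are constants, but
telling a class name from an assigned constant still takes the parser
events.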
The rubyforge page makes it look as if it may be written in C based on
ruby's parser, using lex&yacc. If so, that would be sweet, because it
might be fast.

I still think that it is built directly into Ruby and that it also comes
with it. It reuses the same parser as Ruby and thus ought to be very fast.
 
Jamis Buck

Odd, doesn't it come bundled with Ruby already?

Just doing require 'ripper' worked for me. (I'm on the win32
one-click installer.)

I'm on MacOSX and built Ruby myself, and there is no 'ripper' lib...

ruby -v --> ruby 1.8.2 (2004-12-25) [powerpc-darwin7.8.0]

- Jamis
 
