rpardee
Hey all,
I'm trying to parse lines from my text editor's config file, which look
like this (please watch for line wrap: there is one line per language,
each starting with /L<<digits>>):
/L1"SAS" Line Comment = * Block Comment On = /* Block Comment Off = */ Block Comment On Alt = * Block Comment Off Alt = ; Nocase File Extensions = SAS
/L2"Visual Basic" Line Comment = ' File Extensions = BAS FRM CLS VBS CTL WSF
/L4"HTML" Nocase Noquote HTML_LANG Block Comment On = <!-- Block Comment Off = --> Block Comment On Alt = <% Block Comment Off Alt = %> String Chars = "' File Extensions = HTM HTML ASP SHTML HTT HTX JSP
/L11"Ruby" Line Comment Num = 2# Block Comment On = =begin Block Comment Off = =end String Chars='" Escape Char = \ File Extensions = RB RBW
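Each line opens with a language header such as `/L11"Ruby"`. If the language number and name are needed as well, here is a minimal sketch. The method name `parse_language_header` is hypothetical, and it assumes the header always sits at the very start of the line in the form `/L<digits>"<name>"`, optionally with whitespace before the opening quote.

```ruby
# Hypothetical helper (not from the original post): pull the language
# number and name out of the /L<digits>"<name>" header of a config line.
# Assumes the header is at the very start of the line.
HEADER_RE = /\A\/L(\d+)\s*"([^"]*)"/

def parse_language_header(line)
  m = HEADER_RE.match(line)
  return nil if m.nil?   # not a language line
  [m[1].to_i, m[2]]      # [language number, language name]
end

p parse_language_header('/L11"Ruby" Line Comment Num = 2# File Extensions = RB RBW')
# prints [11, "Ruby"]
p parse_language_header('/L4"HTML" Nocase Noquote HTML_LANG')
# prints [4, "HTML"]
```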
I'm trying to write a method that extracts the comment markers and their
types (line/block and on/off). Regexps seemed the obvious tool, and I
eventually came up with this one:
c = Regexp.new("(Line|Block) Comment (On |Off |On Alt |Off Alt)*= ([^\s\t\r\n\f]+) ")
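Because this pattern is handed to `Regexp.new` as a double-quoted string, Ruby processes the string's backslash escapes before the regex engine sees anything: `"\s"` becomes a plain space, and `"\S"`, which is not a string escape, collapses to a bare `S`. So `[^\s\t\r\n\f]` happens to work, while a pattern written as `(\S+)` silently turns into `(S+)`. A small demonstration, run on a current Ruby; the string-escape rules shown are long-standing, but I have not verified them on 1.8 specifically.

```ruby
# In a double-quoted string, Ruby handles backslash escapes before
# Regexp.new ever sees the text.
from_string  = Regexp.new("Comment = (\S+)")  # "\S" collapses to "S"
from_literal = /Comment = (\S+)/              # \S reaches the regex engine intact
from_single  = Regexp.new('Comment = (\S+)')  # single quotes keep the backslash

line = "Line Comment = # Block Comment On = =begin"
p "\s"                          # prints " "  ("\s" is a space inside a string)
p from_string.source            # prints "Comment = (S+)"
p from_string.match(line)       # prints nil
p from_literal.match(line)[1]   # prints "#"
p from_single.match(line)[1]    # prints "#"
```

A regex literal (`/.../`) or a single-quoted string keeps the backslash, so either one lets `\S` mean "non-whitespace" as intended.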
This works well so far, except that it only grabs the first comment type
on each line. I'd hoped to make it get all the comment types by wrapping
the whole expression in an additional set of parens with a + quantifier:
c = Regexp.new("((Line|Block) Comment (On |Off |On Alt |Off Alt)*= ([^\s\t\r\n\f]+))+ ")
But that just seems to break it: that version doesn't capture anything.
Anybody got a clue for me? I'm using Ruby 1.8 on Windows. My code is
below (and again, please watch for line wrapping).
Thanks!
-Roy
def parse_comment_markers(line)
=begin
There are line comments & (2 different kinds of) block comments.
Line comments only have a start marker; EOL is the terminator.
Comment types are:
  Line Comment = <<mark>>
  Block Comment On = <<mark>>
  Block Comment Off = <<mark>>
  Block Comment On Alt = <<mark>>
  Block Comment Off Alt = <<mark>>
Where <<mark>> can be any contiguous set of non-whitespace chars.
For Line comment marks, preceding digits specify the number of spaces
(minus 1) required after the nondigit portion of the marker. So for Ruby,
the line comment mark is 2#, signifying that # is a comment only if it is
followed by a space. Ignore this for now.
So: funky regexp time. We want to grab sequences centered around the
string " Comment ". We want the single word prior to "Comment", all words
between "Comment" and " = ", and then of course the contiguous
non-whitespace following " = ".
=end
  puts line
  # Why doesn't the \S char class work?
  # c = Regexp.new("(Line|Block) Comment (On |Off |On Alt |Off Alt)*= (\S+)")
  c = Regexp.new("(Line|Block) Comment (On |Off |On Alt |Off Alt)*= ([^\s\t\r\n\f]+) ")
  cm = c.match(line)
  if cm.nil?
    puts "No match!"
  else
    puts cm.captures.join(" || ")
    puts "Comment type is \"" + cm.captures[0] + "\", and comment marker is \"" + cm.captures[2] + "\""
  end
end
parse_comment_markers("/L2 \"Ruby\" Line Comment = # Block Comment On = ' File Extensions = RB RBW")
parse_comment_markers("/L2 \"Ruby\" Block Comment On = =begin Block Comment Off = =end File Extensions = RB RBW")
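On the + question: a quantified group such as `(...)+` keeps only the captures from its last iteration, and the extra outer parentheses renumber every group, so `cm.captures[2]` no longer points at the marker. `String#scan` is the usual way to collect every match, since it returns one array of captures per non-overlapping match. Below is a sketch under a few assumptions that are mine, not the post's: `comment_markers` is a hypothetical name; "Off Alt" gets the trailing space the other alternatives have (without it, `Block Comment Off Alt = ;` can never match, because the pattern demands `=` right after `Off Alt`); "Num" is accepted as a qualifier so the Ruby line's `Line Comment Num = 2#` is found, with the digit prefix left in the mark for now; and no trailing space is required after the mark, so a mark at end of line still matches.

```ruby
# Sketch: collect every comment marker on a config line with String#scan.
# Assumptions (not from the original post):
#  - "Off Alt" gets a trailing space like the other qualifiers.
#  - "Num" (as in "Line Comment Num = 2#") is accepted; the digit prefix
#    stays in the mark for now.
#  - The mark is the next run of non-whitespace; no trailing space needed.
COMMENT_RE = /(Line|Block) Comment (?:(On Alt|Off Alt|On|Off|Num) )?= (\S+)/

# Hypothetical helper: returns [[type, mark], ...] in line order.
def comment_markers(line)
  line.scan(COMMENT_RE).map do |kind, qualifier, mark|
    # qualifier is nil for a plain "Line Comment = x"
    type = [kind, qualifier].compact.join(" ")   # e.g. "Block On Alt"
    [type, mark]
  end
end

# Why (...)+ does not help: a repeated group remembers only its last iteration.
p "a1b2c3".match(/(?:([a-z])\d)+/)[1]   # prints "c"

sas = '/L1"SAS" Line Comment = * Block Comment On = /* Block Comment Off = */ ' \
      'Block Comment On Alt = * Block Comment Off Alt = ; Nocase File Extensions = SAS'
comment_markers(sas).each do |type, mark|
  puts "#{type}: #{mark}"
end
# Line: *
# Block On: /*
# Block Off: */
# Block On Alt: *
# Block Off Alt: ;
```

Keeping the qualifier as a single optional group, rather than the original `(...)*`, means each match yields exactly three captures in a fixed order, which makes the block destructuring in `map` straightforward.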