Regular expression question.

L7 · Sep 13, 2006

In trying to parse a C source file I have the following section of
code:

...
...
case line
when /^.*\/\*.*?\*\/.*$/ # single line comment(s)
  # join the pieces explicitly; Array#to_s only concatenates on Ruby 1.8
  non_comments = line.split(/\/\*.*?\*\//).join
  process_code(non_comments)
when /^.*\/\*\*?[^(\*\/)]*$/ # multi-line start
  comment = true
  next
when /^[^(\/\*)]*\*\/.*$/ # multi-line end
  comment = false
...
...

I am running into a problem with the multi-line comment sections.
While something like:

/*
A comment
*/

will work (i.e. gets properly parsed out)

/* A
* comment */

OR

/* A *
* comment */

will not.
My guess is that it is because the [^(\*\/)] construct blocks the
leading or trailing '*' character. However, I thought that by placing
the \*\/ within parentheses I avoided the characters being evaluated
individually.
Is there a way to look for the pattern '*/' without having a single '*'
break the search?
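
For reference, parentheses do not group inside a bracket expression: [^(\*\/)] is the negated set of the four single characters "(", "*", "/" and ")". The check below is only a small sketch of that behaviour using the two patterns from the post; the constant names are made up for the illustration.

```ruby
# Inside [...] parentheses are literal, so this class rejects any ONE of
# the characters "(", "*", "/" or ")" rather than the sequence "*/".
NOT_DELIM = /[^(\*\/)]/

%w[( * / )].each do |ch|
  puts "#{ch.inspect} accepted by class: #{NOT_DELIM.match?(ch)}"
end
puts "\"a\" accepted by class: #{NOT_DELIM.match?('a')}"

# The multi-line patterns from the post, applied to the failing examples.
MULTI_START = /^.*\/\*\*?[^(\*\/)]*$/
MULTI_END   = /^[^(\/\*)]*\*\/.*$/

puts MULTI_START.match?("/*")            # opener alone on a line
puts MULTI_START.match?("/* A *")        # trailing "*" is rejected by the class
puts MULTI_END.match?("*/")              # closer alone on a line
puts MULTI_END.match?("* comment */")    # leading "*" is rejected by the class
```

The last line of each pair is the case that fails in the post: the lone '*' is excluded by the class, so the pattern cannot reach the end (or the "*/") of the line.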
As an alternative, I use this:

when /^.*\/\*\*?[^(\*\/)]*\**?$/
  comment = true
  next
when /^.*?[^(\/\*)]*\*\/.*$/
  comment = false

Which *seems* to solve the problem, but I can see where it is brittle:

/* A * comment
for * instance */

Any suggestions?
Thanks in advance.

Jeff Cohen · Sep 13, 2006

L7 said:
In trying to parse a C source file I have the following section of
code:

...
...
case line
when /^.*\/\*.*?\*\/.*$/ # single line comment(s)
  # join the pieces explicitly; Array#to_s only concatenates on Ruby 1.8
  non_comments = line.split(/\/\*.*?\*\//).join
  process_code(non_comments)
when /^.*\/\*\*?[^(\*\/)]*$/ # multi-line start
  comment = true
  next
when /^[^(\/\*)]*\*\/.*$/ # multi-line end
  comment = false
...
...

I am running into a problem with the multi-line comment sections.

My eyes glaze over with these kinds of expressions, but this might help:

http://www.regularexpressions.info/examplesprogrammer.html

Scroll down to the section on "Comments". They seem to have a simpler
solution. I think the trick is to let . match newlines.

And you can turn on newline matching in Ruby by putting an "m" after the
expression:

/my_pattern_here/m
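
As a rough sketch of what that looks like when the whole source is held in one string (the sample input and constant names here are made up for the illustration, and string literals containing "/*" are not accounted for):

```ruby
# Hypothetical sample input; any C source held in one string would do.
SOURCE = <<~C
  int a; /* one-line */
  /* A *
   * comment */
  int b;
C

# Non-greedy match from "/*" to the first "*/"; /m lets "." cross newlines.
BLOCK_COMMENT = %r{/\*.*?\*/}m

STRIPPED = SOURCE.gsub(BLOCK_COMMENT, "")
puts STRIPPED
```

The non-greedy .*? matters here: a greedy .* with /m would run from the first "/*" to the last "*/" in the file.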

Hope this helps...?

Jeff
softiesonrails.com

L7 · Sep 13, 2006

Jeff said:
My eyes glaze over with these kinds of expressions, but this might help:

http://www.regularexpressions.info/examplesprogrammer.html

Scroll down to the section on "Comments". They seem to have a simpler
solution. I think the trick is to let . match newlines.

I don't think it applies to this directly. I didn't mention it
explicitly, but the processing happens on a line-by-line basis. In order
to remove all comments in the above manner I would first have to read
the file as a string, strip, split on newline, then parse the code.

Francis Cianfrocca · Sep 13, 2006

L7 said:
In trying to parse a C source file I have the following section of
code:

Remember that in C, nested comment-blocks are not permitted, for the
incredibly good reason that they are not recognizable by
regular-expressions ;-). Why don't you take a pre-pass through your C
file and take out the comments yourself before you run your main
parse? A recursive-descent parser to do the job would probably take
almost no code at all in Ruby.

L7 · Sep 13, 2006

Francis said:
Remember that in C, nested comment-blocks are not permitted, for the
incredibly good reason that they are not recognizable by
regular-expressions ;-).

Agreed. However, something with '*' characters in it is allowed (so
long as they are not preceded or followed directly by '/') and that is
where I would get clobbered.

Why don't you take a pre-pass through your C
file and take out the comments yourself before you run your main

As I mentioned, that involved a bit of overhead. But with regard to the
project, I assume it is the 'best fix' to what I have.
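
If the extra pass is the concern, one option is to stay line-by-line and carry the "inside a comment" flag from one line to the next, searching for the literal "/*" and "*/" instead of matching whole-line patterns. The following is only a sketch under that assumption; strip_line is a hypothetical helper, not something from the original code, and it ignores string literals.

```ruby
# Strips /* ... */ comments from one line, given whether the line starts
# inside a comment. Returns [code_text, still_in_comment].
def strip_line(line, in_comment)
  code = +""
  pos = 0
  while pos < line.length
    if in_comment
      close = line.index("*/", pos)
      return [code, true] if close.nil?   # comment continues on the next line
      pos = close + 2
      in_comment = false
    else
      start = line.index("/*", pos)
      if start.nil?
        code << line[pos..-1]             # rest of the line is code
        break
      end
      code << line[pos...start]           # code before the comment opener
      pos = start + 2
      in_comment = true
    end
  end
  [code, in_comment]
end

in_comment = false
["int a; /* A *", " * comment */ int b;"].each do |line|
  code, in_comment = strip_line(line, in_comment)
  puts code unless code.strip.empty?
end
```

Because the search is for the two-character sequence, a lone '*' inside the comment never ends it, and several comments on one line are handled by the same loop.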

Rod Knowlton · Sep 14, 2006

L7 said:
In trying to parse a C source file I have the following section of
code:

...
...
case line
when /^.*\/\*.*?\*\/.*$/ # single line comment(s)
  # join the pieces explicitly; Array#to_s only concatenates on Ruby 1.8
  non_comments = line.split(/\/\*.*?\*\//).join
  process_code(non_comments)
when /^.*\/\*\*?[^(\*\/)]*$/ # multi-line start
  comment = true
  next
when /^[^(\/\*)]*\*\/.*$/ # multi-line end
  comment = false
...
...

Is there a way to look for the pattern '*/' without having a single '*'
break the search?

If I'm not mistaken, what you need is a negative lookahead.

Try /^.*\/\*([^\/]|\/(?!\*))*$/ for multi-line start

and /^([^\*]|\*(?!\/))*\*\/.*$/ for multi-line end.

The key difference (from the original start pattern) is ([^\/]|\/(?!\*)).

This breaks down like so:

(
[^\/] # anything but /
| # or
\/(?!\*) # a / not followed by a * (don't eat the character after /,
# just peek at it)
)

The pattern for multi-line end uses the same technique, but with the
characters reversed.

I'm sure this isn't the be-all and end-all of C comment matching
regexes, but it handles all of the cases you described.
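
A quick way to try the two patterns against the lines from the original question is sketched below. The constant names are made up for the illustration; the patterns themselves are the ones given above.

```ruby
# The two lookahead patterns from this reply.
LOOKAHEAD_START = /^.*\/\*([^\/]|\/(?!\*))*$/
LOOKAHEAD_END   = /^([^\*]|\*(?!\/))*\*\/.*$/

# Lines taken from the examples in the question.
["/*", "/* A", "/* A *", "/* A * comment"].each do |line|
  puts "start #{line.inspect}: #{LOOKAHEAD_START.match?(line)}"
end

["*/", "* comment */", " * comment */", "for * instance */", "A comment"].each do |line|
  puts "end   #{line.inspect}: #{LOOKAHEAD_END.match?(line)}"
end

# A complete one-line comment also satisfies the start pattern.
puts LOOKAHEAD_START.match?("/* a */")
```

Note that the start pattern only rules out a second "/*" after the opener, so a complete one-line comment matches it too. It therefore still depends on the single-line `when` branch being tested first, as in the original `case`.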

- Rod

Tom Copeland · Sep 15, 2006

I am intrigued. I believe that a regular expression to find all comments
in C must be very complex, and is probably not the correct tool. Look at
these snippets:

// /*
if(strcmp(x,"*/")
// "*/
etc. etc.

I'm not sure if it's impossible to parse out C-style comments using a
regular expression, but the various JavaCC grammars I've seen all use
lexical states to do it instead. Another complication is trigraphs (*),
although I think those are unrecognized by default in most C
preprocessors.

Yours,

Tom

(*) http://en.wikipedia.org/wiki/C_trigraph
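
To make the lexical-state idea concrete, here is a minimal sketch in Ruby using StringScanner from the standard library. The function name strip_comments is made up for the illustration, replacing each block comment with a single space is a choice made here, and trigraphs and backslash line continuations are deliberately not handled.

```ruby
require "strscan"

# Removes /* */ and // comments while leaving string and character
# literals untouched. Trigraphs and line continuations are not handled.
def strip_comments(src)
  ss = StringScanner.new(src)
  out = +""
  until ss.eos?
    if ss.scan(%r{/\*})                        # enter the block-comment state
      ss.scan_until(%r{\*/}) || ss.terminate   # unterminated: drop the rest
      out << " "                               # emit one space per comment
    elsif ss.scan(%r{//[^\n]*})                # line comment; newline is kept
      # nothing to emit
    elsif (lit = ss.scan(/"(?:\\.|[^"\\\n])*"?/))   # string literal state
      out << lit
    elsif (lit = ss.scan(/'(?:\\.|[^'\\\n])*'?/))   # character literal state
      out << lit
    else
      out << ss.getch                          # ordinary code character
    end
  end
  out
end

puts strip_comments(%q{if (strcmp(x, "*/")) /* real */ y(); // "*/} + "\n")
```

Each branch corresponds to one lexical state, so a "*/" inside a string literal or a "/*" inside a // comment is consumed by its own branch and never seen by the block-comment rule.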

Logan Capaldo · Sep 15, 2006

Francis Cianfrocca said:
One more point. Someone upthread gave an example similar to this:

/* printf ("*/"); */

Pretty sure this would end up being a syntax error.

Considered strictly as a lexical construction, I think this is regular.
However, I have a funny feeling that this:

/* printf ("/*......*/"); */

This too.

gcc agrees with me at least:

% cat comments.c
#include <stdio.h>

int main(int argc, char **argv) {
/* printf("*/"); */
/* printf("/*.......*/"); */
return 0;
}
% gcc -c comments.c
comments.c: In function 'main':
comments.c:4: error: missing terminating " character
comments.c:5: error: missing terminating " character

is actually context-free. Does anyone know for sure?

As for whether or not it's context-free, I don't know, but I think you
overestimated how hard C tries. /* */ are not nestable, for instance.

Logan Capaldo · Sep 15, 2006

Francis Cianfrocca said:
I know these are syntax errors in C. I was talking about a hypothetical
language (not C) that defined such constructs as legal. I'm still not sure
that it's impossible to use a regular language to generate this case:
/* "*/ */
I'm pretty convinced that the other case requires a context-free language.

Well, for empirical evidence one could look at ML. (* comments (* are *)
nestable *).

Daniel Martin · Sep 15, 2006

Francis Cianfrocca said:
One more point. Someone upthread gave an example similar to this:

/* printf ("*/"); */

Considered strictly as a lexical construction, I think this is regular.
However, I have a funny feeling that this:

/* printf ("/*......*/"); */

is actually context-free. Does anyone know for sure?

So you want to know if a grammar is regular or not? Sounds like you
need the Myhill-Nerode theorem
(http://en.wikipedia.org/wiki/Myhill-Nerode_theorem).

And according to that, a language that allows arbitrary nesting of
comment expressions like this is indeed not regular, and therefore not
parseable with regular expressions as traditionally defined in
computer science. To parse arbitrarily nested constructs you either
need something like Perl's evaluate-code-at-regexp-match-time feature
(which so far as I know exists in no other language), or an actual
grammar (or anything else that can get as complicated
computationally as a pushdown automaton).
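
The extra power needed is small in practice: for a hypothetical language with nestable comments (not C), a depth counter is enough to track nesting, and it is exactly that unbounded counter that a finite automaton lacks. A minimal sketch, with a made-up method name:

```ruby
# Returns true when every "/*" has a matching "*/" and they nest properly,
# for a hypothetical language that allows nested comments (not C).
def balanced_nested_comments?(text)
  depth = 0
  text.scan(%r{/\*|\*/}) do |token|
    if token == "/*"
      depth += 1                    # one level deeper
    else
      depth -= 1                    # closing delimiter
      return false if depth < 0     # closer without an opener
    end
  end
  depth.zero?
end

puts balanced_nested_comments?("/* a /* b */ c */")
puts balanced_nested_comments?("/* a /* b */")
```

The regular expression here only tokenizes the delimiters; the nesting itself is tracked by the counter, which can grow without bound.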
