Why don't C comments nest?

Eric Sosman

Sorry I don't understand your point here.

See below.
But if someone wants to write a /* ... */ comment then they will be tripped
up by anything containing a */:

... which Kaz' example does not. Repeat: Kaz' example does not
contain a */ inside the comment. Look more closely.

His example works fine in today's C, but would get into trouble
in a C where comments nested. Why? Because you're on the horns of
a dilemma:

- You can see the quoted /* as the start of a nested comment,
causing you to interpret the */ as the end of that nested
comment,

or

- You must lex the internals of the comment to recognize that
that "/*" is a string literal and not the start of an inner
comment.

If you take the first course, then the */ ends the internal comment
but leaves the external comment still alive, still eating all the
rest of the source file. If you take the second, then you must
disallow comments of the kind I illustrated above, because your
lexer will trip over the unmatched ' and/or " characters.
 
BartC

Eric Sosman said:
On 11/12/2011 6:10 AM, BartC wrote:

... which Kaz' example does not. Repeat: Kaz' example does not
contain a */ inside the comment. Look more closely.

Yes I know it doesn't. And I gave a counter-example which looked like this:

/* char *comment_end = "*/"; */

(which also happens to match the variable more accurately, but that's
another matter...)

If you take the second, then you must
disallow comments of the kind I illustrated above, because your
lexer will trip over the unmatched ' and/or " characters.

And as things are now, we must disallow comments like my example.

I can't see why one is any worse than the other.
 
Stephen Sprunk

C99 changed the syntax to permit a trailing comma in an enum
declaration. It would have made sense to permit a trailing
comma for *any* comma-separated list. (Probably not for the comma
operator, though.)

Why not? If ";" is a valid statement, then why shouldn't ",;" be valid?

S
 
Eric Sosman

Yes I know it doesn't. And I gave a counter-example which looked like this:

/* char *comment_end = "*/"; */

(which also happens to match the variable more accurately, but that's
another matter...)


And as things are now, we must disallow comments like my example.

I can't see why one is any worse than the other.

Would you like to be able to use a contraction like

if (p < endp) /* don't run off the end! */

in a comment? Then you don't want comments that nest.

Comments are not for suppressing blocks of code; comments are
for commentary.
 
Eric Sosman

Why not? If ";" is a valid statement, then why shouldn't ",;" be valid?

Do you think `*;' should be valid?

(If so, how many operands have been omitted? ;-)
 
Kaz Kylheku

Because the comma operator separates expressions, not statements.

That's true, but it does so by being a binary operator which
requires two operands.

The ; is not an operator, but a punctuator which terminates.
 
BartC

Eric Sosman said:
On 11/12/2011 8:57 AM, BartC wrote:

Would you like to be able to use a contraction like

if (p < endp) /* don't run off the end! */

(I still don't see what the problem would be with this example; what's
supposed to happen when that line is enclosed in another /*...*/ comment?)

in a comment? Then you don't want comments that nest.

Comments are not for suppressing blocks of code; comments are
for commentary.

Nevertheless that's what the OP wants to do. Does the Standard specifically
tell us what content is allowed in a comment, and what is frowned upon?
 
Ian Collins

That was before I realised that #if 0 didn't work as I expected, ie. ignore
everything except what was necessary to find a matching #endif.

Removing the code works as expected.
 
BartC

Robert Wessel said:
The problem in this case is that everything starting with the
apostrophe is the (start) of a character constant, thus if you've
tokenized that by the normal C rules, the */ will be lost as part of
the contents of the character constant.

OK. So the assumption is that a nested comment would tokenise its contents.

However I don't think that's necessary. And wouldn't work anyway.

Yes, the second sentence of the following covers it:

"6.4.9 Comments
Except within a character constant, a string literal, or a comment,
the characters /* introduce a comment. The contents of such a comment
are examined only to identify multibyte characters and to find the
characters */ that terminate it."

In other words, anything is allowed, including source code, of C or anything
else.

So the suggestion that comments shouldn't contain source code is a style
issue that shouldn't be the concern of the Standard or imposed by a
compiler.

And reading between the lines, everything is allowed except */, even if
the latter is inside a literal or inside a //-comment.

The only thing needed to make nested comments possible, is to disallow /* as
well as */, unless they are there to start or terminate a comment.

That means you won't be able to have /* as part of comment text, whether by
itself or part of a literal; is that really that big a deal?
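
The 6.4.9 rule quoted above can be sketched in a few lines of C. This is only a minimal illustration under stated assumptions, not how any particular compiler does it: `comment_length` is a hypothetical helper name, and it assumes the caller has already decided that the text starts outside any string literal or character constant.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: given text that begins with the two characters
 * that open a comment, return the offset just past the first
 * terminator, or 0 if the text does not start a comment or the comment
 * is never closed. Nothing between the delimiters is interpreted:
 * quotes, apostrophes and further openers are all just text.
 */
static size_t comment_length(const char *s)
{
    const char *end;

    if (strncmp(s, "/*", 2) != 0)
        return 0;               /* not positioned at a comment */
    end = strstr(s + 2, "*/");  /* the first terminator wins */
    return end ? (size_t)(end - s) + 2 : 0;
}
```

Given the counter-example `/* char *comment_end = "*/"; */`, this scan stops at the terminator inside the string literal, so the trailing `"; */` would be left outside the comment. The contraction in `/* don't run off the end! */` causes no trouble, because the apostrophe is never examined.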
 
BartC

Ian Collins said:
Removing the code works as expected.

That's rather a drastic measure. Especially if you've just downloaded some
source code and big chunks of it have been cut out! Rather difficult to
reinstate then. Anyway suppose the commented lines are actually intended to
be read?
 
Keith Thompson

BartC said:
(I still don't see what the problem would be with this example; what's
supposed to happen when that line is enclosed in another /*...*/ comment?)

Currently, /*...*/ comments don't nest. As a result, once the
compiler sees a /* sequence that's not inside another token (a string
literal or, rarely, a character constant), it only needs to scan
for a */ sequence, regardless of the context in which it appears.

With nesting comments, after seeing an opening /*, there are several
possible approaches:

1. Scan for /* and */ without tokenization and keep a count of the
nesting level. This causes problems if a comment contains a string
literal containing /* or */:

/* const char open_comment[] = "/*"; */

The */ at the end of the line terminates the inner comment, not the
outer one. Presumably the idea of allowing nested comments is to
make it possible to use /* and */ to comment out a chunk of code
that itself contains /*...*/ comments, but this example shows that
that still wouldn't work in all cases.

2. Tokenize everything inside /*...*/ comments, so that the compiler
can detect cases like the above and ignore comment delimiters in
commented-out string literals. But this comment from Eric's post:

if (p < endp) /* don't run off the end! */

doesn't contain a valid token sequence.

3. Define a new set of tokenization rules for use inside comments,
so that comment delimiters within string literals within comments
are ignored, but arbitrary text that doesn't necessarily form a
valid token sequence is accepted. Note that '"' and '*/' are valid
character constants. Since the content of a comment may or may not
be some approximation of C source code, it's difficult to decide
whether a given sequence of characters that happens to include
double-quote characters should be treated as a C string literal
or not.

And of course *any* of these changes would break existing code,
which means the odds of any of them being adopted are close to zero.
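
Approach 1 can be sketched as a plain depth counter. This is a hypothetical illustration of the idea only (C has no such rule, and `open_depth_at_end` is an invented name): it scans raw characters, never tokenizes, and reports how many comment levels are still open when the text runs out.

```c
#include <stddef.h>

/*
 * Hypothetical helper for approach 1: add one to the depth at every
 * opener and subtract one at every terminator, without looking at
 * quotes. Returns 0 as soon as the outermost comment closes, otherwise
 * the depth still open at the end of the text.
 */
static int open_depth_at_end(const char *s)
{
    int depth = 0;
    size_t i = 0;

    while (s[i] != '\0') {
        if (s[i] == '/' && s[i + 1] == '*') {
            depth++;
            i += 2;
        } else if (s[i] == '*' && s[i + 1] == '/') {
            i += 2;
            if (depth > 0 && --depth == 0)
                return 0;   /* outermost comment closed */
        } else {
            i++;
        }
    }
    return depth;
}
```

Fed the `open_comment` line above, the counter sees two openers and one terminator, so one level stays open and would go on swallowing whatever follows. A genuinely nested `/* outer /* inner */ still outer */` balances as intended.
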
Nevertheless that's what the OP wants to do. Does the Standard specifically
tell us what content is allowed in a comment, and what is frowned upon?

Not directly, no. But it does clearly say that /*...*/ comments don't
nest, from which one can easily infer that you can't use /*...*/ to
comment out a block of code that already contains /*...*/ comments.

And if you want to comment out a block of code, you can always insert
a // at the beginning of each line. This makes it easier to see
at a glance which lines are commented out and avoids all the above
problems. (It does require you to have a compiler that recognizes //
comments, which is an issue if you want to use strict C90 or C95 mode.)
 
Keith Thompson

Willem said:
Keith Thompson wrote:
) If a comma were permitted after the last parameter declaration, you
) could just write:
)
) some_function(10 /* param1 */,
) 20 /* param2 */,
) #if 0
) 30 /* param3 */,
) #endif
) );

Which is why I have changed my preference to writing this:

some_function(10 /* param1 */
,20 /* param2 */
#if 0
,30 /* param3 */
#endif
);

And not only in C, but in almost any language.

The point is that this plays a lot nicer with revision control systems,
because adding or removing an argument is a 1-line change, whereas in the
other case it can potentially change the previous line as well.

And if you want to comment out the *first* argument?

I see your point, but personally I find that layout rather hard on
my eyes. The need to comment out a single parameter is rare enough
that I don't think it's worth it.
 
Keith Thompson

Acid Washed China Blue Jeans said:
Kaz Kylheku said:
Suppose the code you want to comment out is in the middle of a line, which is
already using /*...*/ ?

There is this:

C89:

#define IGN(X)

IGN( foo() );

void bar(IGN( const ) int *ptr)
{
}

Now, an unparenthesized comma doesn't work, e.g. IGN(A, B).

I just select the comment text and switch the font from Helvetica to Palatino.
That way I can embed whatever comments I want without all these distracting /*
*/ //.

That's fine if you have a compiler that accepts source code with font
information.
 
Ian Collins

That's rather a drastic measure. Especially if you've just downloaded some
source code and big chunks of it have been cut out!

But why would they have been cut out if they were required? Blocks of
code usually get commented out during development and this can be
managed by version control, or an IDE.

Rather difficult to
reinstate then. Anyway suppose the commented lines are actually intended to
be read?

They could be commented out with //
 
Seebs

(I still don't see what the problem would be with this example; what's
supposed to happen when that line is enclosed in another /*...*/ comment?)

Consider, if you will:

char *start_comment = "/*";

If comments don't nest, this is NEVER a problem. If comments nest, though,
this becomes a problem. The only way to get around that is to parse the
things inside a comment to see whether they really ought to be starting
a nested comment. But if we're parsing things inside comments:

/* don't run off the end! */

becomes a problem because it has a mismatched single quote.

Similarly, right now:

char *end_comment = "*/";

is fine, as long as you don't try to comment it out. Which is to say:

Comments should be text, not code, and comments are not the right tool
for cutting out bits of code. #if 0 is the right tool for cutting out bits
of code, because it works in terms of code. Comments used for code induce
serious problems.

-s
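
A minimal sketch of the `#if 0` approach, assuming a hypothetical function `kept_length` invented for illustration. The disabled region contains both an ordinary comment and the `end_comment` string from above, neither of which disturbs the surrounding code.

```c
#include <string.h>

/* Hypothetical function, used only to show what survives preprocessing. */
static size_t kept_length(void)
{
    const char *kept = "kept";
#if 0
    /* An existing comment inside the disabled region is harmless. */
    const char *end_comment = "*/";
    kept = "never compiled";
#endif
    return strlen(kept);
}
```

One caveat, which may be what the earlier remark about `#if 0` not working as expected refers to: the skipped lines are still split into preprocessing tokens, so free-form prose with an unmatched apostrophe or quote inside the region is not guaranteed to be accepted and commonly draws a diagnostic.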
 
Keith Thompson

BartC said:
OK. So the assumption is that a nested comment would tokenise its contents.

However I don't think that's necessary. And wouldn't work anyway.

[big snip]
That means you won't be able to have /* as part of comment text, whether by
itself or part of a literal; is that really that big a deal?

It would break existing code.

If you're advocating a change to the C language, the fact that it
would break existing code means such a change will not be adopted.

If you're talking about a C-like language in which comments nest,
then you're not talking about C.

Which is it?
 
Kaz Kylheku

In other words, anything is allowed, including source code, of C or anything
else.

No, because the formal language allowed within the comment excludes the
digraph */ which may occur in the source code of C, inside a string
literal, as part of a comment, or inside a // comment.

So the suggestion that comments shouldn't contain source code is a style
issue that shouldn't be the concern of the Standard or imposed by a
compiler.

It's more than a style issue; it's a formal language issue. Comments can only
contain a subset of the possible set of C strings.

It is useful that you can have a subset of C within comments, for instance,
coding examples. They just can't have /*...*/ comments of their own, which is
fine since they are already inside a comment which can explain everything.

A mechanism for commenting out arbitrary code must be robust, period. This way
it can be applied to a 50,000 line piece of source code without worry that
anything breaks.

#if 0 ... #endif satisfies that requirement.

For excluding small ranges of text not confined to entire lines,
a variadic macro which expands to nothing works great:

#define IG(...)
IG(abitrary * preprocessor, ++t 0kens)

Therefore, it is no longer necessary to entertain proposals about nested
comments.
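
A compilable sketch of the variadic-macro trick, assuming a C99 or later compiler; the function `add` is a hypothetical name invented for illustration. Because the macro accepts any number of arguments, the top-level comma that defeats a one-parameter macro such as IGN(X) is not a problem here.

```c
/* C99 variadic macro as in the post above: expands to nothing. */
#define IG(...)

/* Hypothetical function showing part of a statement list being excluded. */
static int add(int a, int b)
{
    int result = a + b;
    IG(result *= 2, result -= 1;)  /* removed, top-level comma included */
    return result;
}
```

The excluded text still has to consist of valid preprocessing tokens with balanced parentheses, so this suits fragments of code rather than free-form prose.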
 
BartC

Keith Thompson said:
BartC said:
(I still don't see what the problem would be with this example; what's
supposed to happen when that line is enclosed in another /*...*/
comment?)

Currently, /*...*/ comments don't nest. As a result, once the
compiler sees a /* sequence that's not inside another token (a string
literal or, rarely, a character constant), it only needs to scan
for a */ sequence, regardless of the context in which it appears.

With nesting comments, after seeing an opening /*, there are several
possible approaches:

1. Scan for /* and */ without tokenization and keep a count of the
nesting level. This causes problems if a comment contains a string
literal containing /* or */:

/* const char open_comment[] = "/*"; */

Option (1) I think is the only viable one. And yes it does have a cost in
that spurious /* sequences are no longer allowed. But then /neither were
spurious */ sequences/!

What have people done in the past when */ was needed inside a /*...*/
comment? Whatever they did, perhaps they can do the same for /*!

2. Tokenize everything inside /*...*/ comments, so that the compiler ....
3. Define a new set of tokenization rules for use inside comments,
so that comment delimiters within string literals within comments
are ignored, but arbitrary text that doesn't necessarily form a
valid token sequence is accepted.

Those won't work because comments should be allowed to be anything at all;
you shouldn't have to write comments to conform to a syntax! So they might
be in badly written and punctuated language, or be badly written, incorrect
and incomplete source code, or metadata for all sorts of purposes.

The only restriction is that */ can't be included, and, for nested comments,
neither can /*.

Note that '"' and '*/' are valid
character constants.

'*/' is valid by itself, but not inside a /*...*/ comment, where it will
generate a syntax error.

And of course *any* of these changes would break existing code,
which means the odds of any of them being adopted are close to zero.

Possibly. Checking code for /* sequences inside comments doesn't sound
difficult. But once nested comments are in use, they won't be backwards
incompatible.
 
BartC

Keith Thompson said:
It would break existing code.

If you're advocating a change to the C language, the fact that it
would break existing code means such a change will not be adopted.

If you're talking about a C-like language in which comments nest,
then you're not talking about C.

Which is it?

Sorry, I thought this part of the thread was about how hard or how practical it
would be to make /*...*/ nestable.

Like the OP I also think this is desirable (and at one time thought they
*were* nestable, then found it didn't work!)
 
