Software maintenance

Nick

Vincenzo Mercuri said:
From: Vincenzo Mercuri <[email protected]>
Subject: Re: Software maintenance
Newsgroups: comp.lang.c
Date: Thu, 16 Sep 2010 19:47:44 +0200



that reminds me of an example on "C Unleashed", something like:

func1(path); // in C:\DIR\
func2(path);

GCC tries - if asked to - to do something sensible about this:

$ cat test.c
func1(path); // in C:\DIR \
func2(path);
$ gcc -E test.c
# 1 "test.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "test.c"
func1(path);
$ gcc -Wall -E test.c
# 1 "test.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "test.c"
test.c:1:14: warning: multi-line comment
func1(path);

(the -E stops it going any further, which saves us from having to build
scaffolding around these two lines). You'll see that in both cases the
splicing still occurs. I think it's quite a good warning as well
(although I'd have added a mention of // in there) - if you want
multi-line comments you ought to be using /* ... */
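
For anyone who would rather see the effect in a running program than in -E output, here is a minimal sketch. Only the two lines with func1 and func2 come from the example above; the counter `calls`, the function bodies and the wrapper `count_calls` are made up for illustration, and a compiler that accepts // comments (C99 or later) is assumed.

```c
static int calls;   /* hypothetical counter, not part of the original example */

void func1(const char *path) { (void)path; calls++; }
void func2(const char *path) { (void)path; calls++; }

/* Returns how many of the two functions actually ran. */
int count_calls(void)
{
    const char *path = "C:\\DIR\\";

    calls = 0;
    func1(path); // in C:\DIR\
    func2(path);
    return calls;
}
```

count_calls() should return 1 rather than 2, because the func2 line has been spliced onto the comment. The backslash has to be the very last character on its line for this to happen.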
 
Nobody

If a comment ends with a backslash, it will swallow the next line.

A comment cannot end with a backslash.

A /*...*/ comment ends at the */, while a // comment ends at the end of
the LOGICAL line (i.e. the first EOL which isn't immediately preceded by a
backslash).

I can't see how you can realistically do away with line splicing, and
introducing rules which make splicing context-dependent would only make
the language more complex.
 
jacob navia

On 19/09/10 18:50, Nobody wrote:
A comment cannot end with a backslash.

A /*...*/ comment ends at the */, while a // comment ends at the end of
the LOGICAL line (i.e. the first EOL which isn't immediately preceded by a
backslash).

I can't see how you can realistically do away with line splicing, and
introducing rules which make splicing context-dependent would only make
the language more complex.

You can't see how?

Here is how:

Comments do not affect program behavior. Line splicing is done after
comment processing.

Isn't that a very simple rule?

Strings have an escape character \ . If an escape character precedes a
carriage return the carriage return is ignored and string contents
continue.

Isn't that a very simple rule?
 
Felix Palmen

* Vincenzo Mercuri said:
that reminds me of an example on "C Unleashed", something like:

func1(path); // in C:\DIR\
func2(path);

This example, together with the evolving discussion, really makes you
think about it. Maybe it shouldn't be consensus. After all, line breaks
in C do not carry any meaning, they're just whitespace. Of course, they
ARE meaningful to the preprocessor, but I wouldn't expect anyone writing
comments to be aware of writing "preprocessor code" right now ....

So you could conclude that one-line comments kind of violate a principle
of the language.

Regards,
Felix
 
Ralf Damaschke

jacob said:
Comments do not affect program behavior. Line splicing is done after
comment processing.

Isn't that a very simple rule?

I suspect only for those who do translation phase 3 before phase 2.

Strings have an escape character \ . If an escape character precedes a
carriage return the carriage return is ignored and string contents
continue.

Isn't that a very simple rule?

I cannot see how mixing phase 2 steps with phase 5 makes anything
simpler.

-- Ralf
 
jacob navia

On 19/09/10 21:21, Ralf Damaschke wrote:
I suspect only for those who do translation phase 3 before phase 2.

This "translations phases" model is wrong. Lines can be put together
only after the comments have been stripped out.

I cannot see how mixing phase 2 steps with phase 5 makes anything
simpler.

Simpler for the user of course, not for the compiler writer.
Instead of doing the line splicing before comments are thrown out, it
has to be done after, which is a very minor change actually.
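
To make the two orderings concrete, here is a rough string-level sketch. It is not how any real compiler is structured, and it is deliberately simplified: only // comments are recognised, string literals are not, and inputs are assumed to be shorter than 256 bytes. All function names are invented for the illustration.

```c
#include <string.h>

/* Delete every backslash-newline pair (what phase 2 does). */
static void splice(const char *in, char *out)
{
    while (*in) {
        if (in[0] == '\\' && in[1] == '\n')
            in += 2;
        else
            *out++ = *in++;
    }
    *out = '\0';
}

/* Replace each // comment, up to but not including the newline, with one
   space. Simplification: string literals and old-style comments are ignored. */
static void strip_line_comments(const char *in, char *out)
{
    while (*in) {
        if (in[0] == '/' && in[1] == '/') {
            while (*in && *in != '\n')
                in++;
            *out++ = ' ';
        } else {
            *out++ = *in++;
        }
    }
    *out = '\0';
}

/* Current rule: splice first, then remove comments. */
void translate_standard(const char *src, char *out)
{
    char tmp[256];  /* assumes strlen(src) < 256 */
    splice(src, tmp);
    strip_line_comments(tmp, out);
}

/* Proposed rule: remove comments first, then splice. */
void translate_proposed(const char *src, char *out)
{
    char tmp[256];  /* assumes strlen(src) < 256 */
    strip_line_comments(src, tmp);
    splice(tmp, out);
}
```

Fed the two-line C:\DIR\ example, translate_standard drops the func2 call along with the comment, while translate_proposed keeps it. The sketch says nothing about the harder cases (spliced tokens, string literals, trailing whitespace) that come up in the replies.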
 
Keith Thompson

jacob navia said:
On 19/09/10 21:21, Ralf Damaschke wrote:

In your proposed model, is line splicing done before or after
tokenization? If after, it becomes impossible for line splicing to
create new tokens, such as:

print\
tf("hello, world");

If before, then you have to separate tokenization and comment
recognition.

I don't suggest that it's good practice to splice tokens this way,
but it is an existing feature of the language. Are you willing to
break existing code?

This "translations phases" model is wrong. Lines can be put together
only after the comments have been stripped out

So what would you replace the translations phases model with?
We need *some* way to describe how source files are translated.
The current model does so with little or no ambiguity.

[...]
Simpler for the user of course, not for the compiler writer.
Instead of doing the line splicing before comments are thrown out, it
has to be done after, which is a very minor change actually.

I'm skeptical that it's as minor as you say it is. I'm prepared to
be convinced otherwise.
 
Keith Thompson

Keith Thompson said:
In your proposed model, is line splicing done before or after
tokenization? If after, it becomes impossible for line splicing to
create new tokens, such as:

print\
tf("hello, world");

Make that

prin\
tf("hello, world");
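
The same thing can be checked without any I/O. The sketch below splits an identifier across two physical lines; `forty_two` and `call_spliced` are hypothetical names chosen only so that there is something to split.

```c
/* Hypothetical helper, used only so there is an identifier to split. */
static int forty_two(void)
{
    return 42;
}

int call_spliced(void)
{
    /* The identifier forty_two is split across two physical lines.
       The continuation has to start in column 1: leading whitespace
       would separate the two halves into two tokens. */
    return forty_\
two();
}
```

Under the current rules this compiles and call_spliced() returns 42, because the backslash-newline is removed before the text is broken into tokens.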
 
jacob navia

On 20/09/10 02:18, Keith Thompson wrote:
In your proposed model, is line splicing done before or after
tokenization? If after, it becomes impossible for line splicing to
create new tokens, such as:

print\
tf("hello, world");

If before, then you have to separate tokenization and comment
recognition.

I don't suggest that it's good practice to splice tokens this way,
but it is an existing feature of the language. Are you willing to
break existing code?


In 20+ years programming in C I have never seen any program that uses
this "feature".

If used it is a horrible style and should be dropped.

So what would you replace the translations phases model with?


Nothing, it should stay as it is. The difference is that line splicing
should be done after comments are taken out. That is all.

Line splicing should be done after tokenization, which implies comment
filtering.

We need *some* way to describe how source files are translated.
The current model does so with little or no ambiguity.

Yes. What I am saying is that a small modification of the current model
is necessary to preserve the rule:

"Comments do not affect program behavior"

[...]
Simpler for the user of course, not for the compiler writer.
Instead of doing the line splicing before comments are thrown out, it
has to be done after, which is a very minor change actually.

I'm skeptical that it's as minor as you say it is. I'm prepared to
be convinced otherwise.
 
jacob navia

On 20/09/10 02:36, Keith Thompson wrote:
Make that

prin\
tf("hello, world");

You see?

The fact that a bug creeps in in just a simple example should convince
you that this is just a misfeature.
 
Seebs

In 20+ years programming in C I have never seen any program that uses
this "feature".

I have. Probably once or twice.

If used it is a horrible style and should be dropped.

Machine-generated code under circumstances where there is a technical
reason for a line length limit at some point in the process. Not a matter
of style.

-s
 
Nobody

This "translations phases" model is wrong. Lines can be put together
only after the comments have been stripped out

Which means that you can't split comments; which means that you have to be
able to recognise comments if you want to split lines.

The benefit of line-splicing being done at such an early stage is that you
can take a 132-column source file and convert it to an equivalent
80-column source file with little more than a one-line sed script (the
only gotcha is that you can't insert a backslash-newline into the middle
of a trigraph). Using trigraphs to convert a US-ASCII source file to one
using the ISO-646 invariant subset is similarly straightforward.

Essentially, the first two phases deal with file encoding issues, not
language issues per se. In the same way that the first six phases can
easily be moved into a separate pre-processor (and often are), the first
two phases could be moved into a pre-pre-processor.
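
A rough C equivalent of that one-line script, as a sketch only: `wrap` inserts backslash-newline wherever a physical line would get too long, and `splice` undoes it the way phase 2 does. The function names are invented, trigraphs are NOT handled (the gotcha mentioned above), and the caller is assumed to supply an output buffer of at least 3 * strlen(in) + 1 bytes for `wrap`.

```c
#include <stddef.h>
#include <string.h>

/* Delete every backslash-newline pair, as phase 2 does. */
size_t splice(const char *in, char *out)
{
    size_t n = 0;
    while (*in) {
        if (in[0] == '\\' && in[1] == '\n')
            in += 2;
        else
            out[n++] = *in++;
    }
    out[n] = '\0';
    return n;
}

/* Insert backslash-newline so that no physical line is longer than
   width characters (width >= 2). Nothing is parsed: comments, strings
   and identifiers are all split blindly. Trigraphs are not handled. */
size_t wrap(const char *in, char *out, size_t width)
{
    size_t n = 0, col = 0;
    for (; *in; in++) {
        if (*in == '\n') {
            out[n++] = '\n';
            col = 0;
            continue;
        }
        if (col == width - 1) {   /* keep the last column for the backslash */
            out[n++] = '\\';
            out[n++] = '\n';
            col = 0;
        }
        out[n++] = *in;
        col++;
    }
    out[n] = '\0';
    return n;
}

/* Length of the longest physical line in text. */
size_t longest_line(const char *text)
{
    size_t best = 0;
    while (*text) {
        size_t len = strcspn(text, "\n");
        if (len > best)
            best = len;
        text += len;
        if (*text == '\n')
            text++;
    }
    return best;
}
```

Because splicing happens before anything else looks at the text, splice(wrap(x)) gives back the same characters as splice(x), whatever x contains. That property is what would be lost if splicing had to know about comments.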
 
jacob navia

On 21/09/10 13:32, Nobody wrote:
Which means that you can't split comments; which means that you have to be
able to recognise comments if you want to split lines.

Yes. Why would you want to split comments?

The benefit of line-splicing being done at such an early stage is that you
can take a 132-column source file and convert it to an equivalent
80-column source file with little more than a one-line sed script (the
only gotcha is that you can't insert a backslash-newline into the middle
of a trigraph). Using trigraphs to convert a US-ASCII source file to one
using the ISO-646 invariant subset is similarly straightforward.


Incredibly useful applications. :)

Seriously, you would have to write slightly more complicated software
that probably doesn't fit in a one-line sed script.

So what?

More important is to avoid bugs in programs that assume that comments
are ignored.

Essentially, the first two phases deal with file encoding issues, not
language issues per se. In the same way that the first six phases can
easily be moved into a separate pre-processor (and often are), the first
two phases could be moved into a pre-pre-processor.

Most preprocessors now include all that, and some compilers preprocess
and compile at the same time to avoid i/o... Those "phases" have no
practical significance any more.
 
Keith Thompson

jacob navia said:
Most preprocessors now include all that, and some compilers preprocess
and compile at the same time to avoid i/o... Those "phases" have no
practical significance any more

The practical significance of the translation phase model is that
it unambiguously defines the result of translating a source file.
The standard explicitly acknowledges that implementations often
fold some phases together. The point is that, however they're
implemented, the final result *must* be as if the phases were
separate.

I'm not sure whether you want to get rid of the translation phase
model or not. But if you do, what exactly would you replace it with?
 
jacob navia

On 21/09/10 18:42, Keith Thompson wrote:
The practical significance of the translation phase model is that
it unambiguously defines the result of translating a source file.
The standard explicitly acknowledges that implementations often
fold some phases together. The point is that, however they're
implemented, the final result *must* be as if the phases were
separate.

I'm not sure whether you want to get rid of the translation phase
model or not. But if you do, what exactly would you replace it with?

I repeat:

The model needs just a very small change: line splicing is done after
comment processing. Everything else stays the same
 
Keith Thompson

jacob navia said:
On 21/09/10 18:42, Keith Thompson wrote:
I repeat:

The model needs just a very small change: line splicing is done after
comment processing. Everything else stays the same

Ok.

At least twice in this thread, you've made comments that suggest that
you think the translation phase model as a whole should be discarded.
In this very discussion, you wrote that

Those "phases" have no practical significance any more

In fact, compilers have *always* combined multiple phases; I doubt
that there's ever been a C compiler that actually had a distinct
program for each phase. When you say the phases "have no practical
significance any more", it sounds like you're disagreeing with
something, but I don't know what.

Are you simply proposing that phases 2 and 3 should be swapped? If so,
note that the tokenization phase would now be operating on physical
source lines, not logical source lines.

Wouldn't you want to allow string literals to be spliced?

printf("hello\
, world\n");

If so, you have to have two separate places where backslashes are
used for splicing: in phase 3 (formerly phase 2), and in the syntax
of a string literal. How would either or both deal with trailing
whitespace after the backslash? That's currently a nasty issue
in some cases:

foo(); \
bar(); \
baz();

The first line has the backslash at the very end of the line, so
it is spliced. The second has a space following the backslash,
and therefore is not spliced (it's a syntax error).
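
Since the difference is invisible in most editors, a small checker can help. The following is a sketch with invented names (`enum line_end`, `classify_line_end`); it looks at one physical line, passed without its terminating newline, and reports which of the cases described above it falls into.

```c
#include <string.h>

enum line_end { LINE_PLAIN, LINE_SPLICED, LINE_BACKSLASH_THEN_SPACE };

/* Classify one physical line, given without its terminating newline. */
enum line_end classify_line_end(const char *line)
{
    size_t len = strlen(line);
    size_t end = len;

    /* Step back over trailing spaces and tabs. */
    while (end > 0 && (line[end - 1] == ' ' || line[end - 1] == '\t'))
        end--;

    if (end == 0 || line[end - 1] != '\\')
        return LINE_PLAIN;

    /* Backslash found: is it the very last character, or is there
       whitespace between it and the end of the line? */
    return end == len ? LINE_SPLICED : LINE_BACKSLASH_THEN_SPACE;
}
```

Run over the three lines of the example, it should report LINE_SPLICED for the first, LINE_BACKSLASH_THEN_SPACE for the second and LINE_PLAIN for the third.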

What about a backslash followed by a comment?

x = 42; \ /* comment */

Each comment is replaced by a single space in the tokenization phase.
The line-splicing phase would then see a backslash followed only
by whitespace. If whitespace following a comment is ignored,
the line is spliced (which could be surprising).

I suggest that there's really no such thing as a "very small change"
when you're trying to modify the translation phase model.
 
Seebs

Yes. Why would you want to split comments?

Because you might be splitting ALL lines of code, and you don't want
to have to parse them to see whether or not they're comments.

I have never seen \ at end of line used by anything but mechanical
code processors.

Seriously, you would have to write slightly more complicated software
that probably doesn't fit in a one-line sed script.

And since existing installations are using trivial conversions to handle
line length limits in C implementations, we're stuck with those existing.

More important is to avoid bugs in programs that assume that comments
are ignored.

Is there actually any such bug in the real world? I mean, outside of
completely contrived test programs, has there EVER been a program which
was actually broken because someone had a "//" comment which was
unintentionally continued by a \ at the end of the line?

I've never seen one.

Most preprocessors now include all that, and some compilers preprocess
and compile at the same time to avoid i/o... Those "phases" have no
practical significance any more

I think they do, in that they define what the output should be in boundary
cases.

-s
 
Seebs

The model needs just a very small change: line splicing is done after
comment processing. Everything else stays the same

Can you point to the set of real-world programs you believe are affected,
in any way, by this change? Which of them are fixed and which are broken?

-s
 
Keith Thompson

Seebs said:
Because you might be splitting ALL lines of code, and you don't want
to have to parse them to see whether or not they're comments.

I have never seen \ at end of line used by anything but mechanical
code processors.

I think you're forgetting about multi-line macro definitions.

[...]
 
Seebs

I think you're forgetting about multi-line macro definitions.

You're right, obviously I am.

And I've even seen comments used in them. Although I think always
/**/ comments.
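
For reference, this is the kind of macro meant here, as a sketch; `SWAP_INT` and `first_after_swap` are hypothetical names written only for the illustration.

```c
/* Hypothetical macro, written only to illustrate the point. */
#define SWAP_INT(a, b)                          \
    do {                                        \
        int tmp_ = (a);   /* remember a */      \
        (a) = (b);        /* overwrite a */     \
        (b) = tmp_;       /* old a goes to b */ \
    } while (0)

int first_after_swap(int x, int y)
{
    SWAP_INT(x, y);
    return x;
}
```

The old-style comments are harmless because each one ends at its own closing delimiter. A // comment in the same position would, once the lines are spliced, run to the end of the whole macro definition and swallow the remaining lines.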

-s
 
