Has thought been given to a cleaned up C? Possibly called C+.

  • Thread starter Casey Hawthorne

Keith Thompson

Ben Bacarisse said:
Yes, that makes no sense. Fortunately for me that is not what I am
saying :)

Why do you think that a \r can't be there "in its own right"? \r can
appear in a character literal or a string and C gives the sequence a
specified meaning. It does not have to map to a particular code and
it should not be taken to be part of line ending (though that last
rule was, if I recall correctly, sometimes broken in early C
implementations).

To take an extreme example: if stdout is a text stream, the call
puts("a\rb\r\n"); should generate one line on all systems. That line
should have four characters in it followed by whatever line ending the
system uses.

Not necessarily. It could easily generate two or three lines, or even
an invalid text file.

C99 7.19.2p2:

Data read in from a text stream will necessarily compare equal to
the data that were earlier written out to that stream only if:
the data consist only of printing characters and the control
characters horizontal tab and new-line; no new-line character
is immediately preceded by space characters; and the last
character is a new-line character. Whether space characters
that are written out immediately before a new-line character
appear when read in is implementation-defined.

If you write '\r' to a text stream, all bets are off. A system
*can* permit '\r' characters, or any arbitrary control characters,
to appear in text files with no special meanings (Unix-like systems
do this, for example). But it's not required. Rod's claim that it
makes no sense at all to have a '\r' in a text stream is overstated;
'\r' needn't be a line ending control character, but it can be.
 
bartc

Ben Bacarisse said:
Yes, that makes no sense. Fortunately for me that is not what I am
saying :)

Why do you think that a \r can't be there "in its own right"? \r can
appear in a character literal or a string and C gives the sequence a
specified meaning.

Yes, carriage return. But why insert explicit carriage return into a text
stream that is supposed to be full of converted "\n" characters? On my machine
carriage return is the code 13, which gives mixed results in your test
below.
It does not have to map to a particular code and
it should not be taken to be part of line ending (though that last
rule was, if I recall correctly, sometimes broken in early C
implementations).

To take an extreme example: if stdout is a text stream, the call
puts("a\rb\r\n"); should generate one line on all systems. That line
should have four characters in it followed by whatever line ending the
system uses.

Four characters and two line endings:

* When displayed on the console, the first \r causes the "a" to be
overwritten, so shows "b"
* Inside notepad, shows "ab" but with an invisible extra character between a
and b
* Inside "edit" (an ancient Dos editor) it shows a little music symbol after
a and after b
* Inside "wordpad" and "word", a and b are each shown on their own lines.

In other words, using "\r" is asking for trouble, unless one wants to
perhaps generate a Mac text file on a machine that is not a Mac.

In fact, this would be a good example of where a binary stdout is useful, if
you want to output explicit control codes (for example, if stdout is sent to
a device where every control character is significant).
bartc was describing a system that solved a problem for him but it
sounded to me as if it would interfere with strings like the one above
that contain embedded \r sequences.

But like Rod said, an embedded \r is unlikely in normal Ascii text, if using
code 13, and not a problem if it isn't. I'd like to see a text file, perhaps
using LF for newlines, that had \r scattered through it. Then I'd like to
know Why...
Ideally, I would want to know which line end convention was used in
the source and I'd map that to the one required by the target. If the
source convention is not known then one can try to deduce it by using
heuristics.
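
One possible heuristic for deducing the convention, as a rough sketch: read the data in binary mode, count each candidate ending, and pick the most frequent. The names `eol_kind` and `guess_eol` are made up for illustration, and the byte values 0x0D and 0x0A assume an ASCII-compatible encoding; nothing in C guarantees either.

```c
#include <stddef.h>

enum eol_kind { EOL_UNKNOWN, EOL_LF, EOL_CRLF, EOL_CR };

/* Guesses the line-ending convention of raw (binary-mode) data by
   counting each candidate ending and picking the most frequent one.
   Assumes ASCII-compatible codes: CR = 0x0D, LF = 0x0A. */
enum eol_kind guess_eol(const unsigned char *data, size_t len)
{
    size_t lf = 0, crlf = 0, cr = 0;

    for (size_t i = 0; i < len; i++) {
        if (data[i] == 0x0D) {
            if (i + 1 < len && data[i + 1] == 0x0A) {
                crlf++;
                i++;            /* consume the LF of the CR LF pair */
            } else {
                cr++;           /* a CR on its own */
            }
        } else if (data[i] == 0x0A) {
            lf++;               /* an LF not preceded by CR */
        }
    }
    if (lf == 0 && crlf == 0 && cr == 0)
        return EOL_UNKNOWN;     /* no line endings seen at all */
    if (crlf >= lf && crlf >= cr)
        return EOL_CRLF;
    return lf >= cr ? EOL_LF : EOL_CR;
}
```

A file that mixes conventions, or one that uses a lone CR for overprinting, will fool a simple count like this, which is why it can only ever be a guess.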

Perhaps a _newline string constant should be in a header somewhere,
containing whatever newline sequence was in effect. That would help people
do their own thing if they wanted. That is, for systems where newline isn't
something crazy, like an entry in a database the other side of the file
system.
 
Ben Bacarisse

Keith Thompson said:
Not necessarily. It could easily generate two or three lines, or even
an invalid text file.

C99 7.19.2p2:

Data read in from a text stream will necessarily compare equal to
the data that were earlier written out to that stream only if:
the data consist only of printing characters and the control
characters horizontal tab and new-line; no new-line character
is immediately preceded by space characters; and the last
character is a new-line character. Whether space characters
that are written out immediately before a new-line character
appear when read in is implementation-defined.

I did not know that most control characters (the defined \ escapes, I
mean) were excluded like that. Thanks.
If you write '\r' to a text stream, all bets are off. A system
*can* permit '\r' characters, or any arbitrary control characters,
to appear in text files with no special meanings (Unix-like systems
do this, for example). But it's not required. Rod's claim that it
makes no sense at all to have a '\r' in a text stream is overstated;
'\r' needn't be a line ending control character, but it can be.

I'd put it a little stronger than that. There seems little point in
defining \r and its meaning unless it is expected that they can be
used in what I might call the normal course of events. I agree it is
not /required/ (in a text stream) but I would say that the expectation
is that an implementation should interfere with the text stream only
when it has little choice.
 
Ben Bacarisse

bartc said:
Yes, carriage return. But why insert explicit carriage return into a text
stream that is supposed to be full of converted "\n" characters? On my machine
carriage return is the code 13, which gives mixed results in your test
below.

I'll address the "why" below.
Four characters and two line endings:

* When displayed on the console, the first \r causes the "a" to be
overwritten, so shows "b"
* Inside notepad, shows "ab" but with an invisible extra character between a
and b
* Inside "edit" (an ancient Dos editor) it shows a little music symbol after
a and after b
* Inside "wordpad" and "word", a and b are each shown on their own
lines.

The definitive answer is how C sees it. If the output is read back
using fgets, how many characters (before the \n that fgets will
insert) are there and how many lines?

From Keith's reply, I accept that pretty much anything goes, but
common sense suggests that only a system that uses \r as an ending (on
its own) would choose to do anything special with it.

I've tried two implementations on Windows and they both read back 1
line with 4 characters before the \n. The file on disc, of course,
contains (in hex) 61 0d 62 0d 0d 0a.
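
For anyone who wants to repeat the experiment, a minimal sketch follows. It writes the six characters with fputs (puts would append a '\n' of its own), reads them back through a text stream, and counts with getc rather than fgets to keep the bookkeeping short. The function name `cr_roundtrip` and the scratch file are illustrative, and what it reports is whatever the implementation at hand does; the standard does not promise any particular answer.

```c
#include <stdio.h>

/* Writes "a\rb\r\n" to `path` through a text stream, reads it back
   through a text stream, and reports how many '\n' characters were
   seen (*lines) and how many characters preceded the first one
   (*first_len, or -1 if no '\n' was read).
   Returns 0 on success, -1 on an I/O error. */
int cr_roundtrip(const char *path, long *lines, long *first_len)
{
    FILE *fp = fopen(path, "w");        /* "w" opens a text stream */
    if (fp == NULL)
        return -1;
    /* fputs, unlike puts, does not append a newline of its own */
    if (fputs("a\rb\r\n", fp) == EOF) {
        fclose(fp);
        return -1;
    }
    if (fclose(fp) == EOF)
        return -1;

    fp = fopen(path, "r");              /* read back in text mode */
    if (fp == NULL)
        return -1;

    int c;
    long count = 0;                     /* characters read so far */
    *lines = 0;
    *first_len = -1;
    while ((c = getc(fp)) != EOF) {
        if (c == '\n') {
            if (*lines == 0)
                *first_len = count;     /* characters before first '\n' */
            ++*lines;
        }
        ++count;
    }
    fclose(fp);
    return 0;
}
```

Calling `cr_roundtrip("scratch.txt", &lines, &len)` and printing the two values gives the numbers being compared in this post.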
In other words, using "\r" is asking for trouble, unless one wants to
perhaps generate a Mac text file on a machine that is not a Mac.

Yes, but the permission in the standard cuts both ways. It is
permissible for \r not to generate a line ending even on a legacy Mac
OS C implementation. \r could be mapped to some other (possibly
useful) character because \n must be mapped to a Mac OS line ending.
I.e. my example might generate one line with 4 characters on it even
there. I accept this is unlikely -- as a practical matter, an old Mac
OS C implementation is probably going to assume \r is intended to end
a line even though this is a non-portable way to do so.
In fact, this would be a good example of where a binary stdout is useful, if
you want to output explicit control codes (for example, if stdout is sent to
a device where every control character is significant).


But like Rod said, an embedded \r is unlikely in normal Ascii text, if using
code 13, and not a problem if it isn't. I'd like to see a text file, perhaps
using LF for newlines, that had \r scattered through it. Then I'd like to
know Why...

C defines what it means in 5.2.2 p2: "Moves the active position to the
initial position of the current line". I have seen it used (and used
it myself) for this purpose; for example, to generate text that
overwrites previous text to display, say, a percentage progress. It
is far from ridiculous to use these few "standard" control characters
in text output despite it not being guaranteed to work on all C
implementations.

On a practical note, using \r in this way works on Unix-like and
DOS-like systems (that includes Windows) on the implementations that I
am aware of. It /could/ work even on an old Mac, but I accept that
Mac C IO is probably not written that way.
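
A small sketch of that progress-display use, under the assumption that the output device treats \r as 5.2.2 describes. The function names and the "Progress:" wording are invented for the example.

```c
#include <stdio.h>

/* Formats one progress update into buf. The leading '\r' asks the
   display device to move back to the start of the current line, so
   each update overwrites the previous one. Returns what snprintf
   returns. */
int format_progress(char *buf, size_t size, int percent)
{
    return snprintf(buf, size, "\rProgress: %3d%%", percent);
}

/* Emits one update on `out` without a trailing '\n', then flushes so
   that a line-buffered stream shows it immediately.
   Returns 0 on success, -1 on failure. */
int show_progress(FILE *out, int percent)
{
    char buf[32];
    int n = format_progress(buf, sizeof buf, percent);

    if (n < 0 || (size_t)n >= sizeof buf)
        return -1;                      /* formatting failed or truncated */
    if (fputs(buf, out) == EOF)
        return -1;
    return fflush(out) == EOF ? -1 : 0;
}
```

A loop would call `show_progress(stdout, i)` as the work advances and write a single '\n' at the end to finish the line.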
Perhaps a _newline string constant should be in a header somewhere,
containing whatever newline sequence was in effect. That would help people
do their own thing if they wanted. That is, for systems where newline isn't
something crazy, like an entry in a database the other side of the file
system.

The boat has already sailed on that one. C was standardised when
there was (a) a pressing need to fix this (the PC/workstation boom was
underway) and (b) record-oriented output was still around. OK, it
probably still is around, but it was far more common then.
 
Nick Keighley

KT> You can't even write a portable "hello world" program
KT> using binary mode.


Well, that wasn't my claim.  It was yours.  I made no claims about
_portable_ programs _without_ text mode, or even "hello world" programs for
that matter.  I said _binary_ was _well_suited_ for _text_.  And, it is.  It
usually works better IMO.  You'd want that clarified: for a specific
platform...

I don't see it. Why not use text mode for text files?

A) How do you write portably when 1) even portable C code is only so
portable,

"portable" isn't a predicate. There are degrees of portability. Many
basic Unix filters can be written in portable C.

2) you cannot test the code on other difficult systems anymore,

why not? If your program really never has to run on a weird architecture
(though I don't understand how you can tell) then don't test it for
them. I'd more argue against writing gratuitously non-portable code.

B) Why should you even attempt to support a non-x86 platform today?

ARM, DSP
x86 is the dominant platform.  Both the dominant x86 and modern non-x86
platforms use the same basic memory and I/O architectures that were in use
at C's creation.  In fact, almost all microprocessor based platforms back to
1974 do.

you never programmed on DOS?

Would C have been better off if it only had a single file mode?  I think so.

I don't
Except for boundary cases, like carriage return, linefeeds, and newlines,
the C code for text and non-text would be of a single form or style.

what's a "boundary case". You specifically exclude most of the reasons
for using text mode (EOF is another reason) then say text mode isn't
necessary!
And, the boundary cases would be well known and well preserved in the code -
perhaps in simple #ifdef code sections...

 I'd guess that if that had been
done, we might've even seen a new standardized header containing all the
relevant info, perhaps as part of "C90".  That'd have been much better for
portability, don't you think?

without text mode you have to change your code between unix and
windows. Let alone more exotic architectures.
 
Richard Bos

bartc said:
Perhaps a _newline string constant should be in a header somewhere,
containing whatever newline sequence was in effect.

For text streams, there already is one. It's spelled '\n'.
For binary streams, that's the wrong idea.
That is, for systems where newline isn't something crazy, like an entry
in a database the other side of the file system.

Or a preceding line length, which IIRC actually has been used. But
that's the problem: what _do_ you suggest we do for such systems? Ignore
them? Java can afford to do that; C cannot.

Richard
 
Keith Thompson

io_x said:
Keith Thompson said:
Making binary the default mode for stdout would be far worse.

what is the problem; what is the problem all files [stdout, stdin,
stderr too] are buffered and in binary?

Seriously?

The problem is that portable code would not be able to write valid
output to stdout. This program:

#include <stdio.h>
int main(void)
{
    printf("hello, world\n");
    return 0;
}

would write the string "hello, world" followed by a '\n' character;
that '\n' character would not necessarily represent a line ending.

Similarly, a program that reads from stdin will see some
system-specific end-of-line sequence at the end of each line of input
(CR, LF, CRLF, etc.).
 
Keith Thompson

Or a preceding line length, which IIRC actually has been used. But
that's the problem: what _do_ you suggest we do for such systems? Ignore
them? Java can afford to do that; C cannot.

A preceding line length would make some things easier, such as
reading a potentially very long line into memory. Just read the line
length first, allocate that much memory, and read that many bytes.
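
As a rough illustration of "read the length, allocate, read the bytes", here is a sketch against a made-up record layout: a 4-byte big-endian length followed by exactly that many bytes. That layout and the name `read_record` are assumptions for the example only; the systems that stored line lengths did so in their own formats, usually below the level a C program sees.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical record layout, assumed for this sketch only:
   a 4-byte big-endian length, then exactly that many bytes of data.
   Returns a malloc'ed, '\0'-terminated copy of the line and stores
   its length in *len_out, or returns NULL on EOF or error.
   The caller frees the result. */
char *read_record(FILE *fp, size_t *len_out)
{
    unsigned char hdr[4];

    if (fread(hdr, 1, sizeof hdr, fp) != sizeof hdr)
        return NULL;                    /* EOF or short header */

    unsigned long len = ((unsigned long)hdr[0] << 24) |
                        ((unsigned long)hdr[1] << 16) |
                        ((unsigned long)hdr[2] << 8)  |
                         (unsigned long)hdr[3];
    if (len >= (size_t)-1)
        return NULL;                    /* too large to allocate */

    char *line = malloc((size_t)len + 1);   /* +1 for the '\0' */
    if (line == NULL)
        return NULL;
    if (fread(line, 1, (size_t)len, fp) != len) {
        free(line);
        return NULL;                    /* record was truncated */
    }
    line[len] = '\0';
    *len_out = (size_t)len;
    return line;
}
```

No growing buffer and no scanning for a terminator are needed, which is the convenience being described; the cost is that the stream has to be opened in binary mode for the header to survive intact.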

I wonder why the people who complain about '\0'-terminated strings
don't complain about '\n'-terminated lines.
 
Nick

io_x said:
Keith Thompson said:
Making binary the default mode for stdout would be far worse.

what is the problem; what is the problem all files [stdout, stdin, stderr too]
are buffered and in binary?

Well, at least, that to read a text file you have to know what to
expect, and (worse) to write one you have to know what to send.

Remember C runs perfectly well on systems that don't even /have/
end-of-line markers, never mind the endless confusion of every
combination under the sun of \n and \r.

LRECL=120 and all that.
 
Richard Bos

Flash Gordon said:
A "security" library proposal that includes functions that it recommends
you don't use? Seems plain stupid to me.

To begin with, yes.
Maybe they've improved a few, and on the next version they will improve
a few more (or remove the ones needing removal)?

Possibly. In which case we have two different "standards" for this
proposal, one slightly less broken but much less likely to be followed.
Is that better or worse? You tell me; I don't know whether being eaten
by Cthulhu is better or worse than being dismembered by Hastur.

Richard
 
Richard Delorme

Le 19/03/2010 18:10, Dag-Erling Smørgrav a écrit :
No, no, no, no and no. It would be a paradigm shift for C compilers.

Currently, a C compiler works more or less as follows:

1. The preprocessor expands macros, resolves conditionals and removes
comments.

2. The parser parses the output from the preprocessor and produces some
sort of intermediate representation.

3. The optimizer manipulates the intermediate representation to produce
more effective and / or smaller code.

4. The code generator translates the intermediate representation to
assembler code.

5. The assembler translates the assembler code to machine code and
produces a binary object that contains code, initialized data, and
information about symbols that are present in the object or that the
object references.

6. The linker combines the binary object(s) with various libraries and
produces an executable or a library.

No stage in this process has any knowledge about any other stage beyond
a simple understanding of what sort of output the previous stage
produces and what sort of input the next one expects, and sometimes not
even that (the optimizer, for instance, is invisible to the parser and
the code generator). Many toolchains implement step 1, steps 2 - 4,
step 5 and step 6 as four different, interchangeable programs which can
run independently of each other. Some also separate step 2 from steps 3
and 4.

What you propose turns everything upside-down. While the parser can
easily include the necessary information in its output, it has no way of
retrieving that information, unless you teach it everything that the
linker knows, and then some... it gets even worse for cross-compilers,
because the object format for the target might be completely different
from that used on the host. The target might not even *have* an object
format - on many embedded platforms, the linker outputs a memory image
which is written directly to the target's SRAM.

And what about preprocessor macros defined in library headers? They are
needed at the preprocessing stage, but the preprocessor doesn't even
know that what it's looking at is C (or C++ or ObjC or whatever, since
most implementations use a single preprocessor for all languages they
support), so how will it know where to look for them?

Everything is already upside down: Modern C compilers use precompiled
headers and do inter-procedural optimization during linking. My
proposal is simply to merge the precompiled header into the library, and
to automatically generate this precompiled header from the function
definitions, object definitions, etc.
 
Dag-Erling Smørgrav

Richard Delorme said:
Everything is already upside down: Modern C compilers use precompiled
headers and do inter-procedural optimization during linking.

No. Precompiled headers are not object files, they're in a format
developed for, and used exclusively by, the parser. Link-time
optimization operates on the output of previous stages, not the other
way around. Information still flows from the preprocessor through the
parser, code generator and assembler to the linker. Each stage still
operates either on the output of the stage that precedes it or on the
output from a previous instantiation of the same stage.

DES
 
Richard Bos

Keith Thompson said:
A preceding line length would make some things easier, such as
reading a potentially very long line into memory. Just read the line
length first, allocate that much memory, and read that many bytes.

True, which is why there were systems which did use such files; and
which is also why Bart's idea will not work on such systems.

Richard
 
Tim Rentsch

James Kuyper said:
Richard Delorme wrote: [snip]
- void can be removed from the language. So instead of declaring
void f(void);
we can simply write :
f();
The generic pointer type (void *), could then be replaced by (char*)
without much harm.

Using 'void' to indicate "no return type" or "no arguments" was bad
design, necessitated by issues of backwards compatibility. I have no
objection to a new language where you can express those things by the
absence of a specified return type or the absence of specified
arguments. [snip further unrelated comments]

Using 'void' for no arguments -- sure, clearly that choice was
dictated more by backward compatibility concerns than by good
design guidelines.

Using 'void' to mean no return value -- good design, not bad.
Makes the language more uniform syntactically, more uniform
semantically, with fewer special cases, with result that is
smaller, simpler and cleaner. Leaving 'void' out in these
cases would be like those proofs from the ancient Greeks,
where the arguments had to be made twice, once if the
value in question was 1, again if the value in question
was not 1, because (according to their thinking) 1 wasn't
a number...
 
Ben Pfaff

Tim Rentsch said:
Using 'void' for no arguments -- sure, clearly that choice was
dictated more by backward compatibility concerns than by good
design guidelines.

Using 'void' to mean no return value -- good design, not bad.
Makes the language more uniform syntactically, more uniform
semantically, with fewer special cases, with result that is
smaller, simpler and cleaner.

But it also preserved backward compatibility, since pre-C89
versions of C assumed an "int" return type if none was
specified.
 
Tim Rentsch

Keith Thompson said:
Rod Pemberton said:
Keith Thompson said:
[...]
If I were
redesigning the switch statement from scratch, you'd be able to
specify multiple values in a single case, the "break" keyword
would not be required, and there would probably be special syntax
to specify falling through to the next case.


That works. It's not my first choice though. It just transposes the
locations where one must add additional control flow. E.g., "break;" is
removed, while, say, "fallthru;" is added. It's much like rewriting a
loop with "break's" to use "continue's" instead.

Except that "fallthrough;" (yes, I'd insist on spelling it correctly)
would be much rarer than "break;" is in current C code. Make the
normal case the default, and require a little extra work for the
exception. Of course it's too late to change this in C, unless we
leave the switch statement alone and add a new form of selection
statement (something I'm not advocating).
[...]
It also imposes an arbitrary ordering on the cases and restricts
which cases can fall through to which other cases.

What about "recase" or "reswitch"? E.g., perhaps like so:

case 0x10: recase (0x30);
case 0x20: /* stuff */
break;
case 0x30: /* stuff */
break;

The advantage is you don't need a goto label. The ordering can be as one
wishes. And, each case could then be auto-break. Hmm, that's not too bad,
IMO.

So "recase" is just like "goto", except that it jumps to a specified
case label rather than to a goto label.

Ok, that might not be an entirely bad idea. Let's explore it a bit.

I don't think the parentheses are necessary: "recase 0x30;" should
suffice, though of course the expression can be parenthesized if you
like.

If we're really going to consider adding a 'recase' construct,
why not call a spade a spade, or more accurately call a goto
a goto:

case 0x10: goto 0x30;
case 0x20: /* stuff */ break;
case 0x30: /* stuff */ break;

The syntax for 'goto' would need to be expanded to take an
expression rather than just a label. If the target of a goto is
a label defined in the function (which would take precedence) or
an integer constant expression matching one of the 'case's in the
nearest-most enclosing switch block, the target is allowed, with
well-defined and easy to understand semantics.

Presumably "recase default;" would be permitted. Would a "recase"
statement that targets a nonexistent case label be a constraint
violation, or would it jump to the "default:" label if it exists,
or terminate the switch statement if it doesn't?

And 'goto default;' is easy to recognize, because 'default' is
reserved for use as a keyword. Again the nearest-most enclosing
switch would need an entry for this 'switch' choice.

What does it do in the presence of nested switch statements?
I'd guess that it would apply only to the innermost one, but
it would have to be specified.

Alternatively, for those folks who like fewer restrictions on
their control flow, 'goto <expression>;' or 'goto default;'
could accept any enclosing 'switch'-block choice as a destination.
Personally I like the more restrictive form, but hey, why make
folks have to work harder just because they want to get to
an outer 'switch'-block choice that already has a perfectly
usable label just sitting there, waiting to be transferred
to?

It still creates just as much opportunity for abuse as the goto
statement itself. If you want to write BASIC in C:

int line = 10;
switch (line) {
case 10:
    puts("Infinite loop");
    recase 20;
case 20:
    recase 10;
}

The "recase" statement could be defined consistently, but I don't
believe the benefits would outweigh the cost.

Not even if we added a differentiating keyword, eg,
'goto static 20;'?


[snip additional material that offered no special opportunities
for silly humorous response comments like the ones made above]
 
lawrence.jones

Tim Rentsch said:
case 0x10: goto 0x30;
case 0x20: /* stuff */ break;
case 0x30: /* stuff */ break;

If we're going to enhance the syntax of goto, I think it would make more
sense to use ``goto case 0x30'' rather than a bare integer constant
expression.
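
For comparison, here is roughly how the same jump has to be written in C as it stands: an ordinary label is placed next to the case label and becomes the goto target. This is only a sketch; the function name, the label name and the returned strings are invented for illustration.

```c
/* Standard-C equivalent of the proposed "goto case 0x30;": an
   ordinary label sits next to the case label and is the jump target. */
const char *handle(int code)
{
    switch (code) {
    case 0x10:
        goto case_0x30;             /* would be "goto case 0x30;" */
    case 0x20:
        return "handled as 0x20";
    case 0x30:
    case_0x30:                      /* extra label needed today */
        return "handled as 0x30";
    default:
        return "unhandled";
    }
}
```

The proposed syntax would save the duplicate label and keep the case value and the jump target from drifting apart when one of them is edited.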
 
Phil Carmody

If we're going to enhance the syntax of goto, I think it would make more
sense to use ``goto case 0x30'' rather than a bare integer constant
expression.

That was the first thing that went through my mind - the 'label' is
'case 0x30'. However, the analogy with labels as being something
to be used verbatim breaks down if you want to permit constructs like

int a, b; /* ... */
switch(a) {
case 0x10: goto case b;
case 0x20: /* x20 stuff */ break;
case 0x30: /* x30 stuff */ break;
}

It's entirely possible you want to avoid constructs like that, though.

Phil
 
Willem

Phil Carmody wrote:
) That was the first thing that went through my mind - the 'label' is
) 'case 0x30'. However, the analogy with labels as being something
) to be used verbatim breaks down if you want to permit constructs like
)
) int a, b; /* ... */
) switch(a) {
) case 0x10: goto case b;
) case 0x20: /* x20 stuff */ break;
) case 0x30: /* x30 stuff */ break;
) }

I think that would be better expressed as something like:
int a, b; /* ... */
switch(a) {
case 0x10: reswitch(b);
case 0x20: /* x20 stuff */ break;
case 0x30: /* x30 stuff */ break;
}

Where 'reswitch(...)' basically means 'do the switch again with ...'.
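
In current C that behaviour can be approximated by wrapping the switch in a loop and using continue, as in the sketch below. The function name `dispatch` and the return values are placeholders standing in for the "stuff" in each case.

```c
/* Emulates the proposed "reswitch(b);" in standard C: change the
   controlling value and go round an enclosing loop again.
   Returns the value of the case that finally ran, or -1 if no case
   matched. */
int dispatch(int a, int b)
{
    int value = a;

    for (;;) {
        switch (value) {
        case 0x10:
            value = b;      /* would be "reswitch(b);" */
            continue;       /* re-enter the switch with the new value */
        case 0x20:
            return 0x20;    /* x20 stuff */
        case 0x30:
            return 0x30;    /* x30 stuff */
        default:
            return -1;      /* no matching case */
        }
    }
}
```

Note that passing 0x10 as `b` makes this loop forever, which any real 'reswitch' would presumably do as well.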

I think you could even just make it:

int a, b; /* ... */
switch(a) {
case 0x10: switch(b);
case 0x20: /* x20 stuff */ break;
case 0x30: /* x30 stuff */ break;
}

Which basically just means that if a 'switch' is not followed by a block,
then the case labels are taken from the parent switch-block.


) It's entirely possible you want to avoid constructs like that, though.

If all you want is to improve fallthrough techniques, yes.


SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
 
Ben Bacarisse

Willem said:
I think you could even just make it:

int a, b; /* ... */
switch(a) {
case 0x10: switch(b);
case 0x20: /* x20 stuff */ break;
case 0x30: /* x30 stuff */ break;
}

Which basically just means that if a 'switch' is not followed by a block,
then the case labels are taken from the parent switch-block.

That alters the meaning of a currently valid construct: switch (b);.
I was going to say "of existing code" but it is quite possible that
there has never been such a switch inside another in any code base,
ever. Then again...

Technically, you'd want this meaning only when switch is followed by
;. switch can be followed by other non-block statements and you'd
want to leave them alone for the purposes of IOCCC entries.
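
To make the existing meaning concrete, a minimal sketch: the controlling expression is evaluated, there is no case label to jump to, and the empty statement is skipped. The helper names are invented for the example, and some compilers warn about the empty body.

```c
static int evaluations = 0;

/* Counts how many times it has been called. */
static int next_value(void)
{
    return ++evaluations;
}

/* "switch (expr);" is already valid C: expr is evaluated once, no
   case or default label exists, so control passes straight to the
   statement after the switch. Returns the running call count. */
int empty_switch_demo(void)
{
    switch (next_value())
        ;                   /* empty statement as the switch body */
    return evaluations;
}
```

Under the proposal the same text inside another switch would instead jump to one of the outer case labels, which is the change of meaning at issue.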

<snip>
 
