Has thought been given to a cleaned up C? Possibly called C+.

  • Thread starter Casey Hawthorne

Keith Thompson

Eric Sosman said:
On 3/10/2010 9:14 AM, bartc wrote:
[...]
Otherwise, in the case of using bool: surely there are tools that will
refactor code so that problem names (that are now keywords for example) are
fixed, by substituting an automatic or user-supplied alternative (or by
using the C# idea of using @name I think it was, to allow @name as an
identifier, when name is a reserved word.)

"Existing code is important, existing implementations are not."
-- Rationale

Do you have any idea of the sheer volume of C source code that's
out there? I don't: All I can say is that it's in excess of a
hundred million lines -- I know of one single product suite that
comes to a quarter that amount all by itself. So, try this thought
experiment: Imagine yourself trying to tell someone that he's got
to take all that code and run it through an untested (unwritten!)
tool and cope with the inevitable glitches. (Success rate of
99.99% means more than a hundred thousand problems.) Is it more
likely that you'll persuade the owner of the code to spend money
fixing a hundred thousand brand-new bugs, or that you'll be thrown
out on your ear?
[...]

And even assuming 100% reliability, changing the source is only the
start of your problems.
 

jacob navia

Rod Pemberton wrote:
There is much code that doesn't use them and has functions with identical
names, etc. E.g., I frequently need one or two string.h function(s). So, I
implement them with the same name and skip including string.h. You need
stdio.h for file I/O. But, who really needs ctype.h? stdlib.h? stdbool.h?
You can write equivalent code without them.

You are missing the point

If I include <stdlib.h> and use size_t, my code will run correctly
in 32 and 64 bits. If I do

#define size_t int

my code will not run in a 64 bit system. Using header files makes
your code portable.
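
A minimal sketch of the size_t point, assuming a hosted implementation. The helper name alloc_items is hypothetical and not from the thread. Because size_t comes from the header, the same source adapts to whatever width the platform uses for object sizes:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocate room for `count` items of `item_size` bytes.
   Returns NULL if the total would not fit in size_t or malloc fails. */
void *alloc_items(size_t count, size_t item_size)
{
    assert(item_size > 0);              /* precondition of this sketch */
    if (count > SIZE_MAX / item_size)
        return NULL;                    /* count * item_size would overflow */
    return malloc(count * item_size);
}
```

With `#define size_t int` the same arithmetic would be done in int, which cannot describe every object size on a typical 64-bit system.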
Why? Learn hexadecimal. No joke. Each letter corresponds to 4-bits.

The lcc-win compiler accepts binary constants with the "b" suffix.
No need for sizeof().

Sure.

GREAT!

And how do you know how big a structure is, to allocate storage
for it?

Well, you just count the fields, then you make some educated guesses for your
particular compiler and optimization options and you write a constant in the
code. When you change compiler or compiler options you just recalculate.

Very easy and fun.

No need for sizeof.

[snip the rest of nonsense]
 

bartc

Eric Sosman said:
Sorry; you lost me there. How can the compiler know that
`c' points to the start of a string, rather than to a single
one-off isolated `char' object? That is, how does the compiler
know whether "%s" or "%c" or "%d" (or "%u") is appropriate?

The compiler can deduce %s as the most likely format, since you wouldn't
normally want to display a pointer to a char array as an integer; if you
did, you can just directly use "%d" or "%u", or use a cast: (int)c.

"%c" can also be discounted: since as you say it can't tell whether a char*
c points to a string or a char (or some other sequence), it might as well
assume %s. To get a single character, just dereference: *c, or use overrides
(there are other ways but they involve introducing a dedicated string type).

Suggested default formats might be:

int "%d" Varying according to width and signedness
double "%f"
char "%c" A bit controversial this one
char* "%s" Special case for char arrays
T* "%p" All other pointers
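
For what it's worth, C11 later added _Generic, which can express a table like this in the language itself. The following is only a sketch under that assumption: DEFAULT_FMT and format_int are hypothetical names, only a few types are listed, and _Generic cannot match "all other pointers" in one entry, so only void * maps to "%p" here.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical macro: pick a default conversion from the argument's type.
   Types that are not listed fail to compile instead of guessing a format. */
#define DEFAULT_FMT(x) _Generic((x), \
    char:         "%c",              \
    int:          "%d",              \
    unsigned int: "%u",              \
    double:       "%f",              \
    char *:       "%s",              \
    const char *: "%s",              \
    void *:       "%p")

/* Hypothetical helper: format one int with its default conversion.
   Returns what snprintf returns (the length of the formatted text). */
int format_int(char *buf, size_t size, int value)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, DEFAULT_FMT(value), value);
}
```

Note that a character constant such as 'C' has type int in C, so it selects "%d" rather than "%c"; only an object declared as char selects "%c". That is one reason the char entry is controversial.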
You've got it backwards. C streams have text and binary
modes for the benefit of systems where they *aren't* the same.
It is not C's business to legislate the file formats used by
its host platforms.

Nevertheless, if on your platform, text and binary modes are the same, then
you will see fewer problems.
Right. The latter. End of discussion.

The discussion was about a cleaned up C.
 

Eric Sosman

Wait -- I botched that one, didn't I? Oh, well: After I've
scraped the egg off my face, I'll make an omelet or something.
The compiler can deduce %s as the most likely format, since you wouldn't
normally want to display a pointer to a char array as an integer; if you
did, you can just directly use "%d" or "%u", or use a cast: (int)c.

Having botched one argument against, let me proceed to
another:

log_message("error %?: %?\n", errno, strerror(errno));

Keep in mind that log_message() is a function defined by the
user, not a function already known to the compiler -- so how
does the magic happen? You could invent additional syntax so
the user could describe log_message() and invoke the spells
(see gcc's "printf-like" and "scanf-like" stuff), but it seems
to me you've been agitating for simpler rather than more involved
declarations. Or you could just say "Magic only works with the
Standard library" (precedent: <tgmath.h>), but this means a user
can no longer write his own wrappers for printf() et al. and
use the same format strings.
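
A sketch of the gcc "printf-like" annotation mentioned above, assuming GCC or Clang. The wrapper name log_message_buf and the PRINTF_LIKE macro are hypothetical; the wrapper writes into a caller-supplied buffer only to keep the example small.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* The format attribute is a GCC/Clang extension; elsewhere it expands to nothing. */
#if defined(__GNUC__)
#define PRINTF_LIKE(fmt_index, first_arg) \
    __attribute__((format(printf, fmt_index, first_arg)))
#else
#define PRINTF_LIKE(fmt_index, first_arg)
#endif

/* Argument 3 is the format string; the variadic arguments start at 4. */
int log_message_buf(char *buf, size_t size, const char *fmt, ...)
    PRINTF_LIKE(3, 4);

/* Hypothetical user-defined wrapper around vsnprintf. */
int log_message_buf(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    int n;

    assert(buf != NULL && fmt != NULL);
    va_start(ap, fmt);
    n = vsnprintf(buf, size, fmt, ap);
    va_end(ap);
    return n;
}
```

With the attribute in place the compiler can check calls to the wrapper against the format string, but the user still has to write the extra declaration syntax, which is the cost being pointed out.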
Nevertheless, if on your platform, text and binary modes are the same,
then you will see fewer problems.

True: On platforms where the modes aren't different, even
the wrong one will work and mistakes will go unnoticed. But
the point is that this happy state of affairs does not prevail
on all platforms that host C implementations; many of them *do*
distinguish text files from binary files, and need to treat them
differently. If you want the distinction removed from C, you're
saying that some of the platform's files will be C-inaccessible.

Personally, the only problems I've ever run into with text
vs. binary streams in C have their roots in a botched transfer
between platforms with dissimilar conventions, leading to a
"text file" residing on Platform A that follows Platform B's
conventions instead of those that are native. But simply waving
the "use binary stream" magic wand at the problem is no solution:
It just means that the *program* needs to be aware of all the
different conventions remotely-sourced files might use.
The discussion was about a cleaned up C.

... and my point is that the implementation *needs* to do
non-portable things to carry out its job. If you forbid the
implementation from using non-portable techniques in its own
constituent parts, you cannot even implement exit().

In any event, it seems to me that you are not describing
anything that could be called "cleaned up C." You may be
describing a programming language (at the cocktail-napkin
stage), but you're not describing anything remotely like C.
 

jacob navia

Jasen Betts wrote:
where are they found?

I will publish them soon. I have published some code in this group.
 

bartc

log_message("error %?: %?\n", errno, strerror(errno));

Keep in mind that log_message() is a function defined by the
user, not a function already known to the compiler -- so how
does the magic happen? You could invent additional syntax so
the user could describe log_message() and invoke the spells
(see gcc's "printf-like" and "scanf-like" stuff), but it seems
to me you've been agitating for simpler rather than more involved
declarations. Or you could just say "Magic only works with the
Standard library" (precedent: <tgmath.h>), but this means a user
can no longer write his own wrappers for printf() et al. and
use the same format strings.

Presumably log_message() at some point calls printf-like functions. At that
point, it will see a runtime format string, and I suggested a couple of
ways, earlier, of dealing with these.

It seems the "error %?: %?\n" parameter only tells the function how to lay
out the values; the exact format codes (%d and %s perhaps) are not really
relevant; they are just there because that's how C works at present.

(The following is some code from a language which is not C, but is similarly
low-level:
int a = 98
real b = 1.71
ref char c = "Bart"
char d = 'C'

println "ABCD =",a,b,c,d

printf("ABCD = %d %f %s %c\n",a,b,c,d) # Call C's printf()

Actual output:

ABCD = 98 1.710000BartC
ABCD = 98 1.710000 Bart C

The format string makes the spacing easier to control, but apart from that
the built-in print and C's printf() have the same output. The built-in print
however requires no format codes; magic! In C extra maintenance is needed to
keep the formats matched up with the parameters.

I'm not suggesting format strings should be done away with -- they are jolly
useful -- just that the compiler can work a bit harder for its money.)
 

Ben Bacarisse

bartc said:
In this example, using my suggestion of "%?":

int a;
double b;
char *c;

printf ("A,B,C = %? %? %?",a,b,c);

The compiler knows the format string, and can change those "%? %? %?" to "%d
%f %s".

A small point: Any design that uses a format with inserted data like
this should at least consider using numbers rather than position to
indicate the data to be inserted. I.e. you write (or you have the
option to write) something like "A,B,C = %1 %2 %3".

It makes no difference in this case (hence it could be optional) but
it can be very useful when translating messages between languages that
require different word orders to convey the same idea.

<snip>
 

Seebs

A small point: Any design that uses a format with inserted data like
this should at least consider using numbers rather than position to
indicate the data to be inserted. I.e. you write (or you have the
option to write) something like "A,B,C = %1 %2 %3".

There's an option for this in common C variants:

printf("%$2d %$1d\n", 1, 2); => 2 1

-s
 

Ian Collins

Sorry; you lost me there. How can the compiler know that
`c' points to the start of a string, rather than to a single
one-off isolated `char' object? That is, how does the compiler
know whether "%s" or "%c" or "%d" (or "%u") is appropriate?

From the types. C++ compilers do this already:

int a;
double b;
char *c;

std::cout << "A,B,C = " << a << ' ' << b << ' '<< c;

printf ("A,B,C = %? %? %?",a,b,c);
 

Ben Bacarisse

Seebs said:
There's an option for this in common C variants:

printf("%$2d %$1d\n", 1, 2); => 2 1

Yes, gcc/glibc does this, but the trouble with it is that the type is
tied to the number. It works in a lot of common situations but it is
less useful than it would be in a context where the type did not need
to be specified.
 

Rod Pemberton

bartc said:
I don't quite get this. You mean whether or not those empty case statements
above would each have a break rather than falling through?

Yes. And, each would have to duplicate "stuff".

If you pass hash values to a switch, you could see many like this. A large
(C99) switch of mine has 353 labels, with about 250 labels, as
non-sequential 32-bit values, executing the same block.
I've never thought of it like that. However, with the following suggestion,
[suggestion of multiple values and ranges in case label]
those cases would all be properly grouped together, and the problem doesn't

True.
You are aware that C has both a structured switch() and an
unstructured switch(), yes? The unstructured switch() doesn't use
block scope. You can't nest these. The case labels are interspersed
into other code, typically including other control structures.

Are you sure?
Yes.

The following seems to nest (and prints TRUE):

int a,b,c;

a=13;

switch(a)
case 9: case 10: case 11:
if (0)
case 13:
puts("TRUE");

Well, that's not nested switches, but a single unstructured switch. Try
adding another switch, e.g., switch(b) and/or another case, say "case 13:"
close to that "case 13:", etc. without creating a block. BTW, the
unstructured switch also can't use the "default" label.


Rod Pemberton
 

Rod Pemberton

jacob navia said:
Rod Pemberton wrote:

Sure.

GREAT!

And how do you know how big a structure is, to allocate storage
for it?

Well, you just count the fields, then you make some educated guesses for your
particular compiler and optimization options and you write a constant in the
code. When you change compiler or compiler options you just recalculate.

Very easy and fun.

No need for sizeof.

Do the typical macro definitions of offsetof() (e.g., C Rationale or X11?)
work correctly with LCC-Win32? That strongly suggests to me they don't.

If they do, then there is no need for sizeof(). Only slight adjustment of
an existing struct is required. Structs can also be constructed to
determine the size of other types. If one is paranoid about incorrect value
due to alignment issues, then one could do multiple checks.


Rod Pemberton
 

Rod Pemberton

Keith Thompson said:
Why on Earth would you do that?

Portability. Completeness. Simplification. Optimization... Not all C
compilers, esp. older or C-subset, have complete libraries. If you're
working in a new environment, you might only be able to bootstrap an older C
or a C-subset compiler.
Is there something wrong with using
the standard library?

Nope. That's what they're there for, when they're there.
Fallthrough from a non-empty case is relatively rare.

I guess no one else has ever passed values from a hash function into a
switch?
If I were
redesigning the switch statement from scratch, you'd be able to
specify multiple values in a single case, the "break" keyword
would not be required, and there would probably be special syntax
to specify falling through to the next case.

That works. It's not my first choice though. It just transposes the
locations where one must add additional control flow. E.g., "break;" is
removed, while, say, "fallthru;" is added. It's much like rewriting a loop
with "break's" to use "continue's" instead.
On the other hand, I'm not 100% sure I'd even want to support
fallthrough from one case to the next; it's a bit of a hack, and
it makes the code difficult to maintain.

I think it's much the same issue as "bartc" suggested with C needing
multi-level breaks. A programmer needs or wants ways to escape from the
current control-flow. Is this because they nested too deeply? Or,
structured the code rather poorly? Or, never learned structured programming?
Does it matter to them? It's much like when C programmers started using
"goto" to execute error recovery within the same function. There's no need
to ever use a "goto", IMO, if one knows structured programming.
Fall-through and status flags would allow them to exit the current
control-flow and execute the error recovery code. Although, a goto may
result in a speed improvement. But, "goto's" complicate the control-flow,
making the code harder to restructure.
The enum feature has existed in its current form since before
K&R.

There are many old C and C-subset compilers that don't implement them.
Unless you need to handle text, of course.

Even if you handle text...
And no, manually handling
the '\r' and '\n' characters is not a good solution, especially if
your code might ever be ported to a system with a different text file
format.

If that's (non-ASCII charset support of text files) really needed, you can
write them ('\r' and '\n') out as text, read them back in as binary, emit
the read-in binary sequence instead of escapes. Portable... ASCII is still
dominant. The EBCDIC coders can fix up the common ASCII issues. They're
aware of the problems of C coded with ASCII. Of course, it's really not
wise to use huge charsets for character representation in source code...

It's interesting that you didn't mention ctype.h functions. There is *much*
code that implements their own ASCII based ctype.h functions, e.g., ASCII
based isalpha(), if ch>='a' && ch<='z' etc... Yet, a simple #define and use
of a string.h function (e.g., strchr) could've made the isalpha() etc. code
portable.
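
A sketch of the strchr idea, written as a function rather than a #define for readability. The names my_isalpha and check_my_isalpha are hypothetical. It assumes only that the 26 basic Latin letters exist in the execution character set, not that their codes are contiguous.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical portable letter test: looks the character up in a list
   instead of comparing against 'a'..'z' ranges, which are not contiguous
   in EBCDIC. */
int my_isalpha(int ch)
{
    static const char letters[] =
        "abcdefghijklmnopqrstuvwxyz"
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ";

    /* strchr also matches the terminating '\0', so reject it first. */
    return ch != '\0' && strchr(letters, ch) != NULL;
}

/* Small self-check that a test harness can call. */
void check_my_isalpha(void)
{
    assert(my_isalpha('a') && my_isalpha('Z'));
    assert(!my_isalpha('1') && !my_isalpha('\0'));
}
```

Unlike the standard isalpha(), this sketch ignores the locale and only knows the basic letters.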
Believe it or not, I recently ran into *old* code that used "static",
"double", and "register", as variables...

How old was it? "static", "double", and "register"
have been keywords in C for a *very* long time; see [Cman.pdf]

1980 or so. Yes, that's afterwards, but not that long afterwards (assuming
1974).


Rod Pemberton
 

Nick

Rod Pemberton said:
I guess no one else has ever passed values from a hash function into a
switch?

It's still relatively rare. Out of curiosity, why don't you use a
perfect hash here - since the set of values and results must be known at
programming time?
That works. It's not my first choice though. It just transposes the
locations where one must add additional control flow. E.g., "break;" is
removed, while, say, "fallthru;" is added. It's much like rewriting a loop
with "break's" to use "continue's" instead.

It certainly is in:

case 1:
do_stuff();
case 2:
do_more_stuff();
break;

And most of the time when you do this, sooner or later you find yourself
in a mess because you want a third case that also falls through the same
place. Short of goto, or wrapping the whole thing in a loop and
changing the control variable there's no way to do this.

But there is a need for doing the same thing with multiple cases. I'd
agree that this is rare enough (Rod has a very good case (sorry!)) for
it to be the unusual one, and the ability to put multiple values on one
case would do it for me.
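
For comparison, the only standard way to give one block several values today is to stack empty case labels. A minimal sketch, where the function is_weekend and its day numbering are hypothetical:

```c
#include <assert.h>

/* Hypothetical classifier: 0 = Sunday ... 6 = Saturday.
   Two values share one block by stacking empty case labels, which is
   fallthrough from an empty case only. */
int is_weekend(int day)
{
    assert(day >= 0 && day <= 6);
    switch (day) {
    case 0:
    case 6:
        return 1;
    default:
        return 0;
    }
}
```

A multi-value case label would say the same thing in one line; with 250 hash values per block, as in Rod's switch, the difference in volume is considerable.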

Too late now of course. Unless by introducing an entirely new "select"
or similar that works like switch except for the different syntax.
I think it's much the same issue as "bartc" suggested with C needing
multi-level breaks. A programmer needs or wants ways to escape from the
current control-flow. Is this because they nested too deeply? Or,
structured the code rather poorly? Or, never learned structured programming?
Does it matter to them? It's much like when C programmers started using
"goto" to execute error recovery within the same function. There's no need
to ever use a "goto", IMO, if one knows structured programming.
Fall-through and status flags would allow them to exit the current
control-flow and execute the error recovery code. Although, a goto may
result in a speed improvement. But, "goto's" complicate the control-flow,
making the code harder to restructure.

I remain unconvinced that adding a pile of status variables is in any
way (apart from adhering to a set of principles) an improvement over
some sensible constructs.

Finding a single item in a multi-dimensional array is a good example -
I'd always write that as a (static) function and return on success.
I've seen the fixes with "&& flag" stuffed into control statements, and
the worse cobbles with a single loop and piles of multiplication and
division and they strike me as a remedy worse than the disease.
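
A minimal sketch of the function-and-return approach described above. The names find_in_grid, ROWS and COLS are hypothetical, and the array size is fixed only to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

#define ROWS 3
#define COLS 4

/* Search grid for target. On success store the position in *row and *col
   and return 1; otherwise return 0 and leave them untouched.
   Returning from inside the nested loops replaces both the "&& flag"
   conditions and any multi-level break. */
static int find_in_grid(int grid[ROWS][COLS], int target,
                        size_t *row, size_t *col)
{
    size_t r, c;

    assert(grid != NULL && row != NULL && col != NULL);
    for (r = 0; r < ROWS; r++) {
        for (c = 0; c < COLS; c++) {
            if (grid[r][c] == target) {
                *row = r;
                *col = c;
                return 1;
            }
        }
    }
    return 0;
}
```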
 

Ersek, Laszlo

Some typos crept in. The decimal string giving the argument position
comes between the percent and dollar signs:

printf("%2$d %1$d\n", 1, 2); => 2 1
Yes, gcc/glibc does this

More precisely, it is mandated at least by the SUS, versions 1 through
4. The relevant sections are shaded/marked as various extensions across
the four versions, but all versions require the feature to be present on
XSI-conformant systems. The only version that enables a system to
conform to it without supporting the %n$ forms is v3 -- a system can
conform to "IEEE Std 1003.1, 2004 Edition" (POSIX:2004) without
supporting the XSI option group.

lacos
 

bartc

Ben Bacarisse said:
A small point: Any design that uses a format with inserted data like
this should at least consider using numbers rather than position to
indicate the data to be inserted. I.e. you write (or you have the
option to write) something like "A,B,C = %1 %2 %3".

Yes I'm sure I've seen that somewhere.

But the syntax needs to be worked out so you know where the number ends
("%12" could be param #1 followed by "2", or param #12).

In Seebs' example, he used "%$2d", which is still a little ambiguous (where
does the field width go, before the $?).
It makes no difference in this case (hence it could be optional) but
it can be very useful when translating messages between languages that
require different word orders to convey the same idea.

I'm not completely convinced. Language 1 might use:

printf("%$1d %$2d", a, b);

and language 2 might use:

printf("%$2d %$1d", a, b);

You still need two lots of format strings, and they are much more fiddly to
write. Perhaps a scheme with macros or string tables can be used to pick out
a format string to be used with a single printf().

(And from my experience of multi-language applications, this kind of problem
was rare.)
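
A sketch of the string-table scheme, using the %n$ syntax as it was corrected earlier in the thread (number between the percent and dollar signs). This assumes a C library with the POSIX/XSI numbered-argument extension, such as glibc; it is not ISO C. The table date_formats and the function format_date are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical message table: the same two arguments in a different order.
   Only the format string changes between entries; the call stays the same. */
static const char *const date_formats[] = {
    "%1$d/%2$d",    /* month/day  */
    "%2$d.%1$d."    /* day.month. */
};

/* Format month and day according to the chosen table entry.
   Returns what snprintf returns (the length of the formatted text). */
int format_date(char *buf, size_t size, int style, int month, int day)
{
    assert(style == 0 || style == 1);
    return snprintf(buf, size, date_formats[style], month, day);
}
```

There are still two format strings, as noted above, but the calling code and its argument list exist only once.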
 

jacob navia

Rod Pemberton wrote:
Do the typical macro definitions of offsetof() (e.g., C Rationale or X11?)
work correctly with LCC-Win32?

Yes

That strongly suggests to me they don't.
Your logic is very clear:

If I doubt that you can replace sizeof, that means I have a bug in
offsetof. GREAT!

The offsetof macro yields the offset of the START of a field, not of its
end. You can then get only the size of all the fields EXCEPT the last one.

If they do, then there is no need for sizeof(). Only slight adjustment of
an existing struct is required.

???

I have to add a field (wasting real space in memory) to know the size
of the structure?

i.e. if I have

struct b { int a; double b; long double c[3]; };

I have to modify it like this:

struct b { int a; double b; long double c[3]; int lastfield;};

so that I can do an offsetof to lastfield so that I know the size. But doing it
like this I am wasting 4 bytes (or 8 in 64 bits) because of the alignment
problems.

Your proposal makes absolutely NO SENSE.

Structs can also be constructed to
determine the size of other types.

And why would we want to do all this extra work?
Why can't we use sizeof and be done with this?

If one is paranoid about incorrect value
due to alignment issues, then one could do multiple checks.

Sure, but why wouldn't it be better just to use sizeof?

Why all this extra work? What is the problem with sizeof?
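
A small sketch of why offsetof alone does not give the size. The struct tail and both function names are hypothetical; the struct is chosen so that most ABIs add padding after the last member.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical struct: a double followed by a single char. */
struct tail { double d; char c; };

/* Where the last member ends: its start offset plus its own size.
   This does not include any padding the compiler adds after it. */
size_t end_of_last_member(void)
{
    return offsetof(struct tail, c) + sizeof(char);
}

/* Trailing padding, if any: sizeof accounts for it, the estimate above
   does not. */
size_t trailing_padding(void)
{
    assert(sizeof(struct tail) >= end_of_last_member());
    return sizeof(struct tail) - end_of_last_member();
}
```

On common ABIs trailing_padding() is nonzero for this struct, so an array of such structs allocated from the offsetof estimate would be too small. Only sizeof, or an extra dummy member as described above, reports the full size.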
 

Nick Keighley

It works.  It allows the *creation* of _many_ types from a few very simple
syntax elements.  I'm not sure I can say that about any of the other
languages I've experienced.

Pascal, Ada etc.
 

Seebs

Some typos crept in. The decimal string giving the argument position
comes between the percent and dollar signs:
printf("%2$d %1$d\n", 1, 2); => 2 1

That's not a typo, just bad memory. I never used the feature, and it
seemed "obvious" to me that the right way to do this would be to have
it be a modifier, like ".". (Because "%12..." would, in every other
circumstance, be specifying a width of 12, and precision is indicated
by a *leading* punctuation mark. I just assumed that other extensions
would also use a leading punctuation mark.)

I don't know if I've seen this yet on any BSD systems, but it appears
to be available in Linux these days.

-s
 

Ersek, Laszlo

That's not a typo, just bad memory. I never used the feature, and it
seemed "obvious" to me that the right way to do this would be to have
it be a modifier, like ".". (Because "%12..." would, in every other
circumstance, be specifying a width of 12, and precision is indicated
by a *leading* punctuation mark. I just assumed that other extensions
would also use a leading punctuation mark.)


FWIW, from SUSv2/fprintf():

----v----
In format strings containing the %n$ form of a conversion specification,
a field width or precision may be indicated by the sequence *m$, where m
is a decimal integer in the range [1, {NL_ARGMAX}] giving the position
in the argument list (after the format argument) of an integer argument
containing the field width or precision, for example:

printf("%1$d:%2$.*3$d:%4$.*3$d\n", hour, min, precision, sec);

The format can contain either numbered argument specifications (that is,
%n$ and *m$), or unnumbered argument specifications (that is, % and *),
but normally not both. The only exception to this is that %% can be
mixed with the %n$ form. The results of mixing numbered and unnumbered
argument specifications in a format string are undefined. When numbered
argument specifications are used, specifying the Nth argument requires
that all the leading arguments, from the first to the (N-1)th, are
specified in the format string.
----^----

I don't know if I've seen this yet on any BSD systems, but it appears
to be available in Linux these days.

Anecdote time! From lbzip2-0.17 (obsolete -- current version is 0.23),

----v----
/*
I have to replace this elegant construct with the lame one below, because
the Tru64 system I tested on chokes on the %N$*M$lu conversion
specification, even though it is certified UNIX 98 (I believe):

$ uname -s -r -v -m
OSF1 V5.1 2650 alpha
$ c89 -V
Compaq C V6.5-011 on Compaq Tru64 UNIX V5.1B (Rev. 2650)
Compiler Driver V6.5-003 (sys) cc Driver

http://www.opengroup.org/openbrand/register/brand2700.htm

And no, I won't factor this out.
28-JAN-2009 lacos
*/

#if 0
"%1$s: any worker tried to consume from splitter: %3$*2$lu\n"
"%1$s: any worker stalled : %4$*2$lu\n"
"%1$s: muxer tried to consume from workers : %5$*2$lu\n"
"%1$s: muxer stalled : %6$*2$lu\n"
"%1$s: splitter tried to consume from muxer : %7$*2$lu\n"
"%1$s: splitter stalled : %8$*2$lu\n",
pname, (int)sizeof(long unsigned) * (int)CHAR_BIT / 3 + 1,
s2w_q.av_or_eof.ccount, s2w_q.av_or_eof.wcount,
w2m_q.av_or_exit.ccount, w2m_q.av_or_exit.wcount,
m2s_q.av.ccount, m2s_q.av.wcount)
#else
# define FW ((int)sizeof(long unsigned) * (int)CHAR_BIT / 3 + 1)
"%s: any worker tried to consume from splitter: %*lu\n"
"%s: any worker stalled : %*lu\n"
"%s: muxer tried to consume from workers : %*lu\n"
"%s: muxer stalled : %*lu\n"
"%s: splitter tried to consume from muxer : %*lu\n"
"%s: splitter stalled : %*lu\n",
pname, FW, s2w_q.av_or_eof.ccount,
pname, FW, s2w_q.av_or_eof.wcount,
pname, FW, w2m_q.av_or_exit.ccount,
pname, FW, w2m_q.av_or_exit.wcount,
pname, FW, m2s_q.av.ccount,
pname, FW, m2s_q.av.wcount)
# undef FW
#endif
----^----

Of course I later removed the "#if 0" block (but not the leading
comment). I was shocked because I always considered OSF/1 a reference
implementation of the SUS. Of course I may have botched up the format
string, but I tried to verify it by re-reading the quoted paragraphs
many times, it worked on glibc flawlessly, and gcc's -Wformat=2 didn't
complain.

lacos
 
