VLA question


Keith Thompson

Malcolm McLean said:
Quirks such as sizeof('x') giving different values in C and C++, and
int *x = malloc(10 * sizeof(int)) being a type-incompatibility error in
C++ but not C are best ironed out, I agree. // comments have been widely
accepted, but I have been bitten by compilers rejecting them. Whilst
you can usually set a flag, this often isn't acceptable; the user needs to
be able to compile with a simple "C_compiler compile code.c" command.

Neither language standard (nor any that I'm aware of) requires
compilers to conform *by default*, with no command-line options --
and I'm not aware of any circumstances where specifying a flag is
not acceptable.

The // "quirk" was ironed out by the C99 standard.

C++ isn't going to make character constants be of type int; it would
break existing code. I suppose C could change them to type char;
that would definitely break some *contrived* C programs, but I'm not
sure whether it would break real-world code, and I'm not sure the
committee would be willing to take the time to determine that.

Removing C's implicit conversion from void* would break tons of
existing code, so that will never happen. I don't know whether
adding such an implicit conversion to C++ would break existing code,
but I presume it was excluded from C++ for valid reasons.

Cleaning up the minor incompatibilities between C and C++ sounds like a
nice idea, but I don't think it's practical. IMHO it's better to just
accept the fact that they're two different languages. There's some
value in avoiding gratuitous incompatibilities, but only some.
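A minimal sketch of the two differences discussed above; this compiles as C,
but a C++ compiler reports a different size for the character constant and
rejects the malloc assignment without a cast:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* In C a character constant has type int, so this typically
       prints 4; in C++ it has type char and would print 1. */
    printf("sizeof 'x' = %zu\n", sizeof 'x');

    /* Valid C: void * converts implicitly to int *.  A C++ compiler
       rejects this assignment unless a cast is added. */
    int *p = malloc(10 * sizeof *p);
    free(p);
    return 0;
}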
 

Keith Thompson

Malcolm McLean said:
Ideally yes.
Whilst it's possible to artificially construct legal, defined ANSI C programs
which will break on various compilers in default mode, usually you do that
either by artificially using little-used constructs

e.g.

x = y //*ha ha two fingers to you */z;

(Under C90 rules the /*...*/ part is a block comment, so this parses as
x = y / z; with C99/C++ // comments, everything after // is ignored.)

or, more forgivably, you use identifiers like min which are often defined
by the environment.

There are a few areas where things are difficult, e.g. handling PI and nan().

How would the language standard require compilers to implement the
language standard by default? Its requirements by definition apply
*only* to conforming implementations. I suppose it could say that a
conforming compiler must be conforming by default, but the only effect
of that would be to say that gcc, for example, is non-conforming, which
it is anyway.

It's not always clear what "by default" means; not all compilers
necessarily use command-line options.
 

Stephen Sprunk

Neither language standard (nor any that I'm aware of) requires
compilers to conform *by default*, with no command-line options --
and I'm not aware of any circumstances where specifying a flag is not
acceptable.

The // "quirk" was ironed out by the C99 standard.

It's not a "quirk"; it was C blatantly adopting a feature of C++, as had
been done previously with function prototypes.
C++ isn't going to make character constants be of type int; it would
break existing code.

It would also be nonsensical.
I suppose C could change them to type char; that would definitely
break some *contrived* C programs, but I'm not sure whether it
would break real-world code, and I'm not sure the committee would
be willing to take the time to determine that.

IMHO, any code that relies on character literals being of type int is
_already_ broken, and the current mis-definition does nothing but hide
such bugs from the programmer, which is not worth preserving.

Note that I do excuse certain standard functions that return a character
value _as_ an int; that's different.
Removing C's implicit conversion from void* would break tons of
existing code, so that will never happen.

I agree.
I don't know whether adding such an implicit conversion to C++
would break existing code,

I can't see how it would; any code that depends on such a conversion is
not valid C++ code in the first place. It might un-break code that is
currently broken, but who cares?

Whether the C++ committee would make such a change solely for the sake
of harmonizing C and the C-like subset of C++ is unknown, but if that
were the _only_ remaining difference (i.e. the C committee had already
cleared up everything else on its side), I'd like to think that they
would. At worst, it's a minor inconvenience that could be worked around
with casts that would be unnecessary in C.
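
For illustration, the common-subset workaround is just to write the cast
anyway; it is redundant in C but satisfies a C++ compiler:

#include <stdlib.h>

int main(void)
{
    /* The cast is unnecessary in C but required in C++;
       written this way, the line compiles as either. */
    double *buf = (double *)malloc(100 * sizeof *buf);
    free(buf);
    return 0;
}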
but I presume it was excluded from C++ for valid reasons.

I suspect it was part of C++ having relatively strong type safety from
the start, whereas C evolved from a language with no types at all and
has had to make do with weak type safety at best since then.
Cleaning up the minor incompatibilities between C and C++ sounds like
a nice idea, but I don't think it's practical.

There are so few substantive differences that IMHO it's a reasonable
goal. I would need something more convincing than this:
IMHO it's better to just accept the fact that they're two different
languages.

When >99% of C code can be (or already is) written so that it compiles
correctly as C++ code, and most programmers already consider C++ to be a
superset of C, it seems more pragmatic to make that a reality rather
than stubbornly insist that it is not due to the <1% of (questionable,
probably contrived) code that disproves it.

S
 

Malcolm McLean

IMHO, any code that relies on character literals being of type int is
_already_ broken, and the current mis-definition does nothing but hide
such bugs from the programmer, which is not worth preserving.
In C it seldom matters. You can write sizeof('c'), but that's almost certainly
contrived. In C++ it does matter, because you can overload functions to behave
differently when passed an int or a char. The user expects stream << '\n' to
write a newline character to the stream.

But if we declare that henceforth 'c' will have type char, and you've a
codebase of a million lines of C, you've got to run regression tests
on the whole lot, just in case something has broken. That's not trivial.
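
As a side note on the earlier point that the type of 'c' seldom matters in C:
C11's _Generic does make the type observable, though only in contrived code;
a small sketch:

#include <stdio.h>

int main(void)
{
    /* Selects "int" under the current C rules; would select "char" if
       character constants were changed to have type char. */
    puts(_Generic('c', char: "char", int: "int", default: "other"));
    return 0;
}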
 

Öö Tiib

I agree.


I can't see how it would; any code that depends on such a conversion is
not valid C++ code in the first place. It might un-break code that is
currently broken, but who cares?

It could perhaps break some overload resolution or make it ambiguous.
Big damage to existing code is unlikely, since void* is used quite
rarely in purely C++ code.
Whether the C++ committee would make such a change solely for the sake
of harmonizing C and the C-like subset of C++ is unknown, but if that
were the _only_ remaining difference (i.e. the C committee had already
cleared up everything else on its side), I'd like to think that they
would.

It is rather unlikely that the C++ committee will make such a change. Stroustrup
decided that it was too unsafe for C++ and AFAIK has explained why in his
FAQ somewhere.
At worst, it's a minor inconvenience that could be worked around
with casts that would be unnecessary in C.

Unless I misunderstand, we are talking only of a few idiomatic cases like
'malloc(sizeof(T))'. A custom preprocessor that adds redundant casts
feels trivial to write.
 

Stephen Sprunk

In C it seldom matters. You can write sizeof('c'), but that's almost
certainly contrived.

If done by an experienced programmer, sure. A novice may expect it to
be equal to sizeof(char), i.e. it's a bug.
But if we declare that henceforth 'c' will have type char, and you've
a codebase of a million lines of C, you've got to run regression
tests on the whole lot, just in case something has broken. That's not
trivial.

Don't you run regression tests before every release anyway? That's the
point of having them, after all. Even if you're too "agile" for that,
wouldn't you at least test after upgrading your compiler--or changing
the compiler flag to a different language version?

S
 

Ike Naar

Unless I misunderstand, we are talking only of a few idiomatic cases like
'malloc(sizeof(T))'. A custom preprocessor that adds redundant casts
feels trivial to write.

C++ has 'new T', which seems to be more idiomatic than 'malloc(sizeof(T))'.
 

James Kuyper

On 26-Jun-13 12:30, Keith Thompson wrote:
....
[Re: Character literals]
IMHO, any code that relies on character literals being of type int is
_already_ broken, and the current mis-definition does nothing but hide
such bugs from the programmer, which is not worth preserving.

Given that the standard currently guarantees that they are of type
'int', I cannot agree with that assessment. However, that led me to
think about precisely what kind of code you're talking about. Obviously,
the value of (sizeof 'c') depends upon that assumption, but such
expressions would only occur in code deliberately contrived to
distinguish C from C++ according to the current rules.

In C++, the type of a character literal can determine which operator
overload is selected, which is why C++ made this change. There was no
comparable reason in C, until the addition of _Generic() to C2011.
However, _Generic() is very new and not widely supported, and as a
result, not widely used. I doubt that you're talking about such code.

The value of a character literal will be the same, whether it has type 'int'
or type 'char', so long as char is signed, or is unsigned with CHAR_MAX
<= INT_MAX. Only if CHAR_MAX > INT_MAX could it matter. Character
literals that currently have a negative value would instead have a
positive value greater than INT_MAX. Note that the requirement that
members of the basic execution character set have positive values
applies only after they have been stored in a char object; if char is
unsigned, it is not possible to violate that requirement, regardless of
the sign of the corresponding character literal. And, of course, that
requirement never applied to members of the extended execution character
set.

You might expect the signedness of character literals to matter, but in
most contexts, expressions of type 'char' are subject to the integer
promotions; either as part of the default argument promotions, or as
part of the usual arithmetic conversions. On implementations where char
is signed, or is unsigned with CHAR_MAX <= INT_MAX, char promotes to
int, so it doesn't make a difference. Again, it is only implementations
where CHAR_MAX > INT_MAX where values of type 'char' are promoted to
unsigned int, rather than int, with corresponding implications for the
usual arithmetic conversions.

Implementations where CHAR_MAX > INT_MAX are extremely rare, probably
non-existent, but if one does exist, it could be fully conforming to the
C standard. Such an implementation must have CHAR_MIN == 0, CHAR_BIT >=
16, and unless there's a lot of padding bits in an int, sizeof(int) == 1.

In general, when converting an unsigned type to a signed type "either
the result is implementation-defined or an implementation-defined signal
is raised". However, the behavior of fputc() is defined in terms of
conversion from int to unsigned int, and the behavior of fgetc() is
defined in terms of the reverse conversion. The requirement that data
written to a binary file using fputc() must be unchanged when read back
using fgetc() therefore implies that, on such an implementation, those
conversions must be inverses, and cannot raise a signal. This
guarantees, for example, that the definition

int letter = 'c';

will place the same value in letter, regardless of whether 'c' is an int
with a value <= INT_MAX, or a 'char' with a value > INT_MAX. If you
change 'letter' to be of type long, then you could detect a difference,
assuming that LONG_MAX > INT_MAX. Similar complications come up when
trying to write code which has different behavior if 'char' promotes to
'unsigned int'.

On such an implementation, the character written to a binary file by
fputc(EOF), if successfully read back using fgetc(), will force fgetc()
to return EOF. Therefore, code using fgetc() must check ferror() and
feof(), it cannot rely upon comparison with EOF alone to determine
whether or not a character was successfully read.
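
A sketch of the sort of input loop that remains correct even on such an
implementation, where a successfully read character can compare equal to
EOF (the function name here is just for illustration):

#include <stdio.h>

long count_chars(FILE *fp)
{
    long n = 0;
    for (;;) {
        int c = fgetc(fp);
        /* EOF alone is not conclusive where sizeof(int) == 1:
           a genuine character can have the same value as EOF. */
        if (c == EOF && (feof(fp) || ferror(fp)))
            break;
        n++;
    }
    return n;
}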

Conclusion:
Code which depends upon the assumption that character literals have type
'int' (excluding cases involving sizeof or _Generic()) must do so in
one of two basic ways:

1. Assuming that character literals never have a value greater than INT_MAX.
2. Assuming that character literals are never promoted to 'unsigned int'.

Those assumptions could fail if character literals were changed to
'char', but only on implementations where CHAR_MAX > INT_MAX. Are you
sure you want to claim that all such code is "broken"?

It took me a long time to put together the analysis above, and the
relevant issues are subtle enough that I'm not sure the analysis is
entirely correct. In order to make such a change, the committee members
would have to make a similar analysis, and would have to discuss it
among themselves to determine whether the negative consequences of the
change are small enough to justify it. I agree with Keith - I am "not
sure whether the committee would be willing to take the time to
determine that".
 

Öö Tiib

C++ has 'new T', which seems to be more idiomatic than 'malloc(sizeof(T))'.

Indeed it is in C++, but not in C, and we are discussing the common subset.

A tool that replaces 'malloc's with 'new's and 'new[]'s on the fly is less
trivial to write than one that just adds casts. It also has to replace
'free's with 'delete's and 'delete[]'s, and those may be in a different file.
 

Eric Sosman

[...]
Unless I misunderstand, we are talking only of a few idiomatic cases like
'malloc(sizeof(T))'. A custom preprocessor that adds redundant casts
feels trivial to write.

What's the correct cast to insert in

ptr->link = malloc(sizeof *ptr->link);

? That is, the "custom preprocessor" needs to associate identifiers
with types, analyze expressions (lvalue expressions, anyhow) to derive
their types, keep track of scopes, and so on. The task is certainly
do-able -- a compiler can do it -- but goes well beyond the purely
lexical and some distance past what most would call "trivial."
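
Spelled out with a concrete (hypothetical) declaration, the rewrite such a
tool would have to produce looks like this; recovering "struct node *" from
the expression ptr->link is exactly the type analysis in question:

#include <stdlib.h>

struct node {
    int value;
    struct node *link;
};

void extend(struct node *ptr)
{
    /* The cast a C++-compatible rewrite needs; a purely lexical
       tool cannot know that ptr->link has type struct node *. */
    ptr->link = (struct node *)malloc(sizeof *ptr->link);
}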

Here's another:

#include <limits.h>
#if INT_MAX >= 1000000
int *array;
#else
long *array;
#endif
...
array = malloc(n * sizeof *array);

Note that simply running the source through a C preprocessor and
then through the cast-adding tool produces a result that is not
portable. A portable rewrite of the final line would be

array =
#if INT_MAX >= 1000000
(int*)
#else
(long*)
#endif
malloc(n * sizeof *array);

That is, the cast-adder would have to keep track of all the
different types `array' might have, and of all the preprocessor
conditionals that choose between candidate types. INT_MAX won't
get changed in mid-module, but in the general case a macro that
led to a particular type where `array' was declared might be
redefined or undefined by the time `array' is actually used --
so just copying the #if expressions isn't guaranteed to give
the right result ...

"Trivial" is relative, but this doesn't feel "trivial" to me.
 

Ike Naar

Indeed it is in C++ but not in C and we are discussing common subset.

Others have stated that C++ and C are different languages and I agree
with that. Though code written in the common subset is, technically,
C++, it is often non-idiomatic C++.
Many C-isms are considered bad style in C++.
 

Stephen Sprunk

Given that the standard currently guarantees that they are of type
'int', I cannot agree with that assessment. However, that led me to
think about precisely what kind of code you're talking about.
Obviously, the value of (sizeof 'c') depends upon that assumption,
but such expressions would only occur in code deliberately contrived
to distinguish C from C++ according to the current rules.

IMHO, anyone that uses such tricks to distinguish between C and C++,
rather than simply using #ifdef __cplusplus, deserves what they get.
In C++, the type of a character literal can determine which operator
overload is selected, which is why C++ made this change. There was
no comparable reason in C, until the addition of _Generic() to
C2011. However, _Generic() is very new and not widely supported, and
as a result, not widely used. I doubt that you're talking about such
code.
No.

The value of a character literal will be the same, whether it has type
'int' or type 'char', so long as char is signed, or is unsigned with
CHAR_MAX <= INT_MAX. Only if CHAR_MAX > INT_MAX could it matter.
Character literals that currently have a negative value would instead
have a positive value greater than INT_MAX.

The only example I can envision a problem with is a character literal
that today is negative. IIRC, the conversion to char is well-defined in
that case. However, if character literals were char, it'd have a large
positive value. Storing in a char would still be fine, but storing in
an int would require a possibly-problematic conversion.

Is that the concern?
Implementations where CHAR_MAX > INT_MAX are extremely rare,
probably non-existent, but if one does exist, it could be fully
conforming to the C standard. Such an implementation must have
CHAR_MIN == 0, CHAR_BIT >= 16, and unless there's a lot of padding
bits in an int, sizeof(int) == 1.

If we also assume that int has no padding bits, that doesn't seem
completely unreasonable, actually. There are probably DSPs like that.

The problem could be solved by the implementation defining plain char to
be signed, which is the only sane choice if a character literal can be
negative (even as an int) in the first place.

If all character literals are non-negative and plain char is unsigned,
then there is no problem making them char on such a system. That is
what a C++ implementation would have to do, and it would be insane for C
implementations for the same platform to differ.
Conclusion: Code which depends upon the assumption that character
literals have type 'int' (excluding cases involving sizeof or
_Generic()) must do so in one of two basic ways:

1. Assuming that character literals never have a value greater than
INT_MAX.
2. Assuming that character literals are never promoted to 'unsigned
int'.

Those assumptions could fail if character literals were changed to
'char', but only on implementations where CHAR_MAX > INT_MAX. Are
you sure you want to claim that all such code is "broken"?

I'm still wrapping my head around your (excellent, BTW) analysis, but my
gut tells me that such code is indeed "broken". Perhaps if you could
come up with a concrete example of code that you think is non-broken but
would fail if character literals were char, rather than an abstract
argument?

S
 

Stephen Sprunk

ptr->link = (typeof *ptr->link)malloc(sizeof *ptr->link);

*sigh* I really need to stop hitting "Send" before proofreading--or
quit depending on the compiler to catch errors. That should be:

ptr->link = (typeof ptr->link)malloc(sizeof *ptr->link);

S
 

Keith Thompson

Stephen Sprunk said:
On 27-Jun-13 07:28, James Kuyper wrote: [...]
The value of a character literal will be the same, whether it has type
'int' or type 'char', so long as char is signed, or is unsigned with
CHAR_MAX <= INT_MAX. Only if CHAR_MAX > INT_MAX could it matter.
Character literals that currently have a negative value would instead
have a positive value greater than INT_MAX.

The only example I can envision a problem with is a character literal
that today is negative. IIRC, the conversion to char is well-defined in
that case. However, if character literals were char, it'd have a large
positive value. Storing in a char would still be fine, but storing in
an int would require a possibly-problematic conversion.

That doesn't seem right. A character constant that has a negative value
today (because plain char is a signed type) would still have a negative
value if character constants were of type char. It would just be a
negative value of type char.

Here's a demonstration:

#include <stdio.h>
int main(void) {
    printf("'\\x80' = %d\n", '\x80');
    return 0;
}

On a system with CHAR_BIT==8, with plain char being signed, I get:

'\x80' = -128

If '\x80' were of type char, its value would be (char)-128, which would
be promoted to int (because printf is variadic), giving the same result.
 

James Kuyper

Stephen Sprunk said:
On 27-Jun-13 07:28, James Kuyper wrote: [...]
The value of a character literal will be the same, whether it has type
'int' or type 'char', so long as char is signed, or is unsigned with
CHAR_MAX <= INT_MAX. Only if CHAR_MAX > INT_MAX could it matter.
Character literals that currently have a negative value would instead
have a positive value greater than INT_MAX.

The only example I can envision a problem with is a character literal
that today is negative. IIRC, the conversion to char is well-defined in
that case. However, if character literals were char, it'd have a large
positive value. Storing in a char would still be fine, but storing in
an int would require a possibly-problematic conversion.

That doesn't seem right. A character constant that has a negative value
today (because plain char is a signed type) would still have a negative
value if character constants were of type char. It would just be a
negative value of type char.

Here's a demonstration:

#include <stdio.h>
int main(void) {
    printf("'\\x80' = %d\n", '\x80');
    return 0;
}

On a system with CHAR_BIT==8, with plain char being signed, I get:

'\x80' = -128

If '\x80' were of type char, its value would be (char)-128, which would
be promoted to int (because printf is variadic), giving the same result.

As I indicated above, the problem I described arises only on
implementations where CHAR_MAX > INT_MAX. If CHAR_BIT==8, then you can't
have been testing on such a system.
 

Seebs

As I indicated above, the problem I described arises only on
implementations where CHAR_MAX > INT_MAX. If CHAR_BIT==8, then you can't
have been testing on such a system.

I was about to say "hang on, how can that happen, int must be at least
as wide as char", but of course, it can happen if CHAR_MAX == UCHAR_MAX.

-s
 

James Kuyper

I was about to say "hang on, how can that happen, int must be at least
as wide as char", but of course, it can happen if CHAR_MAX == UCHAR_MAX.

Right - as I mentioned earlier, CHAR_MAX > INT_MAX implies that CHAR_MIN
== 0.
 

Keith Thompson

James Kuyper said:
Right - as I mentioned earlier, CHAR_MAX > INT_MAX implies that CHAR_MIN
== 0.

Suppose CHAR_BIT==32, CHAR_MIN==-2**31, CHAR_MAX==2**31-1,
sizeof(int)==1, and int has one padding bit, so INT_MAX==2**30-1.
 

James Kuyper

Suppose CHAR_BIT==32, CHAR_MIN==-2**31, CHAR_MAX==2**31-1,
sizeof(int)==1, and int has one padding bit, so INT_MAX==2**30-1.

You're right - I reached that conclusion so many years ago that I forgot
the assumptions I relied upon to reach it. I was thinking of the minimal
case where CHAR_MAX is as small as possible while still being greater
than INT_MAX, in which case there's no room for padding bits. If you
move away from the minimal case, there is room for padding bits, and
then the argument breaks down. Of course, such implementations are even
less commonplace than the minimal case.

I'll have to review my earlier comments more carefully with that
correction in mind.
 
