Ben Bacarisse said:
Tim Rentsch said:
Ben Bacarisse said:
Richard, Keith and all, thanks a ton for your replies.
I have one more question.
In the code if I add the following line:
------------------------------------
z = s[p[0]]++; // say z is declared as int
------------------------------------
what would be the value of z, and how is it evaluated?
Thanks a lot in advance.
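For reference, here is a minimal sketch of how that statement evaluates once every element involved has a known value. The declarations of s and p, the function name and the result struct are all hypothetical stand-ins, since the original declarations are not shown in this part of the thread.

```c
#include <assert.h>

/* Hypothetical holder for the two values of interest. */
struct result { int z; char s_after; };

struct result post_increment_demo(void)
{
    /* Assumed declarations: both arrays are fully initialised,
       so every read below has a defined value. */
    char s[4] = { 10, 20, 30, 40 };
    int  p[2] = { 2, 0 };
    int  z;
    struct result r;

    assert(p[0] >= 0 && p[0] < 4);   /* index must stay inside s */

    /* p[0] is 2, so the operand is s[2]. The postfix ++ yields the
       old value (30), which is converted to int and stored in z;
       s[2] is incremented to 31 as a side effect. */
    z = s[p[0]]++;

    r.z = z;
    r.s_after = s[2];
    return r;
}
```

With these assumed initial values, z ends up as 30 and s[2] as 31.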
Since you don't initialise the s array, its members could contain
any garbage value, normally the value that was last written to the
memory location. So by extension, we cannot say z contains any useful
value after your assignment. In standard C parlance, its value is
indeterminate, and reading an indeterminate object, as in the RHS of
your assignment statement, invokes undefined behaviour, again in the
standard's terminology.
Is this really true? I don't think it is, at least not in the general
way that is often presented here. An indeterminate value is either a
valid value of the type or it is a trap representation. Thus, on a
system with no trap representations for objects interpreted as having
type T, accessing an indeterminate value of type T must simply be
unspecified.
Now, in this case, s was of type char. 6.2.6.1 p5 which defines and
discusses trap representations states that:
Certain object representations need not represent a value of the
object type. If the stored value of an object has such a
representation and is read by an lvalue expression that does not
have character type, the behavior is undefined. If such a
representation is produced by a side effect that modifies all or any
part of the object by an lvalue expression that does not have
character type, the behavior is undefined. Such a representation is
called a trap representation.
I read that as forbidding UB when a trap representation is accessed
via an lvalue expression of type char which is the case here, is it
not?
I believe that's a misreading. What the passage says is that if the
access type is a non-character type then the behavior is undefined.
It does not say that if the access type is a character type then the
behavior is defined. Access through a character type interprets the
stored value (ie, the representation) according to the type used to do
the read; if the access type is (char) or (signed char) and the
representation read is a trap representation for that type, it's still
undefined behavior, because there's no (Standard-)defined way to
produce a value from a trap representation.
OK, that's reasonable (and was how I first read it): access via a
character type is undefined or defined depending on whether the byte
is or is not a trap representation for the character type used.
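As a minimal sketch of the one case that is safe everywhere: unsigned char has no padding bits, so it has no trap representations, and reading any object's bytes through it is always defined. The helper names copy_bytes and round_trip are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Copy n bytes of object representation from src to dst.
   Every read and write goes through an lvalue of type unsigned char. */
void copy_bytes(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t i;

    assert(dst != NULL && src != NULL);
    for (i = 0; i < n; i++)
        d[i] = s[i];
}

/* Copy an int bytewise and return the copy. The source here holds a
   valid value, so the copy holds the same value. */
int round_trip(int value)
{
    int copy;

    copy_bytes(&copy, &value, sizeof copy);
    return copy;
}
```

The question in this subthread is only about plain char and signed char, where the byte read might itself be a trap representation for the access type.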
What, then, is the effect of the second sentence of the quote? It
must be to add a further blanket undefined for all accesses to one
type's trap representations when accessed via another type. I.e. that
given
    union { int si; unsigned ui; } u;
access to u.ui is undefined when u.si holds a trap representation even
when unsigned int has no trap representations of its own.
If that is right, there are two things that puzzled me and caused me to
over-think the clause in question. First, it seems odd to give signed
char this odd half-way position and, second, it seems at odds with the
explanation of unions in 6.5.2.3 p3. At the very least the footnote
should surely be expanded to cover the case where some other union
member holds a trap representation.
Neither of these is an argument for my reading. They are there to
explain why I thought the way I did.
Yes, that's perfectly understandable. Here are some ideas in
response to the implicit questions in your penultimate paragraph.
First, about unions. Suppose we have a union containing just a
signed integer member, eg,
    union { signed int si; } siu;
where both sizeof siu == 4 and sizeof siu.si == 4 are true.
In such a case there are in fact two distinct objects, even though
they happen to occupy exactly the same area of memory -- there is
'siu', and 'siu.si'. We know these are different because the
object designated by 'siu' can never be a trap representation
(because it's a union, and unions are never trap representations),
even though 'siu.si' may hold a trap representation.
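A small sketch of the "same storage, two objects" point, for anyone who wants to see it in code. The tag siu_t and the function name are hypothetical; the only thing the sketch relies on is that a pointer to a union, suitably converted, points to each of its members.

```c
#include <assert.h>

union siu_t { signed int si; };   /* hypothetical tag for the example */

/* Returns nonzero when the union and its only member start at the
   same address. */
int same_storage(void)
{
    union siu_t siu;

    siu.si = 42;

    /* A union is always large enough to hold its largest member. */
    assert(sizeof siu >= sizeof siu.si);

    /* Two lvalues of different types, 'siu' and 'siu.si', that
       designate objects beginning at the same byte. */
    return (void *)&siu == (void *)&siu.si;
}
```

This only shows that the two objects coincide in memory. It does not (and cannot portably) produce a trap representation in siu.si.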
(Editorial side note: the language the Standard uses relating to
the term "object" in various places is among the poorest sets of
phrasings the Standard employs. At some point I might write
something more about that, but right now I'd like to gloss over
those problems.)
Similarly, in the example union mentioned above
    union { int si; unsigned ui; } u;
there actually are three distinct objects -- u, u.si, and u.ui.
That at least two of these three occupy exactly the same bytes of
memory doesn't alter the number, since unions are described as
_overlapping_ objects.
What this means is that 'u.ui' and '*(unsigned*)&u.si', because
they are accessing different objects, are allowed to behave
differently.
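To make the two access paths concrete, here is a sketch with both forms side by side. The tag pun and the function names are hypothetical. The sketch sticks to nonnegative values, where the Standard guarantees that int and unsigned share the same representation, so on any conforming implementation both paths agree here; it does not demonstrate a case where they differ.

```c
#include <assert.h>

union pun { int si; unsigned ui; };   /* hypothetical tag */

/* Path 1: read the other union member directly. */
unsigned via_member(int v)
{
    union pun u;

    assert(v >= 0);   /* nonnegative values share a representation */
    u.si = v;
    return u.ui;      /* accesses the object u.ui */
}

/* Path 2: read the int object through an unsigned lvalue. */
unsigned via_cast(int v)
{
    union pun u;

    assert(v >= 0);
    u.si = v;
    return *(unsigned *)&u.si;   /* accesses the object u.si */
}
```

The interesting case is the one this cannot show portably: u.si holding a trap representation while unsigned has none.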
Now for the second question -- why is (char), in the guise of
(signed char), different? Or why are types besides character
types distinguished? Here is my speculation. The character
types are different because, ever since the early days of C, the
type 'char' has been used to access memory "free form", and the
Standard didn't want to change that. The interesting question
is, why give blanket undefined behavior to all the other types?
Here is where the speculation goes a little deeper. I conjecture
that an implementation might want to use trap representations to
indicate "not yet initialized" values, doing this automatically
without being told, and furthermore that it knows this. In such
a case, we might want
    unsigned u;
    int i;
    u = *(unsigned *) &i;
to be able to trap, because the variable 'i' hasn't been given an
explicit initial value. If the trap-representation-ness of
something depended just on what type is used for access, that
would prevent this form of error detection if some types had no
trap representations.
Even though the last part is pure speculation on my part, this
explanation seems like a plausible enough motivation for the funny
wording in 6.2.6.1#5. At least, for me it does so well enough that
my mental model can tolerate the seeming inconsistencies with
other areas of the Standard in this regard. So I offer it up here
in case it may be of help to other folks.