List of undefined behaviour and other sneaky bugs

Tim Rentsch

John Reye said:
No. That's wrong.
Concerning
i = a[i] = i+1;
Assignment is not a sequence point.
The above code does not say whether the value of (i+1)
is stored first in (a[i]) or in (i).

This is defined:

for (i = 0; i < sizeof a / sizeof *a; ++i) {
    a[i] = i + 1; /* set a to {1, 2, 3, ...} */
}
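
A minimal, self-contained sketch of the loop above, assuming an int array of ten elements. The names COUNT_OF, fill_sequence and demo_sum are made up for illustration and do not come from the thread. Note that the bound is the element count, not sizeof(a), which would be the size in bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of elements in a true array (not a pointer). */
#define COUNT_OF(arr) (sizeof (arr) / sizeof *(arr))

/* Set a[0..n-1] to {1, 2, 3, ...}; indexing and incrementing stay separate. */
static void fill_sequence(int *a, size_t n)
{
    assert(a != NULL);
    for (size_t i = 0; i < n; ++i) {
        a[i] = (int)(i + 1);
    }
}

/* Fill a ten-element array and return the sum of its elements. */
static int demo_sum(void)
{
    int a[10];
    int sum = 0;

    fill_sequence(a, COUNT_OF(a));
    for (size_t i = 0; i < COUNT_OF(a); ++i) {
        sum += a[i];
    }
    return sum;
}
```

Keeping the increment in the for header means no object is both modified and read for another purpose within a single expression, so none of the sequencing questions discussed in this thread arise.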


I believe that is incorrect.
The sequence point only makes a difference for statements that contain
side-effects.
So this is not allowed
a[i] = i++;
because = is no sequence point, and contains side-effects (i++).
See http://c-faq.com/expr/seqpoints.html


The latest C standard uses more precise terminology regarding
sequencing. If you haven't gotten that, or this close approximation

http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf

, I encourage you to do so.

But this is very much allowed:
i = a[i] = i+1;
It uses the normal fact that assignment is right-associative and that no
side-effects exist. [snip]


You're right that it is allowed. However there are two side-effects
(ie, the assignments), and being right-associative plays into
the question only partially. What is more important is that
the side-effect of storing into 'i' is /sequenced after/ the
value computations of the two operands (which in this case
are 'i' and 'a[i] = i+1'). It is because of that sequencing
relationship that the store into 'i' must follow the computation
of which element of a is to be accessed with the expression 'a[i]'.
 
Tim Rentsch

James Kuyper said:
No. That's wrong.
Concerning
i = a[i] = i+1;
Assignment is not a sequence point.
The above code does not say whether the value of (i+1)
is stored first in (a[i]) or in (i).

This is defined:

for (i = 0; i < sizeof a / sizeof *a; ++i) {
    a[i] = i + 1; /* set a to {1, 2, 3, ...} */
}


I believe that is incorrect.
The sequence point only makes a difference for statements that contain
side-effects.
So this is not allowed
a[i] = i++;
because = is no sequence point, and contains side-effects (i++).
See http://c-faq.com/expr/seqpoints.html




But this is very much allowed:
i = a[i] = i+1;
It uses the normal fact that assignment is right-associative and that no
side-effects exist.


[snip description of side-effects]

In C99, the expression "i = a[i] = i + 1" had undefined behavior because
the value of i was both stored by this expression, and read for purposes
other than determining the value to be stored in 'i', without any
intervening sequence points to determine which of those events occurs
first. The exception for determining the value to be stored allows "i =
i + 1", since the value of 'i' must be read in order to calculate the
value of 'i+1' which is to be stored in i.

However, in the case of 'i = a[i] = i+1', the value of 'i' is also read
to determine which element of 'a' gets updated. It's unclear whether the
value of i should be updated before or after determining which element
to update. The C committee could have chosen to specify the sequence,
but they reached the conclusion that such code is inherently confusing,
and therefore should be discouraged. [snip elaboration]


There is evidence to the contrary, namely that they intended all
along that such uses be well-defined, and it was only due to a
poor choice of wording (in earlier Standards) that it wasn't.
It is unambiguously well-defined under C11.

The relevant clause in C99 was one of the most confusingly
worded parts of that standard, and provoked a lot of argument.

I would say it was poorly worded rather than confusingly
worded; IMO the meaning was clear, it was just a poor
choice of phrasing that caused confusion. In any case,
C11 has fixed that.

In C2011, I gather
that this whole issue has been more precisely defined by rewording
things in terms of the "visible sequence of side effect" and by
specifying that the evaluation of each expression may be "sequenced
before", "sequenced after", or "indeterminately sequenced" relative to
the evaluation of other expressions. The C2011 wording is greatly
complicated by the fact that it must now deal with the possibility of
multi-threaded code. As a result, I've not had time to review exactly
what the new standard says about such things. The new wording might make
the behavior of this code well-defined; but I doubt it - I think the
committee still believes that such code is inherently confusing, and
therefore should still be discouraged.

So although he hasn't read what it says exactly, he feels
free to conjecture what it does say? That's amusing.

Having taken some amount of time to study the documents available
on how to specify order of evaluation, including C11, I would
say this --

First, the behavior in question is well-defined under C11.

Second, several formal models for how "order of evaluation"
should be specified in C were discussed over the years by the
committee, and AFAIK all of them defined the semantics of
multiple assignments like this one; certainly the ones I
remember reading did, and the current one (in C11) does. So all
the evidence I've seen supports the idea that the committee
thought it important to define such cases, and I'm not aware of
any evidence that supports the view that the committee believes
such code is confusing or should be discouraged.

But in any case, under C11 the behavior is well-defined.
 
Tim Rentsch

pete said:
Tim said:
pete said:
John Reye wrote:

1)
a[i] = i++;

http://c-faq.com/expr/evalorder1.html

So this attempt at optimisation is wrong:

int i;
int a[10];
for (i = 0; i < sizeof(a); )
    a[i] = i++; /* set a to {1, 2, 3, ...} */

Correctly optimized, it should be:
for (i = 0; i < sizeof(a); )
    i = a[i] = i+1; /* set a to {1, 2, 3, ...} */

No. That's wrong.
Concerning
i = a[i] = i+1;
Assignment is not a sequence point.
The above code does not say whether the value of (i+1)
is stored first in (a[i]) or in (i). [snip alternative]


The behavior of

i = a[i] = i+1;

is well-defined. I refer you to 6.5.16 p3 in N1570, specifically
the second-to-last sentence. By the time the store into 'i'
commences, the value (lvalue) computation of 'a[i]' has already
completed, because it is sequenced before delivering the value
to be stored into 'i'.

The stores to 'i' and 'a[i]' may happen in any order, but the
lvalue 'a[i]' must be determined before any store into 'i'.


I don't see where it says what you say that it says.

Concerning (a[i] = i+1),
"the value of the left operand after the assignment"
can be determined before the assignment
and without knowing which array element a[i] refers to.

You don't have to know which array element a[i] refers to,
to know
what the value of (i+1) converted to whichever type (a[i]) has,
is.

n1570
6.5.16 Assignment operators
Semantics
3 An assignment operator stores a value
in the object designated by the left operand.
An assignment expression has
the value of the left operand after the assignment,111)
but is not an lvalue.
The type of an assignment expression
is the type the left operand would have after lvalue conversion.
The side effect of updating the stored value of the left operand
is sequenced after the value computations of the left
and right operands.
The evaluations of the operands are unsequenced.


The missing piece (which I should have mentioned before -
sorry about that) is 6.5 p1, last sentence:

The value computations of the operands of an operator are
sequenced before the value computation of the result of
the operator.

So the lvalue 'a[i]' must be computed before there is a value
(produced by the intermediate '=' operator) to store into 'i'.
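
One way to picture the ordering argued for above is to spell out the chained assignment as separate statements. This is only a sketch of that reading, not a claim about what any particular compiler emits; the function names store_and_advance and fill_chained are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Spells out, in separate statements, the ordering argued above for
 *     i = a[i] = i+1;
 * The element is selected using the old value of i, and only then is
 * i updated. Returns the new value of i.
 */
static int store_and_advance(int *a, int i)
{
    assert(a != NULL);

    int *elem = &a[i];   /* lvalue computation of a[i], using the old i */
    int value = i + 1;   /* value computation of i+1 */

    /* The two stores could occur in either order; both come after
       the element has been selected. */
    *elem = value;
    i = value;
    return i;
}

/* Fill a[0..n-1] with {1, 2, 3, ...} using the helper above. */
static void fill_chained(int *a, int n)
{
    for (int i = 0; i < n; ) {
        i = store_and_advance(a, i);
    }
}
```

Written this way the code does not depend on how any edition of the standard sequences a chained assignment, which may be preferable where older compilers or readers unfamiliar with the C11 rules are involved.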
 
Joachim Schmitz

Tim said:
John Reye said:
Thanks for the exact C-standard reference.
Still: this is very unsettling.

Would this solve the issue ?? ->

struct
{
    char a;
    union {
        int b;
        struct {
            char byte0;
            char byte1;
            char byte2;
            char byte3;
        };
    };
} mystruct;
mystruct.byte0 = 0x12;
int tmp = mystruct.b;
f(b);

If it does solve the issue, and the lowest-address-byte of tmp is
0x12, then:
Ahhh it is not portable. Other ints have only 2 bytes.
Is there a portable way of fixing this??

If for some reason you want to set just the first byte of an
int, it can be done thusly:

union {
char c[ sizeof (int) ];
int i;
} stuff;

stuff.i = <some value>;
stuff.c = <first-byte value>;

stuff.c[0] = <first-byte value>;
... use stuff.i ...

Bye, Jojo
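
A compilable sketch of the corrected idiom, under a few assumptions: unsigned char is used instead of plain char to avoid sign issues, int is assumed to have no trap representations, and the names int_bytes and set_first_byte are hypothetical.

```c
#include <assert.h>

/* Overlay an int with an array of its bytes. */
union int_bytes {
    unsigned char c[sizeof (int)];
    int i;
};

/* Replace the lowest-addressed byte of value and return the result. */
static int set_first_byte(int value, unsigned char first)
{
    union int_bytes stuff;

    /* The byte array covers exactly the bytes of the int. */
    assert(sizeof stuff.c == sizeof stuff.i);

    stuff.i = value;      /* every byte now holds part of a valid int */
    stuff.c[0] = first;   /* overwrite only the first byte */
    return stuff.i;       /* numeric result depends on byte order */
}
```

Because the array is sized with sizeof (int), the same source works whether int is 2, 4 or some other number of bytes; which numeric value comes back still depends on the platform's byte order.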
 
Guest

That's wrong. Reading a member other than the last member
set has defined behavior. In cases where the member read
has a size larger than the last member set (as is the case
here), the value read depends on unspecified values in
some bytes. If those unspecified values cause the object
representation for the type in question to be a trap
representation, then that will cause undefined behavior.
But simply reading a member other than the last member
set is not in and of itself undefined behavior. And the
usual case, ie, reading a member whose size is no larger
than that of the member set, commonly has well-defined
and sometimes useful behavior.

but it is implementation defined so you can't, in general, predict what value you'll get
 
Guest

Ben Pfaff said:
I believe that is the intention. I couldn't find relevant text
in the Rationale.

If you read the Defect Report (sorry, I don't know the
number offhand) that prompted the famous "type punning"
footnote, I think you'll find that the type punning use
cases were expected and intended all along, eg, even in
C89/C90. I presume they were important to make defined
because a significant amount of pre-ANSI C code relied
on them working.

In particular: is it a misuse of unions, to use them in order to
enable different memory-accesses (word, byte, etc.) to the same
portion of memory???? I have (mis)used it like this in above posts
quite often.
Is that strictly wrong?

I think there might be an "out" for accessing any type as an
array of unsigned char this way, but I believe that in general
this is strictly wrong. [snip]

How do you reconcile this view with the statement in
clear and plain English about how union member access
works to accomplish type punning? I think anyone who
takes the time to read through the relevant sections
defining the semantics involved will find accessing a
union member is defined irrespective of which member
was last set (although the result may depend on
unspecified values and consequently undefined behavior
because of trap representations, but the access itself
is well-defined).

********************************
Draft ANSI Standard

3.3.2.3 Structure and union members
[...]
Semantics
[... 3rd para]
With one exception, if a member of a union object is accessed after
a value has been stored in a different member of the object, the
behavior is implementation-defined./33/ One special guarantee is made
in order to simplify the use of unions: If a union contains several
structures that share a common initial sequence, and if the union
object currently contains one of these structures, it is permitted to
inspect the common initial part of any of them. Two structures share
a common initial sequence if corresponding members have compatible
types for a sequence of one or more initial members.
********************

I don't think "implementation defined" is the same as well defined. They could return zero every time if they wanted.
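
The "one special guarantee" in the quoted paragraph, about structures sharing a common initial sequence, can be sketched as follows. All type and member names here (int_msg, pair_msg, msg, kind) are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tags identifying which structure the union holds. */
enum kind { KIND_INT = 1, KIND_PAIR = 2 };

/* Both structures begin with a member of the same type: 'kind'. */
struct int_msg  { int kind; int value; };
struct pair_msg { int kind; short x; short y; };

union msg {
    struct int_msg  as_int;
    struct pair_msg as_pair;
};

/* Read the tag without knowing which structure was stored last. */
static int msg_kind(const union msg *m)
{
    assert(m != NULL);
    /* 'kind' lies in the common initial sequence, so it may be
       inspected through either structure. */
    return m->as_int.kind;
}
```

Only the shared leading member is covered by the guarantee; reading value after storing x and y falls under the general rule being debated in this thread.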
 
James Kuyper

If you read the Defect Report (sorry, I don't know the
number offhand) that prompted the famous "type punning"
footnote, I think you'll find that the type punning use
cases were expected and intended all along, eg, even in
C89/C90. I presume they were important to make defined
because a significant amount of pre-ANSI C code relied
on them working. ....
How do you reconcile this view with the statement in
clear and plain English about how union member access
works to accomplish type punning? I think anyone who
takes the time to read through the relevant sections
defining the semantics involved will find accessing a
union member is defined irrespective of which member
was last set (although the result may depend on
unspecified values and consequently undefined behavior
because of trap representations, but the access itself
is well-defined).

********************************
Draft ANSI Standard

3.3.2.3 Structure and union members
[...]
Semantics
[... 3rd para]
With one exception, if a member of a union object is accessed after
a value has been stored in a different member of the object, the
behavior is implementation-defined./33/ One special guarantee is made
in order to simplify the use of unions: If a union contains several
structures that share a common initial sequence, and if the union
object currently contains one of these structures, it is permitted to
inspect the common initial part of any of them. Two structures share
a common initial sequence if corresponding members have compatible
types for a sequence of one or more initial members.
********************

I don't think "implementation defined" is the same as well defined. They could return zero every time if they wanted.

This is the "famous type-punning footnote" he referred to. It's footnote
95 in n1570.pdf:
"If the member used to read the contents of a union object is not the
same as the member last used to store a value in the object, the
appropriate part of the object representation of the value is
reinterpreted as an object representation in the new type as described
in 6.2.6 (a process sometimes called ‘‘type punning’’)."

Footnotes are informative, not normative. Informative text is supposed
to explain implications of what the normative text says, but if the
normative text does not actually imply what the footnote says it does,
the footnote is just plain wrong. The standard does not describe this
reinterpretation or type punning anywhere in the normative text.
Footnote 95 seems to be suggesting that this is a conclusion that can be
reached from the combination of what it says in 6.5.2.3p3 about the
semantics of the member selection operator, and what it says in 6.2.6
about the representations of types. It's not clear to me that this is
actually the case.

However, as a practical matter, footnote 95 documents what essentially
all implementations of C have always done, and will continue to do, when
translating such code. It unambiguously reflects the intent of the
committee, whether or not that intent has actually been clearly
expressed in the normative text of the standard. I think a programmer
who relies upon footnote 95 as if it were, in fact, a correct summary of
the normative text will never run into any trouble by reason of doing so.
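
As a rough illustration of what footnote 95 describes, the sketch below reinterprets the bytes of a float as a uint32_t through a union and, for comparison, through memcpy, which is defined in terms of copying bytes. It assumes float and uint32_t have the same size; the function names are invented for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption of this sketch: float occupies exactly 32 bits. */
static_assert(sizeof (float) == sizeof (uint32_t),
              "sketch assumes a 32-bit float");

/* Type punning through a union, as described in footnote 95. */
static uint32_t float_bits_union(float f)
{
    union { float f; uint32_t u; } pun;

    pun.f = f;
    return pun.u;   /* the bytes of f reinterpreted as a uint32_t */
}

/* The same reinterpretation done by copying the bytes. */
static uint32_t float_bits_memcpy(float f)
{
    uint32_t u;

    memcpy(&u, &f, sizeof u);
    return u;
}
```

If the two functions ever disagreed on an implementation, that would be a case where relying on the footnote causes trouble; no such case is known to the participants in this thread.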
 
Tim Rentsch

Joachim Schmitz said:
Tim Rentsch wrote: [snip]
If for some reason you want to set just the first byte of an
int, it can be done thusly:

union {
char c[ sizeof (int) ];
int i;
} stuff;

stuff.i = <some value>;
stuff.c = <first-byte value>;

stuff.c[0] = <first-byte value>;
... use stuff.i ...

Yes that's right of course. Thank you for the correction.
 
Tim Rentsch

[snip]
Or:
struct
{
    char a;
    union {
        int b;
        char c;
    };
} mystruct;
mystruct.c = 0x12;

Would this union work on every platform??

Yes. But then you won't be able to use the 'b' element of
the union since reading a different member than has been
set the last time round also invokes undefined behavior.

That's wrong. Reading a member other than the last member
set has defined behavior. In cases where the member read
has a size larger than the last member set (as is the case
here), the value read depends on unspecified values in
some bytes. If those unspecified values cause the object
representation for the type in question to be a trap
representation, then that will cause undefined behavior.
But simply reading a member other than the last member
set is not in and of itself undefined behavior. And the
usual case, ie, reading a member whose size is no larger
than that of the member set, commonly has well-defined
and sometimes useful behavior.

but it is implementation defined so you can't, in general,
predict what value you'll get

Neither C11 nor C99 uses 'implementation-defined' in the
description of union member access. The C90/C89 standard does
use this phrase; however, what was meant, and what was generally
understood by implementors, is that the result is defined (ie, as
reinterpreting the underlying bits in the new representation) but
depends on other implementation-defined choices, the most obvious
of which is how different types are represented. I suppose it is
possible, even though very unlikely, that some implementors of
very old compilers took advantage of the 'implementation-defined'
stipulation in C90 to do something unexpected here. Do you have
any experience that suggests such implementations actually exist?
Unless there is one such otherwise-conforming implementation that
someone can point to, I don't think it's wrong to say the behavior
is defined even under C90; certainly it is right under both C99
and the current C11 standard.
 
Tim Rentsch

pete said:
pete said:
Tim Rentsch wrote:
The value computations of the operands of an operator are
sequenced before the value computation of the result of
the operator.

So the lvalue 'a[i]' must be computed before there is a value
(produced by the intermediate '=' operator) to store into 'i'.


I think that you may be right.
I also made an obsolete statement elsewhere in this thread
about the postfix increment operator.


Actually, it was in the "pointer arithmetic question" thread.


Aha, I see that... will respond there.
 
Tim Rentsch

Ben Pfaff said:
So if you use a particular union access member... for a particular
union in memory, then you should always access that particular union-
variable, with the same member?

Is that the only intended use?

I believe that is the intention. I couldn't find relevant text
in the Rationale.

If you read the Defect Report (sorry, I don't know the
number offhand) that prompted the famous "type punning"
footnote, I think you'll find that the type punning use
cases were expected and intended all along, eg, even in
C89/C90. I presume they were important to make defined
because a significant amount of pre-ANSI C code relied
on them working.

In particular: is it a misuse of unions, to use them in order to
enable different memory-accesses (word, byte, etc.) to the same
portion of memory???? I have (mis)used it like this in above posts
quite often.
Is that strictly wrong?

I think there might be an "out" for accessing any type as an
array of unsigned char this way, but I believe that in general
this is strictly wrong. [snip]

How do you reconcile this view with the statement in
clear and plain English about how union member access
works to accomplish type punning? I think anyone who
takes the time to read through the relevant sections
defining the semantics involved will find accessing a
union member is defined irrespective of which member
was last set (although the result may depend on
unspecified values and consequently undefined behavior
because of trap representations, but the access itself
is well-defined).

********************************
Draft ANSI Standard

3.3.2.3 Structure and union members
[...]
Semantics
[... 3rd para]
With one exception, if a member of a union object is accessed after
a value has been stored in a different member of the object, the
behavior is implementation-defined./33/ One special guarantee is made
in order to simplify the use of unions: If a union contains several
structures that share a common initial sequence, and if the union
object currently contains one of these structures, it is permitted to
inspect the common initial part of any of them. Two structures share
a common initial sequence if corresponding members have compatible
types for a sequence of one or more initial members.
********************

This text corresponds to 6.3.2.3, fifth paragraph, in C90. The
footnote /33/ referenced above is footnote 41 in the C90 document.
That footnote says:

The ``byte orders'' for scalar types are invisible to isolated
programs that do not indulge in type punning (for example, by
assigning to one member of a union and inspecting the storage
by accessing another member that is an appropriately sized
array of character type), but must be accounted for when
conforming to externally imposed storage layouts.

I don't think "implementation defined" is the same as well
defined. They could return zero every time if they wanted.

I think the footnote makes it clear that the intention of the
'implementation-defined' wording is to reinterpret the underlying
bytes in the new type. (Note that the quoted paragraph does
not exclude character types from the 'implementation-defined'
category.) This view is also supported by DR 283 (I was able
to find it, fortunately easily, by "defect report" "type punning"
as a google query)

http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_283.htm

which says in part (and written by someone on the committee)

It is not perfectly clear that the C99 words have the same
implications as the C89 words.

So even though C89/C90 says 'implementation-defined' and C99
doesn't, the indications are that the meaning intended is the
same in both cases, and also C11, which uses the same wording
as C99. (There are minor changes in the C11 footnote relative
to N1256, but these are incidental.)

Summing up - I think you're right in some technical sense that
the C90 allowed returning zero every time. But that isn't what
was intended, nor AFAIAA what any implementor took it to mean.
Unless there is a particular existing implementation that can be
pointed to that exhibits the problem, the point seems moot;
surely no current implementation would do anything other than
follow the revised wording in C99 or C11.
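
The C90 footnote quoted above mentions inspecting storage through "an appropriately sized array of character type". A small sketch of that use, detecting the byte order of unsigned int, might look like this; the enum and function names are hypothetical.

```c
#include <assert.h>

/* Hypothetical result type for this sketch. */
enum byte_order { ORDER_LITTLE, ORDER_BIG, ORDER_OTHER };

/* Store 1 in an unsigned int and see which byte of it is non-zero. */
static enum byte_order detect_byte_order(void)
{
    union {
        unsigned int u;
        unsigned char c[sizeof (unsigned int)];
    } probe;

    /* With a one-byte unsigned int there is no order to detect. */
    assert(sizeof (unsigned int) > 1);

    probe.u = 1u;
    if (probe.c[0] == 1) {
        return ORDER_LITTLE;   /* low-order byte stored first */
    }
    if (probe.c[sizeof probe.c - 1] == 1) {
        return ORDER_BIG;      /* low-order byte stored last */
    }
    return ORDER_OTHER;        /* some mixed layout */
}
```

The answer is a property of the implementation's representation of unsigned int, which is the sense in which the result "depends on other implementation-defined choices".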
 
Tim Rentsch

[snip]
Footnote 95 seems to be suggesting that this is a conclusion that
can be reached from the combination of what it says in 6.5.2.3p3
about the semantics of the member selection operator, and what it
says in 6.2.6 about the representations of types. It's not clear to
me that this is actually the case.

For those who are interested, it isn't hard to verify that the
normative text in C99/C11 supports what this footnote says in
informative text. A previous posting of mine (posted, IIANM, in
January of this year) listed the relevant sections or paragraphs
that contribute to that. I guess JK didn't see that since
apparently he routinely does not read my postings.

However, as a practical matter, footnote 95 documents what
essentially all implementations of C have always done, and will
continue to do, when translating such code. It unambiguously
reflects the intent of the committee, whether or not that intent
has actually been clearly expressed in the normative text of the
standard. I think a programmer who relies upon footnote 95 as if
it were, in fact, a correct summary of the normative text will
never run into any trouble by reason of doing so.

Ditto.
 
Ben Pfaff

Tim Rentsch said:
If you read the Defect Report (sorry, I don't know the
number offhand) that prompted the famous "type punning"
footnote, I think you'll find that the type punning use
cases were expected and intended all along, eg, even in
C89/C90. I presume they were important to make defined
because a significant amount of pre-ANSI C code relied
on them working.

Either I didn't know about that defect report or I'd forgotten
it. Thanks for the correction.
 
