i = v[i++] results in undefined behavior. Can't understand why.


James Kanze

* James Kanze, on 29.09.2010 18:21:

[...]
Uhm, someone asked me about this issue privately, and
I had to say that I knew next to nothing about C++0x rules
and couldn't provide any insight.
What I should have said was what's occurring to me now:
It does not matter what corner cases C++0x makes well-defined.
One has to code as if by (well known) C++98 rules anyway.
Don't you agree?

Definitely. For a long time yet, anyway.
 

James Kanze

[...]
Agreed. Whatever is meant by "the assignment", it is clear that there are
four things being sequenced, "the assignment" being wedged after two and
before the last. The other three are value computations. If "the assignment"
is not the side effect, I would definitely like to know what it is.

That's what I couldn't figure out. Taken alone, "the
assignment" really isn't very precise. In context, just
about everything but the side effect has been explicitly
mentioned, so there's nothing else left for it to be. (But
as I said, I've been caught out by standardese before.)
I am not so sure that the side-effect could not be meant.

Sloppy wording on my part. I didn't mean to use "meant" to
refer to the semantics of the phrase, but to the intention
behind it. I doubt that this is what was intended. (I.e.
we didn't mean to say what we said.) As Pete said, the
intention was that the actual effect of the rules didn't
change for single threaded programs: what was well defined
before remains well defined, and what was undefined before
remains undefined.
I have no experience in multi-threading, but maybe there
are code snippets demonstrating the need of such
sequencing. Are discussions of the committee available
online where one could find examples and rationales?

The discussions in the meetings, no, since they're purely
oral. The discussions on the reflectors are, but only to
"members". And I don't think that they're effectively
searchable, so it could take a lot of time to find what
you're looking for.

We probably should ask on comp.std.c++. We're more likely
to encounter someone who participated in the discussions
about this rewording there.
 

Johannes Schaub (litb)

Armen said:
Please help me, I just can't understand this.
Clause 1.9 Paragraph 15 (n3092) says:
Except where noted, evaluations of operands of individual operators and
of subexpressions of individual expressions are unsequenced. [ Note: In
an expression that is evaluated more than once during the execution of a
program, unsequenced and indeterminately sequenced evaluations of its
subexpressions need not be performed consistently in different
evaluations. —end note ] The value computations of the operands of an
operator are sequenced before the value computation of the result of the
operator. If a side effect on a scalar object is unsequenced relative to
either another side effect on the same scalar object or a value
computation using the value of the same scalar object, the behavior is
undefined.
[ Example:
void f(int, int);
void g(int i, int* v) {
i = v[i++]; // the behavior is undefined
i = 7, i++, i++; // i becomes 9
i = i++ + 1; // the behavior is undefined
i = i + 1; // the value of i is incremented
f(i = -1, i = -1); // the behavior is undefined
}
—end example ]
let's consider i = v[i++]. the side effect of i being incremented by 1
is SEQUENCED before the side effect of i being assigned v[i++],
because "The value computations of the operands of an operator are
sequenced before the value computation of the result of the operator".
So how come is this undefined behavior?

Because value computations do not include side effects. So you have two
unsequenced side effects in your snippet (the increment and assignment).
Moreover you have a value computation on i (left i) that is unsequenced
relative to a side effect on i (the right "i++").

If you write this as "i = v[++i]", which is equivalent to "i = *(v + (i =
i + 1))", you will not have two unsequenced side effects anymore, because
the assignment in "i = i + 1" is sequenced before the assignment in "i =
*(...". BUT you still have the same value computation unsequenced relative
to the same side effect as in your snippet. So for both the prefix and
postfix versions you have undefined behavior.
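Here is a minimal sketch of expressing each intent without undefined behavior. It assumes the postfix form is meant to index with the old value of i and the prefix form with the new one. It also assumes v is long enough for the index used. The approach puts each modification of i in its own full expression. The function names are hypothetical.

```cpp
// Hypothetical helpers: well-defined rewrites of "i = v[i++]" and
// "i = v[++i]". Each full expression (ending in ';') modifies i at most once,
// so no write to i is unsequenced relative to another access to i.

// Postfix intent: look up with the old index. Because i is overwritten
// afterwards, the increment has no lasting effect.
int postfix_rewrite(int i, const int* v) {
    int old = i++;   // first full expression: save old index, then increment
    i = v[old];      // second full expression: overwrite i
    return i;
}

// Prefix intent: increment first, then look up with the new index.
int prefix_rewrite(int i, const int* v) {
    ++i;             // first full expression: increment
    i = v[i];        // second full expression: use the new index
    return i;
}
```

With v = {10, 20, 30} and i = 0, the first returns 10 and the second returns 20.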

If I may quote you from another thread :)
Please disregard the last one. That's still undefined in C++0x it seems.
Value computation of the left i is not sequenced relative to the side
effect of "++i".

i = ++i; whether or not this is defined depends pretty much on what a
value computation means.

In fact, we had this case of "i = ++i" on stackoverflow now. The Standard
says "If a side effect on a scalar object is unsequenced relative to either
another side effect on the same scalar object or a value computation using
the value of the same scalar object, the behavior is undefined.". The key
point is "using the value of the same scalar object". It did not occur to
me until someone actually pointed it out to me.

We found that, since an lvalue that merely refers to an object does not use
the value of that object, "i = ++i" is not UB. While "i" on the left side is
a value computation of the lvalue expression "i", it does not use the value
of the scalar object i. So "value computation" acts on expressions; in the
case of glvalue evaluation it computes an lvalue, but does not use the value
of the object that lvalue refers to. I suspect that this is a reasonable
reading of the Standard.
 

Bo Persson

Johannes said:
In fact, we had this case of "i = ++i" on stackoverflow now. The
Standard says "If a side effect on a scalar object is unsequenced
relative to either another side effect on the same scalar object or
a value computation using the value of the same scalar object, the
behavior is undefined.". The key point is "using the value of the
same scalar object". It did not occur to me until someone actually
pointed it out to me.

No, the key point here is "another side effect on the same scalar
object". The increment and the assignment operator both have side
effects on i (storing new values).


Bo Persson
 

Johannes Schaub (litb)

Bo said:
No, the key point here is "another side effect on the same scalar
object". The increment and the assignment operator both have side
effects on i (storing new values).

No, this is from a future Standard you have not yet read. It's not C++03.

Have a good day.
 

Johannes Schaub (litb)

Pete said:
That's good, because that's what was intended.

I have noticed that C++0x has changed one example for C++03's "undefined
behavior rule" from

i = ++i + 1; // C++03

to

i = i++ + 1; // C++0x

Using the analysis applied in this thread, we found that the wording
(inadvertently?) makes the first well-defined. The second is still undefined.

What was the rationale to change that example from C++03?
 

Bo Persson

Johannes said:
No, this is from a future Standard you have not yet read. It's not
C++03.

I have read it :), but anyway the original text is

"Between the previous and next sequence point a scalar object shall
have its stored value modified at most once by the evaluation
of an expression."

which makes no difference for single threaded execution.


Bo Persson
 

Johannes Schaub (litb)

Johannes said:
I have noticed that C++0x has changed one example for C++03's "undefined
behavior rule" from

i = ++i + 1; // C++03

to

i = i++ + 1; // C++0x

Using the analysis applied in this thread, we found that the wording
(inadvertently?) makes the first well-defined. The second is still
undefined.

What was the rationale to change that example from C++03?

Subsequently, I have looked up the reason myself. DR #637 and DR #222 show
that these examples are intended to be well defined in C++0x and
undefined in C++03.
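Code that must also compile as C++03 needs a different spelling. The following is a minimal sketch of a rewrite that is well-defined under both sets of rules. It assumes the intended result of "i = ++i + 1" is the one the C++0x wording gives: i ends up two greater than before. "plus_two" is a hypothetical name.

```cpp
// Hypothetical helper: a C++03-safe spelling of "i = ++i + 1".
// The two modifications of i are split into separate full expressions,
// so there is a sequence point between them.
int plus_two(int i) {
    ++i;         // first modification, complete at the ';'
    i = i + 1;   // second modification, in its own full expression
    return i;
}
```

Starting from i = 1, this yields 3, matching what the C++0x reading gives for the one-liner.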

I'm glad that I sorted this out before filing an issue report with WMM.

Now, let's wait for "James Kanze" to tell me how deeply wrong I am about any
and everything :)
 

jl_post

let's consider i = v[i++]. the side effect of i being incremented by 1
is SEQUENCED before the side effect of i being assigned v[i++],
because "The value computations of the operands of an operator are
sequenced before the value computation of the result of the operator".
So how come is this undefined behavior?


I know I'm a latecomer to this thread, but maybe these explanations
will help.

========================

A C++ compiler is perfectly free to re-interpret a pre-increment
operation like this:

int a = ++i;

as:

i += 1;
int a = i;
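The two spellings above can be checked side by side in a minimal sketch. The function names and the small "IncResult" struct are hypothetical; the struct exists only to return both a and i. Both forms are well-defined and must agree.

```cpp
// Hypothetical struct used only to return both values for comparison.
struct IncResult {
    int a;  // the value that was assigned
    int i;  // the value of i afterwards
};

// The expression as written.
IncResult preincrement_expr(int i) {
    int a = ++i;   // increment, then use the new value
    return {a, i};
}

// The expansion described above.
IncResult preincrement_expanded(int i) {
    i += 1;        // the increment happens first...
    int a = i;     // ...then the new value is copied
    return {a, i};
}
```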

========================

Likewise, a C++ compiler is perfectly free to re-interpret a post-
increment operation like this:

int a = i++;

as:

int temp = i; // a copy is made
i += 1; // the original is incremented
int a = temp; // the copy is assigned

or as:

int a = i;
i += 1; // no copy is made

Both are valid, and the second has the advantage that it is not any
less efficient than the pre-increment (which goes against the popular
belief that post-increment is necessarily less efficient than pre-
increment).
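The same kind of check works for the two post-increment expansions. In this minimal sketch the function names and the "PostResult" struct are hypothetical. Each version leaves a holding the old value and i one greater.

```cpp
// Hypothetical struct used only to return both values for comparison.
struct PostResult {
    int a;  // the value that was assigned
    int i;  // the value of i afterwards
};

// The expression as written.
PostResult postincrement_expr(int i) {
    int a = i++;
    return {a, i};
}

// First expansion: a copy of the old value is made.
PostResult postincrement_with_copy(int i) {
    int temp = i;  // a copy is made
    i += 1;        // the original is incremented
    int a = temp;  // the copy is assigned
    return {a, i};
}

// Second expansion: assign first, then increment; no copy needed.
PostResult postincrement_no_copy(int i) {
    int a = i;
    i += 1;
    return {a, i};
}
```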

========================

In the same manner,

i = v[i++]; // undefined!

can be re-interpreted as:

int temp = i;
i += 1;
i = v[temp];

(which results in what you'd expect)

or as:

i = v[i];
i += 1;

(which is certainly not what you'd expect)
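Here is a well-defined simulation of the two re-interpretations of "i = v[i++]" above. It assumes v has at least i + 1 elements, and the function names are hypothetical. Because the original expression is undefined, the Standard does not actually restrict the outcome to these two. They are just two plausible orderings that make the difference visible.

```cpp
// Ordering 1: save the old index, increment, then assign v[old].
int ordering_copy_first(int i, const int* v) {
    int temp = i;
    i += 1;
    i = v[temp];   // the increment is overwritten
    return i;
}

// Ordering 2: assign v[i] first, then apply the pending increment.
int ordering_increment_last(int i, const int* v) {
    i = v[i];
    i += 1;        // the increment lands on the looked-up value
    return i;
}
```

With v = {10, 20, 30} and i = 1, the first gives 20 and the second gives 21.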

========================

Also in the same manner:

// (Let's assume i equals 1 here.)

int a = i++ + i++; // undefined!

can be re-interpreted as:

int a = i + i; // a would now be 2
i += 1;
i += 1;

or as:

int temp1 = i;
i += 1;
int temp2 = i;
i += 1;
int a = temp1 + temp2; // a would now be 3
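The two orderings above can be replayed as well-defined code, starting from i = 1. In this minimal sketch the function names and the "Outcome" struct are hypothetical. The final value of i is 3 either way, but a differs.

```cpp
// Hypothetical struct used only to return both values for comparison.
struct Outcome {
    int a;  // the computed sum
    int i;  // the value of i afterwards
};

// Both reads happen before either increment.
Outcome reads_first(int i) {
    int a = i + i;
    i += 1;
    i += 1;
    return {a, i};
}

// Each read is followed immediately by its own increment.
Outcome interleaved(int i) {
    int temp1 = i;
    i += 1;
    int temp2 = i;
    i += 1;
    int a = temp1 + temp2;
    return {a, i};
}
```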

========================

All of these re-interpretations are open to C++ compilers. But
as you can see, they don't always yield the same results in cases
where the variable is read twice in an expression where it's also
written to. Therefore, in those cases you get undefined behavior.

I hope this clears things up, Armen.

-- Jean-Luc


========================


P.S. Here I'll put a disclaimer:

According to Bjarne Stroustrup's FAQ at
http://www2.research.att.com/~bs/bs_faq2.html#evaluation-order
the following lines:

v[i] = i++;

and:

f(v[i], i++);

are undefined. However, it says nothing of the following line:

i = v[i++];

so I'm not positive that's undefined. It says that "if you read a
variable twice in an expression where you also write it, the result is
undefined."

But in "i = v[i++];", is the "i" variable read twice? I'm not sure.
If it is, then the result is undefined, but if it isn't, then the
result should be defined. I have to admit that I really don't know
about this one.
 

Bo Persson


Both are valid, and the second has the advantage that it is not any
less efficient than the pre-increment (which goes against the popular
belief that post-increment is necessarily less efficient than
pre-increment).

The belief is that pre-increment is at least as good as
post-increment. Therefore it should be the default, if there is no
other reason to select either.
P.S. Here I'll put a disclaimer:

According to Bjarne Stroustrup's FAQ at
http://www2.research.att.com/~bs/bs_faq2.html#evaluation-order
the following lines:

v[i] = i++;

and:

f(v[i], i++);

are undefined. However, it says nothing of the following line:

i = v[i++];

so I'm not positive that's undefined. It says that "if you read a
variable twice in an expression where you also write it, the result
is undefined."


It also could say that you cannot write to a variable twice without an
intervening sequence point.


Bo Persson
 

jl_post

The belief is that pre-increment is at least as good as
post-increment. Therefore it should be the default, if
there is no other reason to select either.


While I definitely believe that it is a common belief, I have to
disagree with it. According to some, postincrement is faster than
preincrement on certain platforms. (Someone on
http://discuss.fogcreek.com/joelonsoftware/default.asp?cmd=show&ixPost=171881
makes this claim for the Motorola 68000.)

I think the reason this belief is so widespread is because most
programmers, if writing their own implementations, would create a
temporary copy for postincrement, while they would not bother with a
copy for preincrement. But in fact, there is nothing to say that
postincrementing a C integer (such as int, char, short, etc.) requires
a copy to be made, and conversely, there is no guarantee that
preincrementing can't make use of a temporary copy, or even several.
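For a class type, the hand-written overloads this paragraph describes usually look like the following minimal sketch for a hypothetical "Counter" class. Prefix increments in place and returns a reference. Postfix, by convention, copies the old state so it can return it. Built-in integers are not bound to this pattern, which is the point made above. For a simple class like this, an optimizer can often remove the copy too, but that is not guaranteed.

```cpp
// Hypothetical counter type showing the conventional increment overloads.
class Counter {
public:
    explicit Counter(int start = 0) : value_(start) {}

    // Prefix: increment in place and return a reference; no copy needed.
    Counter& operator++() {
        ++value_;
        return *this;
    }

    // Postfix: keep a copy of the old state so it can be returned
    // after the increment (the temporary the paragraph mentions).
    Counter operator++(int) {
        Counter old = *this;
        ++value_;
        return old;
    }

    int value() const { return value_; }

private:
    int value_;
};
```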

So while in theory pre-increment is at least as good as post-
increment, I have to disagree with it in practice. In practice, post-
increment can be as efficient, more efficient, or less efficient than
pre-increment. And the argument that one is at least as good as the
other because "that's the way I would do it" (or the mistaken claim
that "there is no way to do it without a temporary copy") isn't a good
one considering that there are platforms already in existence that go
contrary to this common belief. (I'm not saying you're using that
argument; but I am saying that I've heard it more than once from other
programmers as a way to avoid using post-increments altogether.)

In my opinion, i++ and ++i both have the same run-time behavior (in
big-O notation, they would both be O(1)), so it really doesn't matter
which one you use when measuring efficiency. A run-time bottleneck
won't ever disappear by replacing i++ with ++i (and vice-versa).
(Although I suppose it would be possible to see a difference if you
had a program that did nothing but pre- or post-increment billions of
times, which would hardly be a useful program. And even if you did,
it's possible that the post-increment version would be slightly more
efficient than the pre-increment version!)

You can probably tell that I've given this a lot of thought, and that
I've had to defend myself against those who believe that "good
programmers never use post-increment." In practice, I think one
should use whichever form the situation is best suited for (a position
I think you share), and ignore the negligible efficiency difference.
And when an integer is incremented
on a line all by itself (without it being assigned to anything), I
believe it doesn't matter at all whether you use i++ or ++i (in fact,
all the compilers I've tested that on compile both to the same
executable code, and even if they didn't, the run-time difference is
virtually nonexistent).

It also could say that you cannot write to a variable
twice without an intervening sequence point.

Yeah. Or even: In any line of code (defined as code delimited by
';', '&&', '||', or the comma operator), any variable that is pre- or
post- incremented/decremented should never be used more than once.
Otherwise, the result is undefined, and your refrigerator might
defrost. ;)
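The "&&", "||" and comma delimiters in that rule of thumb are built-in operators whose left operand is fully evaluated, side effects included, before the right operand. The following minimal sketch uses hypothetical function names. In each function, i is incremented on one side of such an operator and read again on the other, so every function is well-defined. This applies to the built-in operators only, not to user-defined overloads of them.

```cpp
// The increment is complete before i is read again on the right.
int comma_then_read(int i) {
    return (i++, i);
}

// The left side of && is fully evaluated (including i++) before the right.
bool and_then_read(int i) {
    return i++ == 1 && i == 2;
}

// The right side of || runs only if the left is false, and after it.
bool or_then_read(int i) {
    return i++ == 0 || i == 2;
}
```

Called with 1, the first returns 2, and the other two both return true.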

Yours,

-- Jean-Luc