What is the right value of a = a++; ?


Stephen Sprunk

isn't that the other way round

the statement is undefined
=>
compilers are permitted to produce different answers
(or trap or crash or anything else weird the compiler writer feels
like)

Compilers existed prior to the standard, so their divergent behavior
could not possibly be blamed on a non-existent standard allowing it.

AIUI, the job of the ANSI committee was to document the language as it
existed, warts and all, rather than to create a new and perfect language
from scratch. They did overstep that boundary on occasion, e.g.
prototypes and (void *), but did so in ways that didn't break any
existing code.

As it turned out, different compilers treated "a=a++;" differently, and
the committee (rightly, IMHO) decided to allow them to continue with
that rather than break existing code that depended on a particular
compiler's ex post facto outlawed behavior.

While I've never taken an exhaustive inventory, I suspect most undefined
behavior in the standard could be traced to such a scenario.
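
For anyone skimming, here is a minimal sketch of the statement in question
next to a well-defined rewrite (the value 5 is just for illustration):

#include <stdio.h>

int main(void)
{
    int a = 5;

#if 0
    a = a++;   /* undefined: a is modified twice with no intervening
                  sequence point, so the standard places no requirement
                  on the result */
#endif

    a++;       /* well-defined, if an increment was what you meant */

    printf("a = %d\n", a);   /* prints 6 */
    return 0;
}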

S
 

Stephen Sprunk

One particular choice of behavior stands out: the simple, recursive rule:

- evaluate each subexpression completely, including all of its side effects,
before evaluating the parent expression.

- if an expression has several subexpressions, evaluate them left to right.

Nice of you to dictate that people using systems that more efficiently
evaluate sub-expressions from right to left should suffer performance
degradation of all code, even in the majority of cases where the code is
well-written and therefore the order doesn't actually matter, simply
because it doesn't suit your sensibilities.

If you /need/ to know that "foo(bar(), baz());" calls bar() before
baz(), then just write "tmp=bar(); foo(tmp, baz());". There is no need
to suffer that performance penalty, though, in the vast majority of
cases where it _doesn't_ matter.
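
Spelled out as a sketch (foo, bar and baz are just placeholders here):

/* Hypothetical functions, purely for illustration. */
int bar(void);
int baz(void);
void foo(int x, int y);

void call_in_known_order(void)
{
    /* foo(bar(), baz()) leaves the relative order of the bar() and
       baz() calls unspecified.  When the order matters, sequence it
       yourself: */
    int tmp = bar();     /* bar() is now guaranteed to run first */
    foo(tmp, baz());     /* baz() runs as part of evaluating this call */
}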

S
 

James Kuyper

Compilers existed prior to the standard, so their divergent behavior
could not possibly be blamed on a non-existent standard allowing it.

True; but the fact that different current compilers give different
results cannot be used to validly infer that the behavior is undefined.
The behavior could be implementation-defined, or simply unspecified, or
one of the compilers you're comparing could be non-conforming.

AIUI, the job of the ANSI committee was to document the language as it
existed, warts and all, rather than to create a new and perfect language
from scratch.

It was neither of those things, and both. The committee was tasked with
creating a language specification which was supposed to honor many
different competing principles; some of the hardest work the committee
had to do was balancing those different principles. Some of those
principles, if taken in isolation, could have been interpreted as
calling for a "perfect" language. Other principles called for retaining
compatibility with existing practice.

Incidentally, it was made explicitly clear that "existing code is
important; existing implementations are not". I would presume that was
based upon the fact that, for any reasonably successful implementation
of C, the amount of code written for use on that implementation greatly
outweighed the amount of code required to implement it.

That principle became less important when existing practice was in
conflict with itself. The committee could have specified one particular
choice of behavior, rendering all implementations that didn't make that
choice non-conforming. They could have left the behavior unspecified,
with the range of permitted options wide enough to cover the variety of
existing practice. They could have declared that the behavior was
implementation-defined, requiring a conforming implementation to
document what behavior it chose. Or they could have declared the
behavior undefined, which by definition is compatible with anything that
any compiler might choose to do with the code. The committee did all
four of those things, in different contexts.
 

Stephen Sprunk

Well, no, he isn't saying any such thing. You just made it up. The
fact that evaluation order in C is left to the compilers has very
little to do with optimization; it is a historical artifact that came
from endorsing existing practice when the ANSI standard was put
together. Existing practice then was that compiler writers could do
evaluations in whatever order they pleased and they did.

No, they did evaluations in the order that made the most sense (i.e. had
the best performance) for their particular implementation--and
implementations vary.

For instance, varargs is more straightforward to implement with
right-to-left evaluation of arguments, at least with a stack-based
calling convention. Order of evaluation may not matter for
register-based calling conventions, but if an implementation uses both,
it makes sense to use the same order.
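
For illustration only, a plain stdarg sketch; the comment about layout
assumes the classic stack-based convention described above:

#include <stdarg.h>

/* Sums 'count' additional int arguments.  With a stack-based convention
   that pushes arguments right to left, the named parameter 'count' sits
   at a fixed offset, and va_arg simply walks forward through whatever
   follows it, which is why that push order falls out naturally. */
static int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;

    va_start(ap, count);
    for (int i = 0; i < count; i++)
        total += va_arg(ap, int);
    va_end(ap);

    return total;
}

/* usage: sum_ints(3, 10, 20, 30) yields 60 */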

S
 

Stephen Sprunk

Depends on which way your stack grows - with downward growth it's
usually easiest to evaluate varargs right-to-left, but for an upwards
growing stack left-to-right is usually easier.

How would stack growth direction matter? What matters for varargs is
that the first argument is on the top of the stack, regardless of which
way it grows. If you push the arguments from right to left, the first
argument ends up on top. You _could_ evaluate the arguments from left
to right, store them in registers or elsewhere in memory, and then push
from right to left, but why go through that extra effort?

(This, of course, assumes a stack-based calling convention for varargs
calls, as all of the ABIs I'm familiar with dictate, and possibly for
other calls.)

S
 

Stephen Sprunk

You are not disagreeing with me.

I disagree that they did "whatever they pleased"; they did what was best
for their particular implementation, for various definitions of "best".

Well, yes, I know that. However what you are talking about is a minor
micro-optimization at most.

It doesn't seem so minor, particularly for platforms where _every_ call
uses a stack-based calling convention. Remember, we're talking about
long ago when compilers were rather dumb and most popular architectures
were register-starved. That is the environment that ANSI was faced
with, and it would have been unreasonable to force the majority of
implementations to suffer a performance penalty for the questionable
benefit of left-to-right evaluation.

Historically the variations weren't a matter of performance - for the
most part the performance of the early compilers was less than
stellar.

Of course; that was the state of the industry at the time. All the more
reason not to make things worse.

The various choices in calling sequence conventions and in
parsing order were a mixture of convenience and ad hoc decisions. The
order variations were an unalterable part of the language by the time
it came to standardization.

I think we can agree on this much: by the time ANSI looked at the
matter, it was too late to "fix" many of the behaviors that ended up
being called undefined, unspecified or implementation-defined because
doing so would have broken a lot of code that implicitly depended on a
particular implementation's pre-standard behavior.

In any event the idea that "most of C's optimization abilities" depend
upon not restricting evaluation order and side effect order is a wild
exaggeration.

Optimization doesn't depend entirely on that particular gray area, but
gray areas in general provide implementations leeway to make more
aggressive optimizations because they lessen the impact of the "as if"
rule. Allowing the meaning of "a=a++;" to change isn't helpful in
itself, but it may help the compiler optimize _something else_.

Also, most of those gray areas aren't terribly useful in the first place
and avoiding them generally results in clearer code anyway. If _I_
can't figure out what "a=a++;" is supposed to mean, without possibly
resorting to the Standard, it's not important to me that the compiler
knows either because I won't write it in the first place! My primary
audience is human, not machine.

S
 

Edward A. Falk

Why does the C Standards Committee not constrain this case?

Because it's a dumb case. You're not supposed to write that idiom,
and pointless constraints are a bad thing in a system where performance
is king.
 

Edward A. Falk

Well, no, he isn't saying any such thing. You just made it up. The
fact that evaluation order in C is left to the compilers has very
little to do with optimization; it is a historical artifact that came
from endorsing existing practice when the ANSI standard was put
together.

Perhaps, but it can still have a lot to do with optimization.

Consider:

b = (sqrt(sin(x)*cos(x))) * a;

If the compiler can guess that a is very likely to be zero, and that the
left side is very likely to be expensive, and has no side effects, then
it can generate code that evaluates a first, and then decides whether
or not to evaluate the left side.

I realize that this is a contrived example, but the bottom line is
that C was intended to be a *fast* language, and not a *safe* language.
Why nail the compiler's foot to the floor if you don't have to.

(When I first heard about the C language, it was described to me
as a substitute for assembly language. I think that's the proper
attitude to take when deciding between fast and pedantic.)
 

Keith Thompson

Perhaps, but it can still have a lot to do with optimization.

Consider:

b = (sqrt(sin(x)*cos(x))) * a;

If the compiler can guess that a is very likely to be zero, and that the
left side is very likely to be expensive, and has no side effects, then
it can generate code that evaluates a first, and then decides whether
or not to evaluate the left side.

I realize that this is a contrived example, but the bottom line is
that C was intended to be a *fast* language, and not a *safe* language.
Why nail the compiler's foot to the floor if you don't have to.

One problem with that contrived example is that the sqrt(), sin(), and
cos() functions can have the side effect of updating errno (if
math_errhandling & MATH_ERRNO is nonzero).
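
A small sketch of that side effect, assuming an implementation where
math_errhandling & MATH_ERRNO is nonzero:

#include <errno.h>
#include <math.h>
#include <stdio.h>

int main(void)
{
    errno = 0;
    double r = sqrt(-1.0);              /* domain error */

    if ((math_errhandling & MATH_ERRNO) && errno == EDOM)
        printf("sqrt set errno (r = %f)\n", r);   /* a visible side effect */

    return 0;
}
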
(When I first heard about the C language, it was described to me
as a substitute for assembly language. I think that's the proper
attitude to take when deciding between fast and pedantic.)

I don't disagree. But there's a common misconception that C *is* an
assembly language (it's been called a "high-level assembler"). The key
difference is that an assembly language program specifies CPU
instructions, while a C program specifies behavior.
 

Kaz Kylheku

I don't want diagnostics; I want strict left to right evaluation order
of operands and subexpressions that complete their side effects before
yielding a value.

So, what you are really saying, is you want to eliminate most of C's
optimization abilities, by restricting the order in which the expression
*must* be evaluated, and side effects *must* be completed?

I do not believe that this is where "C's optimization abilities" come from
at all.

(Many of the optimization abilities of some early compilers from circa 1980
may have come from that, however.)

Do not forget the "as if" principle. I would want the above in the abstract
semantics, not in the actual semantics, of course.

Remember, C already has some constraints on evaluation order, like sequence
points between statements. Do you believe that sequential statements defeat
optimization? Do you think that i++; i++ is slower than i += 2 because
it calls for i to be updated twice? (Assume non-volatile-qualified i.)
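
A trivial sketch of what I mean (p points to a non-volatile int); under
the as-if rule nothing stops a compiler from emitting the same single
addition for both:

/* Two updates in the abstract semantics vs. one; since the intermediate
   value is never observed, a compiler may collapse bump_twice() into the
   same code as bump_once(). */
void bump_twice(int *p) { (*p)++; (*p)++; }
void bump_once(int *p)  { *p += 2; }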
 

Kaz Kylheku

I don't want diagnostics; I want strict left to right evaluation order
of operands and subexpressions that complete their side effects before
yielding a value.

So, what you are really saying, is you want to eliminate most of C's
optimization abilities, by restricting the order in which the expression
*must* be evaluated, and side effects *must* be completed?

Well, no, he isn't saying any such thing. You just made it up. The
fact that evaluation order in C is left to the compilers has very
little to do with optimization; it is a historical artifact that came
from endorsing existing practice when the ANSI standard was put
together. Existing practice then was that compiler writers could do
evaluations in whatever order they pleased and they did.

Exactly! Evaluation order could not have been standardized at the time
C was standardized for the first time, simply because it would have required
most compilers to change something. This is not the proper business of
a standard (to invent new stuff, that is). The proper business is to
identify the commonality in what is out there and codify it.

But, evidently, it IS now the business of C standardization to invent new crud.
So why don't they put a priority on tying up loose ends left over from
the 1980s?

Before they started introducing crud like variable length arrays, they should
have revisited the behaviors that could not have been pinned down in 1989, and
tightened them up.

The problem is, I suspect, that some people have their egos invested in this
idiotic sequence point stuff. To throw it out the window would be a huge
backpedaling. In C99 they very nearly put out a new informative annex
which introduced a formal model of sequence points, which polished this turd
into a foot-deep shine.
 

Kaz Kylheku

No, they did evaluations in the order that made the most sense (i.e. had
the best performance) for their particular implementation--and
implementations vary.

Old compilers had hard-coded syntax-directed translation behaviors that were
convenient for the implementor. For instance a function call could be
handled as a single unit emitting some boiler-plate expansion.

For instance, varargs is more straightforward to implement with
right-to-left evaluation of arguments, at least with a stack-based
calling convention.

You don't see that you're switching topics from "best performance" to
"straightforward to implement"?

The order matters only for arguments that have side effects. Something like
f(a,b,"foo",x+y) can be evaluated in any order, even if the abstract semantics
calls for strict left to right.

If there are side effects, the order is required so that the same result
is reproduced no matter what. The code generated by the compiler from
a straightforward expression like f(i++, i++) should be no slower
than from an explicit refactoring like
{ int temp0 = i++, temp1 = i++; f(temp0, temp1); }.
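
As a fuller sketch (f is a placeholder, and note that the first form is
undefined in today's C; the point is what it would mean under the
proposed left-to-right rule):

extern void f(int, int);

void implicit_order(int i)
{
    f(i++, i++);   /* undefined today; under strict left-to-right
                      abstract semantics it would pass the original i
                      and i + 1 */
}

void explicit_order(int i)
{
    int temp0 = i++;
    int temp1 = i++;
    f(temp0, temp1);   /* the same values, spelled out; the as-if rule
                          would let both compile to identical code */
}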
Order of evaluation may not matter for
register-based calling conventions, but if an implementation uses both,
it makes sense to use the same order.

Speaking of stacks and registers, machine languages by and large have well-defined
evaluation order, so it's laughable for a "higher level assembly" to screw this
up.

There have been some historic oddballs like early RISC processors that blindly
fetched instructions in sequence even though a branch is happening, requiring
the compiler or assembler to fill the delay slots sensibly or stick in NOPs.

It's the same theme as unspecified evaluation order in C: an unsafe hack
for ease of implementation.
 

Kaz Kylheku

How would stack growth direction matter? What matters for varargs is
that the first argument is on the top of the stack, regardless of which
way it grows. If you push the arguments from right to left, the first
argument ends up on top. You _could_ evaluate the arguments from left
to right, store them in registers or elsewhere in memory, and then push
from right to left, but why go through that extra effort?

Or you could wake up and realize that the stack can be indexed like an array,
unless you're working with a target language that really has only push and
pop, with no other access to the stack. (Good luck implementing local
variables efficiently, etc).

You can allocate the space for N arguments, and then store values into
that stack space in whatever order is convenient, while obeying
the abstract evaluation order (in those situations where it potentially
makes a difference).
 

Kaz Kylheku

On 2/7/2012 3:15 AM, Kaz Kylheku wrote:
On Monday, February 6, 2012 8:10:00 AM UTC, lujnan wrote:
b = a = a++;
[...]
I don't want diagnostics; I want strict left to right evaluation order
of operands and subexpressions that complete their side effects before
yielding a value.

So, what you are really saying, is you want to eliminate most of C's
optimization abilities, by restricting the order in which the expression
*must* be evaluated, and side effects *must* be completed?

Well, no, he isn't saying any such thing. You just made it up. The
fact that evaluation order in C is left to the compilers has very
little to do with optimization; it is a historical artifact that came
from endorsing existing practice when the ANSI standard was put
together. Existing practice then was that compiler writers could do
evaluations in whatever order they pleased and they did.

Well, not having access to the Standards committee, I can only speculate.
However, if evaluation order were the only thing at stake here, things would
have been left as "unspecified" rather than "undefined".

Last I looked, the order is in fact unspecified. But then what is undefined
is the multiple uses of the same object.

So for instance f(g(), h()); is not undefined behavior, but unspecified.
We don't know whether g is called first or h, but there are only two
possibilities.
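
A quick sketch of the difference (g and h here are placeholders that just
announce themselves): the program below is merely unspecified, so it has
exactly two permitted outputs and cannot do anything worse:

#include <stdio.h>

static int g(void) { puts("g ran"); return 1; }
static int h(void) { puts("h ran"); return 2; }
static void f(int a, int b) { printf("%d %d\n", a, b); }

int main(void)
{
    f(g(), h());   /* unspecified which of g or h runs first, but the
                      final line is always "1 2" */
    return 0;
}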

In /practice/ there is no difference.

Unspecified, undefined: it all translates to hidden bugs in code that are not
caught by your regression test suite---even if that code has 100% coverage!

In the 30+ years of programming in C, I don't believe I've ever had the need
to write code which depended on side-effects being completed in "the right
order" within a single statement.

I.e. you have written plenty of code with multiple side effects, but you're
sure that you didn't botch it.

Whether that's because I know that
there's no such thing as "the right order", or because I never came across a
scenario where that might be useful, I cannot say, but I'm leaning towards
the latter.

I'm not so confident. After 30 years of coding, I'm still making coding
mistakes (even syntax!). I catch these with the help of the compiler, by
reviewing code, by test coverage, or sometimes with the help of other people.

These evaluation order issues fall into a category that is not caught
by tools, and not even by perfect test coverage, because a test case
could simply validate that the expected behavior is happening on the
given target machine and compiler.

I can't justify a belief that the mistakes I make are always, by good luck, in
categories that are caught by the compiler, or by test coverage.

Of course I haven't /knowingly/ stuffed multiple side effects into an
expression such that their order is critical to the correct outcome, for what
that is worth.
 

Kaz Kylheku

Perhaps, but it can still have a lot to do with optimization.

Consider:

b = (sqrt(sin(x)*cos(x))) * a;

If the compiler can guess that a is very likely to be zero, and that the
left side is very likely to be expensive, and has no side effects, then
it can generate code that evaluates a first, and then decides whether
or not to evaluate the left side.

Since a is not volatile and those transcendental functions have no side
effects, this evaluation order would be allowed even if the evaluation order
for A * B is that A is evaluated first.

Why don't we change this to:

b = sqrt(sin(x)*cos(x)) && a;

Or how about:

b = sqrt(sin(x)*cos(x)), a;

Both && and , have ordering properties today.

If a is zero, do you believe that the trigonometry has to be evaluated?

I realize that this is a contrived example, but the bottom line is
that C was intended to be a *fast* language, and not a *safe* language.

Fast to implement, sure.

Why nail the compiler's foot to the floor if you don't have to.

So that you don't shoot through yours.
 

Kaz Kylheku

Nice of you to dictate that people using systems that more efficiently
evaluate sub-expressions from right to left should suffer performance
degradation of all code, even in the majority of cases where the code is
well-written and therefore the order doesn't actually matter, simply
because it doesn't suit your sensibilities.

Proof?

In the majority of cases, C code is not well-written. The history of C
programming at large is riddled with crashing applications and operating
systems, and exploitable security holes in online systems.

If you /need/ to know that "foo(bar(), baz());" calls bar() before
baz(), then just write "tmp=bar(); foo(tmp, baz());". There is no need

Thanks for the clueful suggestion, but, see, what I need to know first is
WHERE this is hiding in a large code base.
 

John Bode

[snip]

Consider the fact that, in a "well behaved" program, the order of evaluation
doesn't matter.  Consider, too, that an optimizer may change the order of
evaluation to something more "efficient" on a given target platform.

The truth of the matter is that when, precisely, a side effect most
efficiently takes effect is very platform-dependent, and can even vary from
expression to expression on a given platform.

Is this anything more than received wisdom? Does anyone have a
concrete example (read: generated machine code) showing *how*
rearranging evaluation order can make an operation more efficient?

I mean, I've parroted that line more than once, but I certainly can't
point to an example of where it makes a difference.
 

Kaz Kylheku

Because it's a dumb case. You're not supposed to write that idiom,

You're only not "supposed" to write that only because the behavior isn't
standardized, so this is circular reasoning.

You were also not supposed to write this, prior to 1999:

struct foo x = { .member = 42 };

nor this:

{ int x[var]; }

C programmers obviously sometimes do what they are not supposed to, otherwise
we would not need outfits like the Computer Emergency Response Team (CERT).

You may be a perfect C programmer who never does what isn't supposed
to be done, but that's a sample of one, and highly biased.

I'm not such a programmer. All I can say is that I have never knowingly
broken the rules. I do not buy arguments that "I have never needed that
behavior to be well-defined" or "I have never made such mistakes".

I do not buy the argument that the language should be unsafe, and programming
should be done only by those who can avoid stepping on the mines. On a
project, it's good to have one such savant, but not the whole team.

Why should only programming language users be careful and responsible, but not
language designers?

and pointless constraints are a bad thing in a system where performance
is king.

The constraint only affects situations where expressions have interacting side
effects. All other code has the same meaning as without the constraints.
 

James Kuyper

Is this anything more than received wisdom? Does anyone have a
concrete example (read: generated machine code) showing *how*
rearranging evaluation order can make an operation more efficient?

As an undergrad, I once had a summer job where I was responsible for
writing some hand-optimized assembler to replace two C subroutines that
had been found to be a bottleneck in a scientific calculation. I no
longer remember the concrete details you're asking for, but I do
remember the general strategies I used, and they all depended upon
rearranging the evaluation order.

On the machine I was writing for, there was one component that could
handle either a division or a sqrt() operation in parallel with
multiplications and additions. The main loop required one division,
followed by a sqrt(), followed by another division. Division and sqrt()
both took much longer than either a multiplication or an addition, so
one of the first things I did was rearrange evaluation so the division
started as early as possible, and was immediately followed by the
sqrt(), which was then followed by the second division, and I rearranged
things so that as many multiplications and additions as possible were
done between the start of the division and the completion of the sqrt().

Another key aspect of this machine was a component that could carry out
the register-based equivalent of fma() in three clock cycles, and could
be working on three different fma() operations at the same time, if they
were each started on a different clock cycle. Because the subroutine was
carrying out matrix multiplications, I could easily arrange to keep that
component busy almost full time, but doing so required rearranging the
order of evaluation extensively.

Finally, one of the key things I could do which the compiler was not
able to do, was to treat three different registers as the elements of a
3-component array. This helped to drastically reduce the number of slow
load and store operations, but it meant that a naive implementation of
the subroutine would have required three times as many registers as were
actually available. I dealt with this by rearranging the order of
evaluation so that each register could be used to represent the
value of several different variables during the course of the
subroutine, by making sure that the periods of time during which the
register represented each variable didn't overlap. IIRC, if there had
been even a single additional register, I could have completely avoided
using RAM for any of the intermediate steps; as it was, I think there
was only one load and one store per pass through the main loop.

It was my first and last project writing in that assembly language. I
documented as accurately as I could precisely how I had designed it, but
I'm sure it was almost completely unmaintainable. My supervisor expected
me to spend the entire summer on it. I completed it in a couple of
weeks, and sped up the subroutine by a factor of 3.5, which disappointed
me - I'd been expecting a factor of 5. But the relevant subroutine was
going to be keeping hundreds of processors busy with a nearly 100% duty
cycle for several years, so that speed-up easily paid for the cost of my
salary. And most of it would have been impossible if I'd not been free
to rearrange the order of evaluation.

All of my rearrangements conformed to the as-if rule, however, so I'm
not sure they're directly relevant to this discussion. The real issue is
whether declaring certain kinds of behavior undefined makes it easier to
optimize code by mechanisms that would violate the as-if rule if the
behavior were defined. I believe this is the case, but I don't have any
anecdotes that demonstrate that fact.
 
