mdh said:
while ((array1[i++] = array2[k++]) != '\0')
Is this the order that this is evaluated?
Others started answering this question, but I think it is better to
back off and *not* answer this question, or at least, not "as asked".
Specifically, you asked about "the order that this is evaluated".
The problem here is that in C, many expressions do not *have* any
particular ordering (sequence of operations). The only thing that
is guaranteed about them is the *result*, and then only if nothing
goes wrong along the way.
[first] array2[k] is assigned to array1 ....???? the reason
being it is within parenthesis ???
Parentheses affect the "binding" of various operators.
I prefer this word to "precedence", because the word "precedence"
is built on the word "precede", which seems to say "happens first".
What you need to understand here is that there are two separate
things that happen:
- First, you compile your code.
- Then, later -- usually much later, considering computer speeds
are in the gigahertz range -- you run your code.
The parentheses only affect the compilation. The order of operations
happens later, when you run the code.
Compilers take "expressions" and (generally) build something called
a "parse tree". Consider a simple expression like:
a + b * c
There are two trees we can build from this. One looks like this:
        *
       / \
      +   c
     / \
    a   b
which means "compute a + b, and compute that times c", and the other
looks like this:
      +
     / \
    a   *
       / \
      b   c
which means "compute a + (result of computing b times c)".
What parentheses do is let you control the *parse tree*, and (in
C at least) nothing else. By default, without parentheses, C is
required to build the second parse tree -- the "*" operator binds
more tightly than the "+" one, so
a + b * c
"looks like":
a + b*c
with the "b*c" part "close together" (tightly bound) and the "+"
parts "far apart" (loosely bound). If you add parentheses around
the "+" part, however, you can force it to bind more tightly:
(a+b) * c
Again, this only affects the parse tree.
Now consider something a little more complicated: instead of
simple variables a, b, and c, let us use three functions that
return values:
f() + g() * h()
and here they are:
int f(void) { puts("f() returns 5"); return 5; }
int g(void) { puts("g() returns 2"); return 2; }
int h(void) { puts("h() returns 3"); return 3; }
Now, we get f() + (g() * h()) by default, because the "*" binds
more tightly; but we can write (f() + g()) * h() to get the
first two to bind together over the "+", before the "*" can bind
to the rest. One way gives 5 + (2*3) = 5+6 = 11; the other way
gives (5+2)*3 = 7*3 = 21. But what happens later, when we actually
*run* the program?
Whenever f() gets called, it prints a line. Whenever g() gets
called, it prints a different line. Whenever h() is called, it
prints a third line. So we will be able to find out which order
the compiler actually used.
If we asked for (f() + g()) * h(), one way the compiler can
do this is to call f() first, then call g(), then add the two
results, then call h(), then multiply its result by the result
of the previous addition. That will print f()'s line first,
g()'s line second, and h()'s third. But another way to do it
is to call h() first, save that value, then call f(), then
call g(), then add, and only then multiply. Or the compiler
could call h(), then g(), then f(); or it could call f(), then
h(), then g(); or it could call g(), then h(), then f(); and
so on.
There is no constraint on the order of the *calls*, as long
as it gets the right result (21, in this case).
If we asked for f() + (g() * h()), the compiler still has total
freedom to call the three functions in any order, as long as it
multiplies the result of g() and h() and adds the result of f().
Again, there is no constraint on the order of the *calls*, as
long as it gets the right result (11 this time).
Thus, for the most part, the *runtime* order of *execution* is
not connected (or not strongly connected) to the *compile-time*
order of "parse binding" -- and parentheses only affect the
compile-time binding.
C gives us something formally called "sequence points" that
control runtime order of execution. See the FAQ for details;
but it is safe to assume that you hit a sequence point at the
semicolons between statements.
When you hit a sequence point, all the expressions that are "in
flight" have to "finish up": a compiler is kind of like a juggler
tossing eggs (instead of balls) in the air, and when you give it
a complicated expression with lots of [] and ++ and so on, it throws
a whole bunch of eggs into the air. They fly around in some
complicated fashion, but at the "sequence point", they all land
somewhere so that you can stop and look at them. Things go wrong
if two eggs collide: you get a big mess. This means "don't write
stuff like a[i++] = a[i++], which tosses two i++ eggs that collide".
With all that said, we can go back to the original, rather
complicated expression:
while ((array1[i++] = array2[k++]) != '\0')
This does 7 separate things at runtime:
- Schedule "i" to get incremented; temporarily save ("produce")
the value of "i" before any such incrementation (the "old" value).
- Schedule "k" to get incremented; produce the old value of "k"
- Using the old value of "i", find array1[i], as an object
(it is on the left hand side of an "=" operator so we will
store a new value here).
- Using the old value of "k", find array2[k], as a value
(it is on the right hand side of an "=" operator so we need
its value).
- Schedule a copy of the value found into the location found
(changing array1[old-value-of-i] to array2[old-value-of-k],
that is).
- Produce, as the result of the "=" operator, the value that
was stored into the object on the left (array1[old i]).
Usually this is the same as the value retrieved from the
right (array2[old k]), but if the type of the object on
the left is different from the type of the value on the
right, it may not be the same.
For instance, if you copy a full-blown "int" value into an
"unsigned char", the top bits of the value get chopped off.
On a typical implementation, doing:
unsigned char x;
int y = 32767;
printf("(x=y) = %d\n", x = y);
will print "(x=y) = 255".
- Compare the result produced by the assignment to '\0', which
is just a way of writing the integer constant 0 while telling
the programmer reading the code that you mean "end of string".
The comparison will produce 1 if the result is not 0, and 0 if
the result is 0, so the loop will run "while not end of string"
(I say "end of string" instead of "zero" because I assume the
'\0' means "end of string" -- these may be the same thing
underneath, but they have different "intentions", as it were;
much like accidentally cutting yourself with a knife does not
mean you intended to commit suicide).
By my count, then, the compiler has 7 things ("eggs") flying through
the air all at once in this expression: i and k are being incremented,
their old values are being used to locate and evaluate array1[something]
and array2[something] respectively, the value at array2[something]
is being copied to the object named array1[something], the result
of that copy is being saved temporarily, and that result is being
compared against "end of string". All seven of those things have
to "not collide" -- and in fact they do not, because "i" and "k"
are different variables and "array1" and "array2" are different from
each other and from both of those, and all the rest of the things
listed above are "values", and values never collide (only objects
can collide).
(I could re-count and get 6 or 9 things above, of course. Notice
that for i++ and k++ I listed "one thing" even though each one was
two parts: "schedule increment" and "produce old value". Then,
later, I listed "two things" for the "x = y" type assignment:
schedule the assignment to happen, and produce a value. I only
split it up because the "produce a value" part was so long.)
All the "scheduled" items above happen by the next sequence point.
There is a sequence point after the "full expression" in a testing
statement (an "if" or "while", or the middle part of a "for" loop
-- actually all parts of the for loop have sequence points, but
only the middle part gets tested). So by the time you get into
the body of the "while" loop (if there is one), i and k have been
incremented and array1[old value of i] has been changed. If the
loop body is empty, the sequence point ensures that all of this
has happened before the "while" tries again.
As a last comment:
-> if FALSE exit loop and neither k nor i is incrmented.
This is wrong: the expression schedules i and k to get incremented,
schedules array1[something] to get updated, and tests the result
that will wind up in array1[something]. (The test can happen
before, during, or after the actual update; again there is no real
constraint on the order of operations here.) Then there is a
sequence point and now *all* scheduled changes *have* happened;
only then does the while loop either run, or not run.
Thus, when the loop does in fact exit, i and k *have* been
incremented and array1[previous value of i] has been changed,
set to '\0'. If you want to leave i and k unchanged, you have
to rewrite the loop:
while ((array1[i] = array2[k]) != '\0') {
    i++;
    k++;
}
Here the compiler does *not* schedule updates to i and k as
part of the "copy value to array1[something] and test result"
loop-test. The sequence point after the full expression
merely updates array1[i] -- and we can just talk about
array1[i] here because i is unchanged -- and only if we get
into the loop itself do we find a statement that schedules
an update to "i":
i++;
Here we "schedule an increment and produce the previous value",
but then we ignore the value by hitting the semicolon ending the
statement, which also enforces a sequence point. Thus, i gets
incremented, and nobody cares about the value, so the compiler does
not actually have to *save* the old value anywhere, and a good one
will omit the extra code needed for that. A crappy compiler might
produce extra code to save the old value. This is one reason you
find people advising other people to write ++i instead of i++: back
in the early 1980s, some compilers really *were* this bad, and
produced shorter, faster code with ++i instead of i++. These days,
with GCC producing good fast code either way, there is no excuse
for a compiler to produce slow code for i++. Write whichever
one you find clearer, if the value is not important.
(If the value *is* important, of course, you have to write the
correct one: a[++i] is very different from a[i++], since ++i not
only schedules the increment, but also produces the value you will
have *after* incrementing, instead of the value you did have *before*
incrementing. Note that in most cases, a good compiler can produce
short, fast code for either one -- again the main reason to pick
one or the other is based first on the result you need and then on
the way you prefer to write it, not the code that comes out of the
compiler.)
If all that was not enough, let me make one more point.
Expressions do two things in C: produce values (always), and cause
"side effects" (optional). "Side effects" are things like changing
the values stored in variables, or -- at least formally in languages
other than C -- producing output. (In C, output is hidden behind
functions that use Implementation Magic, so you cannot really "see"
the "true" expressions that do the trick.) The value of any
expression always has a "type", and the type determines the range
of possible values: "unsigned char" is in [0..UCHAR_MAX], "int" is
in a wider range including negative numbers, "double" handles
fractional parts, and so on; and the special "void" type is the
empty set of values (slightly peculiar, but still considered "a
value"). When you write an expression as a statement, you throw
away the final value, so:
i = j;
has a value that gets thrown away, as does:
printf("the answer is %d\n", i);
In both cases, the reason for throwing away the value is that you
really only wanted the side effect: changing i, and producing
output. Good compilers generally warn you if you write, as a
statement, an expression that has *no* side-effects:
i + 12;
since it makes no sense to compute a value and then throw it away
if you are not also doing *something* else.