On 13/11/13 16:02, Rosario1903 wrote:
[...]
The compiler has to generate code /as if/ it followed the ordering in
the source code and the precedence rules for operators. But it can
re-arrange the /actual/ generated code, as long as the effect is the
same. Since any effect on flags is not visible in C or C++ in a
calculation like "a * b + c", the compiler does not have to consider it
when arranging instructions. If the "readOverflowFlag()" is an inline
assembly function, the compiler will probably treat it like a volatile
access - and obey its ordering with respect to other volatile accesses,
to function calls that /may/ have unknown effects, code that /may/ cause
an exception, etc. But simple arithmetic using local data can usually
be moved around quite freely.
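As a rough sketch of what I mean (assuming GCC-style extended asm
on x86; "readOverflowFlag" is just the illustrative name from
Rosario's post, not a library function):

    #include <cstdint>

    static inline bool readOverflowFlag()
    {
        std::uint8_t of;
        // "seto" stores 1 in the operand if the overflow flag is set.
        // The asm is volatile, so it won't be deleted or merged, but
        // nothing ties it to the surrounding arithmetic.
        asm volatile("seto %0" : "=r"(of));
        return of != 0;
    }

    std::uint32_t f(std::uint32_t a, std::uint32_t b, std::uint32_t c)
    {
        std::uint32_t r = a * b + c;
        bool ovf = readOverflowFlag();  // may well be scheduled before
                                        // the multiply, or after other
                                        // flag-setting instructions
        return ovf ? 0u : r;
    }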
That's true as far as it goes. Nothing in the condition code is
"observable" in C++. On the other hand:
-- any calculations that would set the overflow flag would
result in undefined behavior in C++, so the compiler can
assume that they won't occur, and
-- the compiler also knows that the condition code is not
observable, and may rearrange code so that the overflow bit
gets set in an intermediate result, as long as it knows
that the final result will be correct. (For example, given
a + b + c, where a is a large positive number, and b and
c are fairly large negative values: executing a + b, and then
adding c to the result, will not overflow. If the compiler
knows that the hardware wraps on overflow, it may execute
b + c first, then add a, even if b + c results in an
overflow.)
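To put concrete numbers on that (32 bit int, values picked purely
for illustration):

    int a = 2000000000;     //  close to INT_MAX
    int b = -1500000000;
    int c = -1500000000;

    // As written:       (a + b) + c
    //   a + b          =   500000000    -- no overflow
    //   500000000 + c  = -1000000000    -- no overflow
    //
    // Reassociated:     a + (b + c)
    //   b + c          = -3000000000    -- overflows a 32 bit int
    //
    // On two's complement hardware that simply wraps, the final sum
    // comes out the same either way, which is why a compiler that
    // knows its target can tolerate the intermediate overflow.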
I don't think there is any sort of undefined behaviour here - it is just
that C and C++ do not consider effects on flags as "behaviour" of
arithmetic instructions.
Overflow is undefined behavior. Flags or whatever have nothing
to do with it.
The code Rosario posted used unsigned integers (I'm assuming that's what
he meant with "u32"), so if I've interpreted the C++ standards
correctly, "a * b + c" cannot overflow as the operations are defined as
modulo 2^32, and so no undefined behaviour. The cpu's overflow flag may
still be set.
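I.e. something like this (assuming u32 is a typedef for a 32 bit
unsigned type, as I read his post):

    #include <cstdint>
    typedef std::uint32_t u32;

    u32 f(u32 a, u32 b, u32 c)
    {
        // Both operations are defined modulo 2^32, so this is never
        // undefined behaviour, however large a, b and c are.  The
        // hardware may still set its carry/overflow flags; C++ simply
        // doesn't see them.
        return a * b + c;
    }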
Even if the unsigned operations don't overflow, the CPU's
overflow flag may be set. Typical CPU's don't distinguish
between signed and unsigned operations; they just add. The
carry flag signals overflow if the operation is treated as
unsigned, the overflow flag if it is treated as signed. It's up
to the assembler programmer (or the compiler) to decide what he
wants to do with these flags, if anything.
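That's easy to illustrate by computing the two flags in portable
C++ rather than reading them from the hardware (a sketch; the
helper names are mine):

    #include <cstdint>

    struct AddFlags { bool carry; bool overflow; };

    AddFlags add32(std::uint32_t a, std::uint32_t b)
    {
        std::uint32_t sum = a + b;       // one and the same addition
        AddFlags f;
        f.carry    = sum < a;            // wrapped if viewed as unsigned
        // overflowed if viewed as signed: operands have the same sign,
        // the result has the other one
        f.overflow = ((~(a ^ b) & (a ^ sum)) >> 31) != 0;
        return f;
    }

    // add32(0x7FFFFFFF, 1)          -> carry = false, overflow = true
    // add32(0xFFFFFFFF, 1)          -> carry = true,  overflow = false
    // add32(0x80000000, 0x80000000) -> carry = true,  overflow = true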
"Overflow" as defined by C++ and "overflow" in a typical cpu flag
register are therefore somewhat orthogonal concepts, I think.
Not really. They are both based on a more general concept of
overflow. The hardware could call the flags signed overflow and
unsigned overflow if it wanted. The reason it probably doesn't
is that for the most part, the only time a programmer will use
unsigned arithmetic is either for bit manipulations, when
overflow doesn't have any really well defined meaning, or for
multiple precision arithmetic, where the low order elements are
handled as unsigned, and their "overflow" is the carry for the
next higher elements.
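That is, the usual pattern is something like this (a sketch of a
multi-word add, limbs stored least significant first):

    #include <cstddef>
    #include <cstdint>

    // dst, a and b each point to n 32 bit limbs, least significant
    // first.  Returns the carry out of the top limb.
    std::uint32_t multiAdd(std::uint32_t* dst,
                           const std::uint32_t* a,
                           const std::uint32_t* b,
                           std::size_t n)
    {
        std::uint32_t carry = 0;
        for (std::size_t i = 0; i != n; ++i) {
            std::uint64_t s = std::uint64_t(a[i]) + b[i] + carry;
            dst[i] = std::uint32_t(s);        // low 32 bits of the sum
            carry  = std::uint32_t(s >> 32);  // the "overflow" of this
                                              // limb is the carry into
                                              // the next one
        }
        return carry;
    }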
If only that were true! Developers writing this type of code are mostly
doing so because they believe it is correct, having seen lots of
examples in books, websites, and example code from their toolchain
vendor or microcontroller manufacturer.
I'm not saying that there aren't any incompetent programmers
around, but the only people I've seen who write this sort of
thing know very well that it isn't portable. It's quite correct
for the platform they are developing for, and that's what
counts.
It is common that compilers /do/ generate code in the order the user
wants,
What does the user want? It's actually very, very rare for a
compiler to generate code in a purely left to right order, as a
naïve programmer might expect (and Java requires). Things like
using Sethi-Ullman numbers to choose which branch of the parse
tree to generate first were wide-spread practice back when I was
writing compilers, twenty-five years ago. Today, instruction
ordering plays an important role, and even the most trivial
peephole optimizer will reorder operations considerably.
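A trivial illustration (the order of the two calls is unspecified
in C++, whereas Java requires left to right):

    #include <iostream>

    int f() { std::cout << "f "; return 1; }
    int g() { std::cout << "g "; return 2; }

    int main()
    {
        int x = f() + g();      // may print "f g " or "g f "
        std::cout << x << '\n';
    }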
but I have never seen it documented or guaranteed in any way.
(Of course, there are lots of embedded toolchains around - I have used
quite a few, but it's still just a tiny percentage - so I can't
generalise too much.)
For a lot of code, there is no particular reason for the compiler to
generate code in a different order - and compilers generally order the
output based on the source code if there is no good reason to do
something else.
There are a very large number of reasons why a compiler might
reorder code.
Don't forget, too, that modern processors don't always execute
the machine instructions in the order they occur, and
particularly, they often reorder memory accesses (or suppress
some completely) even when the machine instructions are in the
order you might expect.
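The classical illustration is the store buffer test, sketched here
with relaxed C++11 atomics so that neither the compiler nor the
hardware adds any ordering of its own:

    #include <atomic>
    #include <thread>

    std::atomic<int> x(0), y(0);
    int r1, r2;

    void t1() { x.store(1, std::memory_order_relaxed);
                r1 = y.load(std::memory_order_relaxed); }
    void t2() { y.store(1, std::memory_order_relaxed);
                r2 = x.load(std::memory_order_relaxed); }

    int main()
    {
        std::thread a(t1), b(t2);
        a.join(); b.join();
        // r1 == 0 && r2 == 0 is a legal (and, on common hardware,
        // observable) outcome: each store can sit in a write buffer
        // while the load that follows it reads the old value.
    }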
But the fact that it mostly works makes it more surprising to people
when they hit the unusual cases.
Only if they are incompetent. (Incompetent to program at this
level, of course. I currently work with some world class
programmers... in numerical analysis; they probably wouldn't be
competent to program at this level.)
I think the compiler guarantees that it will send the write instructions
(since "globalInterruptEnable" is volatile). But there is no guarantee
that the /processor/ and/or memory caches and other hardware will pass
the writes on. More sophisticated cpu's with write buffers could
easily combine the two writes before they finally hit memory. And
certainly they don't guarantee any ordering between writes to
"globalInterruptEnable" and "partA" and "partB".
Exactly. For the code you post to work, the compiler would have
to generate barriers or fences for all of the volatile accesses.
That's an aspect that many programmers might not be aware of,
but all programmers should certainly know that volatile only
affects the variable it is attached to, and that the compiler is
free to move accesses to other variables across the volatile
access.
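In terms of the code being discussed, the situation is roughly
this (the declarations are my guess at what the original looked
like):

    volatile unsigned globalInterruptEnable;   // memory mapped register
    unsigned partA, partB;                     // ordinary variables

    void update()
    {
        globalInterruptEnable = 0;   // volatile: must be emitted, and is
        partA = 1;                   //   ordered w.r.t. other volatiles
        partB = 2;                   // ...but these two plain stores may
        globalInterruptEnable = 1;   //   legally be moved before the
                                     //   disable or after the re-enable
    }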
When using targets with that sort of caching and buffering, you need to
set things up to ensure that writes like "globalInterruptEnable" are
marked appropriately to avoid such problems - clearly such a flag has a
specific address in the memory map rather than being allocated in ram.
And usually you also need to add extra memory synchronisation
instructions - your interrupt control is by function, not just a single
write.
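So on such a target the interrupt control typically ends up
wrapped in functions along these lines (a sketch only; on a real
part you would normally use the vendor's intrinsics, and whether a
C++ fence is sufficient for memory mapped I/O depends on the
target):

    #include <atomic>

    extern volatile unsigned globalInterruptEnable;  // as above

    inline void disableInterrupts()
    {
        globalInterruptEnable = 0;
        std::atomic_thread_fence(std::memory_order_seq_cst);
        // possibly also an architecture specific barrier (DSB etc.)
    }

    inline void enableInterrupts()
    {
        std::atomic_thread_fence(std::memory_order_seq_cst);
        globalInterruptEnable = 1;
    }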
But for smaller and simpler cpus, people write code like that - the
hardware will apply the writes in the order given by the instruction stream.
Yes. In the past, it was a lot simpler. The processor finished
one instruction before starting the next, and all reads and
writes were directly to memory---no caches and no pipelines.