Zero overhead overflow checking

Dik T. Winter

> John Nagle wrote: ....
>
> Imagine that. 50% of the programs at that time were producing
> incorrect results in some situations!

You read that wrong. Not every program that was overflowing delivered
the wrong result! It may be that in at least some of those programs the
overflow was intentional.
 
Dag-Erling Smørgrav

Bart said:
Someone mentioned hundreds of embedded processors for each advanced
processor. I guess these must all be a little different.

I think you misunderstand me. I did not say that for every advanced CPU
model there are hundreds of embedded CPU models; I said that for every
advanced CPU that comes off the figurative assembly line, tens or
hundreds of embedded CPUs do so as well.

Your cell phone contains at least one GPP and one DSP (possibly combined
on the same chip). The same probably goes for your desk phone. The
cell towers that your cell phone talks to and the exchanges that your
desk phone talks to contain hundreds of GPPs and DSPs. If you have a
multifunction digital watch, chances are it contains a microcontroller.
Your car contains dozens of microcontrollers. Your alarm clock, your TV
set, your DVD player, their respective remote controls, your food
processor, your microwave oven, your dishwasher, your burglary alarm,
probably some of the sensors connected to it, your printer, your copier,
your web camera, your switch (at least if it's managed), your DSL
router... the list goes on.

Chances are many of those microcontrollers are similar (many are based
on the ARM7 or ARM9 architecture) if not identical, and chances are many
of those run one of a handful of real-time operating systems (VxWorks,
QNX, RTEMS) or even Linux or BSD, all of which are written mostly in C
(thus disproving Jacob's claim that C can't be implemented on Harvard
machines like the ARM9).

DES
 
Ben Bacarisse

Dag-Erling Smørgrav said:
Chances are many of those microcontrollers are similar (many are based
on the ARM7 or ARM9 architecture) if not identical, and chances are many
of those run one of a handful of real-time operating systems (VxWorks,
QNX, RTEMS) or even Linux or BSD, all of which are written mostly in C
(thus disproving Jacob's claim that C can't be implemented on Harvard
machines like the ARM9).

Small but important point: I don't think he said that. He said (over
in comp.std.c) that "Harvard architectures are not supported in gcc's
conceptual model", which is not the same. I think you posted a
counter-example to that claim as well, but we should avoid putting
words into others' mouths.
 
Hallvard B Furuseth

Keith said:
The standard says overflow on a signed integer computation
invokes undefined behavior, but overflow on conversion either
yields an implementation-defined result or (in C99) raises an
implementation-defined signal.

I've never understood why the language treats these two kinds of
overflow differently.

char *s;
int c;
while ((c = getchar()) != EOF) { ... *s++ = c; }

Have you ever written code like that?

This can "overflow on coversion" when char is signed. Yet C strings are
char, not unsigned char. That makes it a major pain in the *ss to
handle strings a formally correct way, too much so for my taste. I just
don't worry about it, trusting the market to isolate me from anyone
producing one's complement char implementations or whatever.
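
For illustration, a minimal sketch of the kind of contortion the "formally
correct" route implies (the buffer name, its size, and the line-at-a-time
framing are my own, not part of the post): fill the buffer through
unsigned char, where the conversion is always well defined, and only view
the bytes as plain char at the boundary.

#include <stdio.h>

int main(void)
{
    char line[256];                            /* the "string" the rest of the code wants */
    unsigned char *s = (unsigned char *)line;  /* ...but fill it through unsigned char */
    int c;

    while ((c = getchar()) != EOF && c != '\n' &&
           s < (unsigned char *)line + sizeof line - 1)
        *s++ = (unsigned char)c;   /* conversion to unsigned char never overflows */
    *s = '\0';

    fputs(line, stdout);           /* viewed as plain char only at this boundary */
    return 0;
}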
 
Keith Thompson

Hallvard B Furuseth said:
char *s;
int c;
while ((c = getchar()) != EOF) { ... *s++ = c; }

Have you ever written code like that?

This can "overflow on coversion" when char is signed. Yet C strings are
char, not unsigned char. That makes it a major pain in the *ss to
handle strings a formally correct way, too much so for my taste. I just
don't worry about it, trusting the market to isolate me from anyone
producing one's complement char implementations or whatever.

Ok, that's a good point. Common idioms like this depend on the
assumption that signed and unsigned chars are interchangeable,
even though the language doesn't support this assumption, and yes,
I've depended on that myself.

But that still doesn't quite explain the discrepancy. In C99,
the above code could easily blow up if the conversion raises an
implementation-defined signal. Even in C90, it could fail badly if
the implementation-defined result is something odd (like, say,
if the result saturates to CHAR_MAX) -- though it could only fail
for character codes exceeding CHAR_MAX, and historically those were
somewhat unusual. If the behavior of the conversion were undefined,
the situation wouldn't be much worse than it already is.
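
For illustration, a sketch of the defensive variant this implies, assuming
one wanted to avoid the implementation-defined conversion entirely by
range-checking before the store (the buffer and the error handling are my
own assumptions):

#include <limits.h>
#include <stdio.h>

int main(void)
{
    char buf[256];
    char *s = buf;
    int c;

    while ((c = getchar()) != EOF && s < buf + sizeof buf - 1) {
        if (c > CHAR_MAX) {
            /* this store would otherwise be the implementation-defined
               (or, in C99, signal-raising) conversion discussed above */
            fprintf(stderr, "byte value %d does not fit in char\n", c);
            return 1;
        }
        *s++ = (char)c;            /* known to be representable */
    }
    *s = '\0';

    fputs(buf, stdout);
    return 0;
}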
 
Bart

     Okay, fine: C is inherently non-portable, is implemented on only
an insignificant handful of machines, and it takes years to port C
code from one machine to the next.  Useless, a failed language.

My comments were just personal observations. There does seem to be a
need to know the black art of compiler switches and makefile commands
to get these things working, especially when you closely follow the
instructions yet something still doesn't work.

And the fact that gcc for example lists some eleven thousand lines of
compiler switches, last time I looked, some of which have to be just
right, clearly doesn't impact the portability of the language in any
way!

     So why are you wasting your time here?  Life's too unsigned short.

You're right. As a bit of a language designer myself, I'm never going
to be completely happy with C. Time to go back to my poor little x86
compiler and its (IIRC) zero compiler switches (but which,
nevertheless, does the job!).
 
Dag-Erling Smørgrav

Eric Sosman said:
So why are you wasting your time here? Life's too unsigned short.

That is *so* sig-worthy... almost makes me wish I was into sigs.

DES
 
Miles Bader

Stephen Sprunk said:
The back-end part that translates RTL to assembly is quite small, and
that is often all that needs porting for a new architecture. Most "new"
architectures are variations on existing ones, in part due to the
conscious desire to make it easier to port compilers, in which case you
only need to tweak a few things.

A few years ago I ported gcc to a very small (8-bit) processor at work
(the port was never released though :( ).

For the most part, the backend parts were straightforward enough; the
main problem was that the _non-backend_ parts of the compiler made
various assumptions that my processor didn't satisfy -- in particular,
reload needs a certain amount of "space" to work, and a very small
processor with very constrained register usage may not have it. Getting
around reload issues was really annoying, with lots of hacks to both the
backend and to reload itself (a horrible, practically impenetrable,
piece of code).

Things may be better these days, as gcc internals seem to slowly be
getting cleaned up...

-Miles
 
Bart van Ingen Schenau

Bart said:
Or perhaps C is used for (firmware for) a processor inside a phone
say, but that phone will be produced in the millions. Surely it must
help (in programmer effort, performance, code size, any sort of
measure except actual portability) to have a tailored C version for
that system.

It will probably surprise you how portable the firmware for a mobile
phone has to be.
I have worked in that area, and there are several factors that make it a
definite advantage to write portable code.
1. Software development usually begins way before there is any hardware
available. Initially, the software is tested in a simulation environment
on a PC. This implies that the software must build for at least two
dissimilar platforms. Only the (parts of) device drivers that handle
the actual communication with the hardware are not tested in this way,
but they make up less than 1% of the total software.
2. The majority of the software in a mobile phone has to be reused for 5
or 6 generations of phone models, with possibly different hardware and
certainly slightly changed requirements. To cope with that,
maintainability takes a front seat. And with maintainability, often
portability comes along.

Bart v Ingen Schenau
 
John Nagle

Francis said:
and assuming a 16-bit int, the addition will overflow.

However this raises the whole issue of narrowing conversions. They are
allowed in C but what should happen if the converted value loses
information?

That's one of those places where the distinction between a coercion
and a conversion has teeth.
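
For illustration, a tiny example of the two cases being distinguished
here, under the quoted assumption of 16-bit int and short (the variables
and values are mine):

#include <stdio.h>

int main(void)
{
    int a = 30000, b = 30000;

    /* With a 16-bit int, a + b exceeds INT_MAX (32767): undefined behaviour. */
    long wide = (long)a + b;     /* doing the arithmetic in long sidesteps that */

    /* Narrowing conversion: with a 16-bit short, 60000 cannot be represented,
       so the result is implementation-defined, or (in C99) an
       implementation-defined signal is raised. */
    short narrow = (short)wide;

    printf("%ld -> %d\n", wide, (int)narrow);
    return 0;
}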
 
jacob navia

John Nagle wrote:
If overflow were to be taken seriously, I'd take the position
that assignments can overflow and that such overflows are errors.

Truncation in assignment usually indicates a problem with the
program. If you really want truncation, one may have to write
something like

unsigned char c;
unsigned long d;

c = d & 0xff;

Compilers can and should recognize such idioms and optimize them out.
For signed values, "%" should be used. (The semantics of "%" when
the divisor is positive are well-defined.)

After a few years of using Python more than C and C++, I have to
say that C/C++ now feel deficient in this area. There are probably
more programmers now using languages where overflow is either checked
or handled automatically than are using ones where it isn't. Java
takes a harder line in this area, and has well-defined integer arithmetic
semantics. C was defined before integer hardware representations settled
down. I have used Burroughs signed magnitude machines, and DEC and UNIVAC
36-bit mainframes. But that era is over. We now have standardized on
binary integer representations. Finally.

As I've said for years, this is a fixable problem, with known good
solutions,
but I don't expect it to be fixed in C/C++.

John Nagle

Excuse me but I do not see why it can't be fixed in C. My proposal was
precisely in that direction. I have implemented it, and it works.

I have updated my proposal in comp.std.c. Maybe you could take a look?

Thanks
 
John Nagle

Keith said:
Assignment itself cannot *directly* cause an overflow; it just
copies a value into an object. A conversion that's implicit in
an assignment can cause an "overflow", though the standard doesn't
use that term; see C99 6.3.1.3p3:

Otherwise, the new type is signed and the value cannot be
represented in it; either the result is implementation-defined
or an implementation-defined signal is raised.

For consistency, all forms of conversions should be treated alike.
That includes implicit conversions resulting from a cast, as well as
implicit conversions resulting from assignment, argument passing,
return statements, parameter passing, the "usual arithmetic
conversions", and whatever other cases I've forgotten.

I've argued that conversions should be treated like arithmetic
operators; there's obviously some disagreement on that point.

If overflow were to be taken seriously, I'd take the position
that assignments can overflow and that such overflows are errors.

Truncation in assignment usually indicates a problem with the
program. If you really want truncation, one may have to write
something like

unsigned char c;
unsigned long d;

c = d & 0xff;

Compilers can and should recognize such idioms and optimize them out.
For signed values, "%" should be used. (The semantics of "%" when
the divisor is positive are well-defined.)
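
A minimal sketch of what that signed-value idiom might look like, with the
C99 truncate-toward-zero behaviour of "%" spelled out (the names and
values are illustrative only, not from the post):

#include <stdio.h>

int main(void)
{
    long d = -1000;             /* some signed value to reduce */
    int r = (int)(d % 256);     /* well defined for a positive divisor; C99 "%"
                                   truncates toward zero, so r is -232 here */
    if (r < 0)
        r += 256;               /* fold into 0..255 if a wrapped byte is wanted */

    printf("%d\n", r);          /* prints 24 */
    return 0;
}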

After a few years of using Python more than C and C++, I have to
say that C/C++ now feel deficient in this area. There are probably
more programmers now using languages where overflow is either checked
or handled automatically than are using ones where it isn't. Java
takes a harder line in this area, and has well-defined integer arithmetic
semantics. C was defined before integer hardware representations settled
down. I have used Burroughs signed magnitude machines, and DEC and UNIVAC
36-bit mainframes. But that era is over. We now have standardized on
binary integer representations. Finally.

As I've said for years, this is a fixable problem, with known good solutions,
but I don't expect it to be fixed in C/C++.

John Nagle
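
In the same spirit, one might sketch what treating a narrowing overflow as
an error, rather than a silent truncation, could look like in portable C
today. The helper below is hypothetical and not part of any proposal in
this thread:

#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: narrow a long to unsigned char, but treat an
   out-of-range value as an error instead of truncating silently. */
static unsigned char narrow_uchar(long d)
{
    if (d < 0 || d > UCHAR_MAX) {
        fprintf(stderr, "narrowing overflow: %ld\n", d);
        abort();
    }
    return (unsigned char)d;
}

int main(void)
{
    unsigned char c = narrow_uchar(200);    /* fine */
    printf("%u\n", (unsigned)c);
    c = narrow_uchar(70000L);               /* out of range: aborts */
    return 0;
}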
 
Keith Thompson

jacob navia said:
John Nagle wrote: [...]
If overflow were to be taken seriously, I'd take the position
that assignments can overflow and that such overflows are errors.

Truncation in assignment usually indicates a problem with the
program. If you really want truncation, one may have to write
something like

unsigned char c;
unsigned long d;

c = d & 0xff; [...]
As I've said for years, this is a fixable problem, with known good
solutions,
but I don't expect it to be fixed in C/C++.

Excuse me but I do not see why it can't be fixed in C. My proposal was
precisely in that direction. I have implemented it, and it works.
[...]

For the particular example shown, assuming the "& 0xff" is dropped,
the behavior is already well defined. As far as the standard is
concerned, there is no overflow for unsigned types. An
implementation, or a new standard, which treated
c = d;
as an error would break existing code.
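
For completeness, a small example of the already-well-defined behaviour
being referred to (the values are mine):

#include <stdio.h>

int main(void)
{
    unsigned long d = 0x12345678UL;
    unsigned char c = d;    /* no overflow: conversion to an unsigned type is
                               reduction modulo UCHAR_MAX + 1, so c == 0x78
                               where CHAR_BIT is 8 */
    printf("0x%x\n", (unsigned)c);
    return 0;
}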
 
