Does integer overflow cause undefined behaviour?

dragoncoder

Hi all,

Does the following code invoke undefined behaviour?

[z852378@premier dumpyard]$ cat a1.cc
#include <iostream>
#include <climits>

int main() {
int a = INT_MAX/2;
std::cout << "a is " << a << std::endl;
a = a * a;
std::cout << "a is " << a << std::endl;
return 0;
}
[z852378@premier dumpyard]$ g++ a1.cc
[z852378@premier dumpyard]$ ./a.out
a is 1073741823
a is -2147483647

thanks
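For reference, a minimal sketch of one way to guard the multiplication before it can overflow; the divide-based pre-check is an illustration, not something from the original post:

#include <climits>
#include <iostream>

int main() {
    int a = INT_MAX / 2;
    std::cout << "a is " << a << std::endl;
    if (a != 0 && a > INT_MAX / a) {       // a * a would exceed INT_MAX (a is non-negative here)
        std::cout << "a * a would overflow" << std::endl;
    } else {
        a = a * a;                         // safe: the product fits in an int
        std::cout << "a is " << a << std::endl;
    }
    return 0;
}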
 
Victor Bazarov

dragoncoder said:
Does the following code invoke undefined behaviour?

Yes. 5/5.
[z852378@premier dumpyard]$ cat a1.cc
#include <iostream>
#include <climits>

int main() {
int a = INT_MAX/2;
std::cout << "a is " << a << std::endl;
a = a * a;
std::cout << "a is " << a << std::endl;
return 0;
}
[z852378@premier dumpyard]$ g++ a1.cc
[z852378@premier dumpyard]$ ./a.out
a is 1073741823
a is -2147483647

V
 
Greg

dragoncoder said:
Hi all,

Does the following code invoke undefined behaviour?

[z852378@premier dumpyard]$ cat a1.cc
#include <iostream>
#include <climits>

int main() {
int a = INT_MAX/2;
std::cout << "a is " << a << std::endl;
a = a * a;
std::cout << "a is " << a << std::endl;
return 0;
}
[z852378@premier dumpyard]$ g++ a1.cc
[z852378@premier dumpyard]$ ./a.out
a is 1073741823
a is -2147483647

Yes, but nearly every existing implementation of C++ defines integer
overflow as harmless. And in fact the Standard even observes as much:
"Note: most existing implementations of C++ ignore integer overflows."
§5/5.

Greg
 
Pete Becker

Greg said:
Yes, but nearly every existing implementation of C++ defines integer
overflow as harmless.

Which has nothing to do with whether its behavior is undefined.
Undefined behavior is not necessarily harmful.
And in fact the Standard even observes as much:
"Note: most existing implementations of C++ ignore integer overflows."

Which may or may not be harmful, depending on what it means to ignore integer
overflows and on what the program does with the result. But it is still
undefined behavior, because the C++ standard doesn't require any
particular behavior in that case.
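To make that concrete: because no particular behavior is required, an optimizer is allowed to assume signed overflow simply never happens. A minimal sketch; the constant-folding described in the comments is something typical optimizing compilers may do, not something the standard or this thread guarantees:

#include <climits>
#include <iostream>

// "true" only if x + 1 overflows -- but signed overflow is undefined
// behavior, so an optimizer may assume it never happens and fold the
// whole test down to "return false;"
bool wraps(int x) {
    return x + 1 < x;
}

int main() {
    std::cout << wraps(INT_MAX) << std::endl;  // may print 1 unoptimized and 0
    return 0;                                  // when optimized; neither is required
}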
 
Phlip

Pete said:
Which has nothing to do with whether its behavior is undefined. Undefined
behavior is not necessarily harmful.

I thought the nearest toilet could explode.

Closer to the metal, I thought that certain bit patterns couldn't form valid
integers, and could cause a trap. Hence, I thought that overflowing an
integer could hit one of those bit patterns.

Is overflow implementation-defined as safe-to-read-but-garbage-bits? Or
might the nearest toilet.. you know..?
 
Alf P. Steinbach

* Phlip:
I thought the nearest toilet could explode.

Closer to the metal, I thought that certain bit patterns couldn't form valid
integers, and could cause a trap. Hence, I thought that overflowing an
integer could hit one of those bit patterns.

Is overflow implementation-defined as safe-to-read-but-garbage-bits? Or
might the nearest toilet.. you know..?

Most C++ implementations employ two's complement representation of
signed integers, with no trapping.

That means arithmetic modulo 2^n, just as with unsigned integers.
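A small illustration of the difference between the guarantee and the habit; the wrapped value mentioned in the comment assumes a two's complement, non-trapping target, which is exactly the common case being described:

#include <climits>
#include <iostream>

int main() {
    unsigned int u = UINT_MAX;
    u = u + 1;                  // guaranteed by the standard: wraps to 0 (arithmetic mod 2^n)
    std::cout << u << std::endl;

    int s = INT_MAX;
    // s = s + 1;               // undefined behavior: a typical two's complement,
    //                          // non-trapping machine would wrap to INT_MIN,
    //                          // but nothing in the standard requires that
    std::cout << s << std::endl;
    return 0;
}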

Many, including me, have argued that the C++ standard should stop
supporting the ENIAC, MANIAC and NUSSE computers, and require two's
complement no-trapping representation. Which, for example, would make
std::clock much more useful (IIRC), and would make a very large part of
the existing C++ code base conforming. But ironically, since you have
to search very obscure corners of the universe to find a C++ compiler
that doesn't do that already (it's already a de facto standard; the
existing practice is overwhelmingly uniform), there's not sufficient
pressure to have it standardized...
 
Greg

Pete said:
Which has nothing to do with whether its behavior is undefined.
Undefined behavior is not necessarily harmful.

Undefined behavior is the absence of a defined behavior (in this case,
by the C++ Standard) - but it is not an affirmative behavior in its own
right. And it is quite common for implementations or architectures to
define behavior not defined by the Standard. Dereferencing a NULL
pointer would be an obvious example of a behavior undefined by the
Standard - but which nonetheless has a behavior defined in almost every
implementation. Integer overflow would be another example of a
widely-defined operation.
Which may or may not be harmful, depending on what it means to ignore integer
overflows and on what the program does with the result. But it is still
undefined behavior, because the C++ standard doesn't require any
particular behavior in that case.

An implementation is free to define its own behavior for behavior not
defined by the Standard (and nearly all do in this case). So there is
never a certainty that a behavior left undefined by the Standard will
also be undefined by the implementation or the architecture in a given
case.

Greg
 
Mirek Fidler

Many, including me, have argued that the C++ standard should stop
supporting the ENIAC, MANIAC and NUSSE computers, and require two's
complement no-trapping representation. Which, for example, would make
std::clock much more useful (IIRC), and would make a very large part of
the existing C++ code base conforming. But ironically, since you have
to search very obscure corners of the universe to find a C++ compiler
that doesn't do that already (it's already a de facto standard; the
existing practice is overwhelmingly uniform), there's not sufficient
pressure to have it standardized...

Sometimes this makes me think that we would need some sort of
"substandard" that "defines" some of those undefined behaviours - as
most code in fact needs to use some of the relatively harmless
"undefineds" to exist (e.g. memory allocator routines or GC), or can use
them to work around performance problems (memcpy of non-PODs) - and it
would still be quite an advantage for portability to know that a certain
platform supports such a "substandard".

Mirek
 
Bo Persson

Alf P. Steinbach said:
* Phlip:

Most C++ implementations employ two's complement representation of
signed integers, with no trapping.

That means arithmetic modulo 2^n, just as with unsigned integers.

Many, including me, have argued that the C++ standard should stop
supporting the ENIAC, MANIAC and NUSSE computers,

Or Univac / Unisys hardware, still in production,
and require two's complement no-trapping representation.

which still offers one's complement 36 bit words, for backward
compatibility.
Which, for example, would make std::clock much more useful (IIRC),
and would make a very large part of the existing C++ code base
conforming. But ironically, since you have to search very obscure
corners of the universe to find a C++ compiler that doesn't do that
already (it's already a de facto standard; the existing practice is
overwhelmingly uniform), there's not sufficient pressure to have it
standardized...

Don't you think it would have been hard for the C++ committee to ban
this kind of hardware, when these chapters were drafted around 1990?

I am right now working on a project, modifying communications with a
financial institution, which would allow them to shut down their
Unisys machines in September *this year*. If it doesn't work out as
well as planned, they might continue to run next year as well.


How good is a standard that makes C++ impossible to implement on that
kind of hardware? Just for this tiny detail.


Bo Persson
 
Pete Becker

Phlip said:
I thought the nearest toilet could explode.

I don't see how that could happen.

When the standard says that the behavior of a program that uses some
code construct is undefined it means that the standard doesn't tell you
what happens when you do it. That's all. It doesn't require that bad
things happen, and it certainly doesn't say what something that's
"harmless" actually does.
Closer to the metal, I thought that certain bit patterns couldn't form valid
integers, and could cause a trap. Hence, I thought that overflowing an
integer could hit one of those bit patterns.

Could be. The behavior of a code construct whose behavior is undefined
is undefined.
Phlip said:
Is overflow implementation-defined as safe-to-read-but-garbage-bits?

It's not implementation-defined behavior. It's undefined behavior. The
difference is that in the former case, the compiler's documentation has
to tell you what it does.
 
Pete Becker

Mirek said:
Sometimes this makes me think that we would need some sort of
"substandard" that "defines" some of those undefined behaviours - as
most code in fact needs to use some of the relatively harmless
"undefineds" to exist (e.g. memory allocator routines or GC), or can use
them to work around performance problems (memcpy of non-PODs) - and it
would still be quite an advantage for portability to know that a certain
platform supports such a "substandard".

Sure, you can kill performance and make compiler writers work harder in
order to improve "portability" for a small fraction of the code that
people write. Java did it with their floating-point math, and had to
undo it.
 
Pete Becker

Greg said:
Dereferencing a NULL
pointer would be an obvious example of a behavior undefined by the
Standard - but which nonetheless has a behavior defined in almost every
implementation.

Really? What can I expect to happen when I dereference a null pointer
with MSVC 7.1? And where is it documented? After I've done this, what is
the state of my program?
Integer overflow would be another example of a
widely-defined operation.

Really? How is it defined for MSVC 7.1? And where is it documented? Has
Microsoft promised that whatever this behavior is, it will never be changed?

I'm not disputing that you can often figure out what happens in
particular cases. But that's not a sufficient basis for saying that it's
well defined for that compiler. Unless the compiler's specification
tells you exactly what you can expect, you're guessing. That's often
appropriate, but it doesn't make what you're doing well defined.
An implementation is free to define its own behavior for behavior not
defined by the Standard (and nearly all do in this case).

Of course it is.
So there is
never a certainty that a behavior left undefined by the Standard will
also be undefined by the implementation or the architecture in a given
case.

Of course not. Nobody said otherwise.
 
Alf P. Steinbach

* Bo Persson:
How good is a standard that makes C++ impossible to implement on that
kind of hardware [Univac / Unisys]?

These olde beasties are finally being shut down. And you think someone
is going to write a C++ program for them? Hah.

Just for this tiny detail.

The basic substrate the language is built on does matter.
 
Pete Becker

Pete said:
Really? What can I expect to happen when I dereference a null pointer
with MSVC 7.1? And where is it documented? After I've done this, what is
the state of my program?

Okay, bad example. <g> SEH is, of course, an abomination when mixed with
C++, but it's arguably well-defined.

A complete analysis would involve three steps:

1. Is there something reasonable that can be required in place of what's
currently undefined behavior?

2. Can the new behavior be implemented without imposing noticeable
overhead on programs that don't use it?

3. Is the new behavior useful?

Most discussions advocating making undefined behavior well defined
ignore number 3. That's the problem, for example, with the assertion
that integer overflow is usually harmless. It doesn't tell you what you
can do with it.
 
Bo Persson

Alf P. Steinbach said:
* Bo Persson:
How good is a standard that makes C++ impossible to implement on
that kind of hardware [Univac / Unisys]?

These olde beasties are finally being shut down. And you think
someone is going to write a C++ program for them? Hah.

No, but I don't think we should ban implementations on alternate
hardware, by specifying the language in too much detail.

So, if we ban one's complement integers, what about 9 bit bytes, 36 or
48 bit words, non-IEEE floating point, or EBCDIC character sets?

On our mainframes, IBM had to add special Java processors to have it
run reasonably efficiently. Should we opt for C++ add-on hardware as
well, or should we allow it to run efficiently on existing machines?
The basic substrate the language is built on does matter.

Ok, so I find one sentence, "The representations of integral types
shall define values by use of a pure binary numeration system.", out of
a 1000-page document a tiny detail.


Bo Persson
 
Alf P. Steinbach

* Bo Persson:
Alf P. Steinbach said:
* Bo Persson:
How good is a standard that makes C++ impossible to implement on
that kind of hardware [Univac / Unisys]?
These olde beasties are finally being shut down. And you think
someone is going to write a C++ program for them? Hah.

No, but I don't think we should ban implementations on alternate
hardware, by specifying the language in too much detail.

So, if we ban one's complement integers, what about 9 bit bytes, 36 or
48 bit words, non-IEEE floating point, or EBCDIC character sets?

Good question, because it illuminates the basic issue.

Namely, that

* It's not the case that support for deterministic, well-defined
behavior is in conflict with support for implementation-defined
behavior.

That conflict is one that's made up, out of thin air, by using words
such as "ban", or if not actually using them, thinking them.

One simply should not try to shoehorn conflicting behaviors into the
same, uh, gorble (I made up that word; the 'int' type is an example).

And the C solution for having guaranteed 8-bit bytes and so on, happily
coexisting with implementation-defined sizes, is precisely to use
different gorbles for different behaviors: good old 'int' for ...
On our mainframes, IBM had to add special Java processors to have it
run reasonably efficiently. Should we opt for C++ add-on hardware as
well, or should we allow it to run efficiently on existing machines?

See above: it's a false dichotomy.

Ok, so I find one sentence, "The representations of integral types
shall define values by use of a pure binary numeration system.", out of
a 1000-page document a tiny detail.

Ah, but you have essentially pointed out that a great many problems with
current C++, such as character set handling and guaranteed value ranges,
stem from C++ having adopted the same general solution everywhere:
shoehorning all those possible different behaviors into one gorble, and
then by necessity defining that gorble vaguely enough that in some
simple situations it works, and otherwise is just non-portable...
 
Andrew Koenig

Most C++ implementations employ two's complement representation of signed
integers, with no trapping.
That means arithmetic modulo 2^n, just as with unsigned integers.
Many, including me, have argued that the C++ standard should stop
supporting the ENIAC, MANIAC and NUSSE computers, and require two's
complement no-trapping representation.

This is not the only issue.

I remember, from many years ago, a major argument that arose within AT&T
between two organizations, one of which was supplying a compiler and the
other of which was using it. The dispute arose around a statement of the
form

if ((x = y - z) < 0) { ... }

The compiler generated machine code that did the following:

Copy y into a register
Subtract z from the register
Copy the register into memory
If the condition code did not indicate that the last arithmetic
computation yielded a negative result, jump over the code in braces.

Looks straightforward enough, right? But if y-z overflowed, the condition
code indicated that the result of the last operation was an overflow, which
is different from a negative result. Therefore the code in braces was not
executed, even if the result after the overflow happened to be negative.

In other words, if an overflow occurred, it was possible that x might be
negative, but the code in braces was still not executed.

The programmers tried to rewrite the code this way:

x = y - z;
if (x < 0) { ... }

and found that the compiler generated exactly the same code: The optimizer
realized that the if statement could be reached only from the assignment, so
it assumed that the hardware was correctly reflecting the results of the
last computation.

The developers claimed that this state of affairs reflected a compiler bug.
If you test whether a variable is negative, and the test fails, it is a bug
if the variable is subsequently found to be negative.

The compiler people claimed that the overflow resulted in undefined
behavior, so all bets were off. Moreover, if they were to fix this "bug",
it would result in generating a needless extra instruction in a wide variety
of contexts, merely to defend against programming techniques that no one
should be using anyway.

Ultimately, the compiler people won; and their philosophy has persisted to
this day.
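For comparison, a sketch of the kind of defensive rewrite that sidesteps the whole question by testing for overflow before subtracting; the helper name and the sample values are made up for illustration:

#include <climits>
#include <iostream>

// true if y - z can be computed in an int without overflowing
// (illustrative helper, not from the thread)
bool sub_ok(int y, int z) {
    if (z > 0)
        return y >= INT_MIN + z;   // y - z cannot fall below INT_MIN
    else
        return y <= INT_MAX + z;   // y - z cannot rise above INT_MAX
}

int main() {
    int y = INT_MIN + 1, z = 2;
    if (sub_ok(y, z)) {
        int x = y - z;
        if (x < 0) { /* ... */ }
    } else {
        std::cout << "y - z would overflow" << std::endl;
    }
    return 0;
}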
 
Phlip

Pete said:
I don't see how that could happen.

Be imaginative. An x10 "firecracker" controller and some plastic explosive?
Suppose we wrote the program to _prevent_ the controller from triggering,
such that certain undefined behaviors cause it to trigger. Don't answer;
this is not the point.
When the standard says that the behavior of a program that uses some code
construct is undefined it means that the standard doesn't tell you what
happens when you do it. That's all. It doesn't require that bad things
happen, and it certainly doesn't say what something that's "harmless"
actually does.

If we had a C++ compiler that had "ISO Compliant" stamped on its box, and if
we write a program and run it, and if the nearest toilet explodes, we might
trace this to either of two situations. Either we wrote undefined behavior
and the compiler took the option of derailing, or we wrote defined behavior
that exposed a bug in the compiler. So then the family of whoever was on the
toilet at the time will have these options:

- sue us, because the code was undefined, so the compiler
implementors are not to blame

- sue the compiler implementors, because the code was
defined, so we are not to blame

I'm aware this is a slightly different question. But isn't this an example
of the _legal_ implications of the ISO Standard?
It's not implementation-defined behavior. It's undefined behavior. The
difference is that in the former case, the compiler's documentation has to
tell you what it does.

Yay. Now get on Alf's branch of the thread and argue the implementor should
not specify in their manual that 2s complement notation will lead to
such-and-so overflow situations. ;-)
 
Alf P. Steinbach

* Andrew Koenig:
This is not the only issue.

I remember, from many years ago, a major argument that arose within AT&T
between two organizations, one of which was supplying a compiler and the
other of which was using it. The dispute arose around a statement of the
form

if ((x = y - z) < 0) { ... }

The compiler generated machine code that did the following:

Copy y into a register
Subtract z from the register
Copy the register into memory
If the condition code did not indicate that the last arithmetic
computation yielded a negative result, jump over the code in braces.

Looks straightforward enough, right? But if y-z overflowed, the condition
code indicated that the result of the last operation was an overflow, which
is different from a negative result. Therefore the code in braces was not
executed, even if the result after the overflow happened to be negative.

In other words, if an overflow occurred, it was possible that x might be
negative, but the code in braces was still not executed.

The programmers tried to rewrite the code this way:

x = y - z;
if (x < 0) { ... }

and found that the compiler generated exactly the same code: The optimizer
realized that the if statement could be reached only from the assignment, so
it assumed that the hardware was correctly reflecting the results of the
last computation.

The developers claimed that this state of affairs reflected a compiler bug.
If you test whether a variable is negative, and the test fails, it is a bug
if the variable is subsequently found to be negative.

The compiler people claimed that the overflow resulted in undefined
behavior, so all bets were off. Moreover, if they were to fix this "bug",
it would result in generating a needless extra instruction in a wide variety
of contexts, merely to defend against programming techniques that no one
should be using anyway.

Ultimately, the compiler people won; and their philosophy has persisted to
this day.

I quoted it in full because it's a nice story...

Assuming the language was C or C++, the problem here was that with
signed types the expression ((x = y - z) < 0) had, and has, undefined
behavior for certain values of y and z, so that /no matter/ what the
compiler folks did, they could end up being "wrong" wrt. the intention.

Given a language that left this as undefined behavior, the compiler
folks were IMO right to reject a request to define it, which would have
made the code depend on a particular compiler instead of a simple but
ugly rewrite using unsigned types and casts.
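A sketch of what such an unsigned rewrite might look like; it assumes a 32-bit int with the usual two's complement layout, which is precisely the kind of assumption being discussed:

#include <iostream>

// compute y - z with well-defined wrap-around (unsigned arithmetic is
// modulo 2^n), then inspect the sign bit of the value actually stored
bool difference_is_negative(int y, int z) {
    unsigned int ux = static_cast<unsigned int>(y) - static_cast<unsigned int>(z);
    return (ux & 0x80000000u) != 0;   // top bit set == negative, assuming 32-bit two's complement
}

int main() {
    std::cout << difference_is_negative(1, 3) << std::endl;  // prints 1
    return 0;
}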

But the issue wouldn't have existed, if instead the expression was
well-defined for all values. ;-) Then, one could write natural code
that one could be sure was correct. Not merely fast.

So in spite of the compiler folks being right, the language definition
folks (a third group) were IMO not: with 20-20 hindsight we can now see
that they were optimizing prematurely and thereby fostering evil usage.
 
Bo Persson

Alf P. Steinbach said:
* Bo Persson:

Good question, because it illuminates the basic issue.

Namely, that

* It's not the case that support for deterministic, well-defined
behavior is in conflict with support for implementation-
defined behavior.

That conflict is one that's made up, out of thin air, by using words
such as "ban", or if not actually using them, thinking them.

Here I think "ban" is a synonym for "making an implementation totally
inefficient".

Forcing 36-bit one's complement hardware to behave as if it were 32-bit
two's complement is just an extreme example. Especially if the
problem manifests itself only in overflow situations.
Ah, but you have essentially pointed out that a great many problems
with current C++, such as character set handling and guaranteed
value ranges, stem from C++ having adopted the same general solution
everywhere: shoehorning all those possible different behaviors into
one gorble, and then by necessity defining that gorble vaguely
enough that in some simple situations it works, and otherwise is just
non-portable...

This is another view on 'being flexible'.

I am sure the standards committee didn't make up these rules just for
fun, but was well aware that specifying seemingly tiny details would
make it impossible to implement the language on a wide range of
hardware. By not specifying some variations ('being vague'), you don't
limit yourself to:

8 bit bytes
2^n bytes per datatype
two's complement integers
no pad bits
'a' == 0x20
floating point with x bit exponent and y bit mantissa
non-segmented memory
byte addressed machines
trapping on dereferencing a null pointer
non-trapping on underflow
etc.
what happens on divide-by-zero?
what happens when using invalid pointers?
etc.
etc.


For each of these 'improvements' in portability, you get one tiny
detail that makes it hard to implement the language on one particular
piece of hardware. That might make the effects portable, but not the
run time performance if you have to compensate for your hardware
(unless you build a special Java^H^H^H^H C++ co-processor).


Bo Persson
 
