signed integer overflow

R

REH

I asked this on c.l.c++, but they suggested you folks may be better able to
answer.

Basically, I am trying to write code to detect overflows in signed integer
math. I am trying to make it as efficient as possible without resorting to
assembly language, and without causing undefined behavior. That, of course,
means catching the overflow before it happens.

What I asked was (stripping any relevance to C++):

If the range of an integer type is not symmetrical around zero
(i.e., 2's comp.), is it safe to assume that the extra value(s) is one
the negative side?

The reason is I am currently thinking it may be easiest to do the math as
unsigned, check for overflow, and then fixup the sign. I would handle the
fact that the range may not be symmetrical around zero as a corner case.

What I learned from the folks on the C++ group:

1) C and C++ Standards are equivalent on the treatment of signed/unsigned
values. I had thought (mistakenly, I guess) that they differed slightly.

2) That the Standard (should that always be capitalized?), C++ at least,
allows only three formats: 1's comp., 2's comp., and sign/magnitude. I
didn't realize this and thought that any format was allowed, and I was
worried about my code working correctly on some weird format I've never
heard of. If that is true, then my only "corner case" is with the maximum
(in magnitude) negative value in 2's complement.

I had thought this was a problem that had been beaten to death in both
languages, but I could find nothing on the web. Well, that's not true. The
stuff I did find always assumed that signed overflow was well behaved and
worked as unsigned.

Not relevant here, but I am attempting to write a class template that
behaves like Ada's ranged types (and subtypes). That is, for example, if X
+ Y overflows or strays out of its assigned range, it will throw an
exception.

Thanks,

REH
 
R

Richard Heathfield

REH said:
If the range of an integer type is not symmetrical around zero
(i.e., 2's comp.), is it safe to assume that the extra value(s) is one
the negative side?

Yes. The Standard makes it clear that signed integer types are made up of a
sign bit, at least a certain number of value bits (15 for short int and
int, and 31 for long), and at least zero padding bits. Because the
representation of non-negative values of signed integer types is the same
as for unsigned integer types, it is easy to see that there can be at most
only one "extra" value, and that value must be negative because the sign
bit will have to be set in order to get the extra bit combination. Think of
(heretical!) three-bit ints, with the first of them being the sign bit:

Bit pattern Unsigned value Signed value
000 0 0
001 1 1
010 2 2
011 3 3

All remaining values must have the high bit set, and thus must be negative
in a signed type (irrespective of whether it's ones' complement, two's
complement, or sign-and-mag).
What I learned from the folks on the C++ group:

1) C and C++ Standards are equivalent on the treatment of signed/unsigned
values. I had thought (mistakenly, I guess) that they differed slightly.

No, they're equivalent TTBOMKAB.
2) That the Standard (should that always be capitalized?), C++ at least,
allows only three formats: 1's comp., 2's comp., and sign/magnitude.

It's the same for C.
 
C

CBFalconer

REH said:
.... snip ...

If the range of an integer type is not symmetrical around zero
(i.e., 2's comp.), is it safe to assume that the extra value(s)
is one the negative side?

The reason is I am currently thinking it may be easiest to do the
math as unsigned, check for overflow, and then fixup the sign. I
would handle the fact that the range may not be symmetrical around
zero as a corner case.

No need to guess. For any integer type, the limiting values are
available in <limits.h>.
 
E

Eric Sosman

Richard said:
[...] Think of
(heretical!) three-bit ints, with the first of them being the sign bit:

Bit pattern Unsigned value Signed value
000 0 0
001 1 1
010 2 2
011 3 3

All remaining values must have the high bit set, and thus must be negative
in a signed type (irrespective of whether it's ones' complement, two's
complement, or sign-and-mag).

100 could be zero, which is not negative:

if (minus_zero < 0 || minus_zero > 0 || minus_zero != 0) {
puts("This isn't C!");
puts("(Or else minus zero is a trap representation,\n"
"and you're only seeing this as a consequence\n"
"of undefined behavKUHYTDjn;lkUy97609i]*&^%$");
}

Although the example is flawed, the O.P.'s supposition is
correct: If the set of values is not symmetrical about zero,
the "extra" value must be negative:

- Signed magnitude: The "extra" encoding is 10...0, which
is either "minus zero" or a trap representation. Even if
"minus zero" is allowed, its value is zero so the range
is symmetrical.

- Ones' complement: The "extra" encoding is 11...1, which
is either "minus zero" or a trap. As before, the range is
symmetrical.

- Two's complement: The "extra" encoding is 10...0, which
is either minus two-to-the-Nth or a trap. If it's a trap
the range is symmetrical; otherwise, the range is
asymmetrical and the "extra" value is negative.

That covers all the representations permitted by the Standard,
and the only case in which the range is asymmetrical has more
negative than positive values.
 
R

REH

CBFalconer said:
No need to guess. For any integer type, the limiting values are
available in <limits.h>.
Thanks, CB, but just knowing the limits does not make it trivial to
determine whether a given arithmetic operation will overflow.
 
R

REH

Eric Sosman said:
Although the example is flawed, the O.P.'s supposition is
correct: If the set of values is not symmetrical about zero,
the "extra" value must be negative:

- Signed magnitude: The "extra" encoding is 10...0, which
is either "minus zero" or a trap representation. Even if
"minus zero" is allowed, its value is zero so the range
is symmetrical.

- Ones' complement: The "extra" encoding is 11...1, which
is either "minus zero" or a trap. As before, the range is
symmetrical.

- Two's complement: The "extra" encoding is 10...0, which
is either minus two-to-the-Nth or a trap. If it's a trap
the range is symmetrical; otherwise, the range is
asymmetrical and the "extra" value is negative.

That covers all the representations permitted by the Standard,
and the only case in which the range is asymmetrical has more
negative than positive values.
Thank you Eric. Yours and Richard's posts were very helpful. Is there a
simple test to determine whether the minus zero or trap is used? Or, is
that unnecessary to know as long as the values do not overflow?

REH
 
C

CBFalconer

REH said:
Thanks, CB, but just knowing the limits does not make it trivial to
determine whether a given arithmetic operation will overflow.

Yes it does. A preliminary calculation with one operand and the
limit will yield the maximum value for the other operand, so you
can avoid an overflow with a single comparison.

if (a > 0) {
if (b > 0)
if (a > (MAX_INT - b)) overflow();
}
else if (a < 0) {
if (b < 0)
if (a < (MIN_INT - b)) overflow();
}
etc.
 
R

REH

CBFalconer said:
Yes it does. A preliminary calculation with one operand and the
limit will yield the maximum value for the other operand, so you
can avoid an overflow with a single comparison.

if (a > 0) {
if (b > 0)
if (a > (MAX_INT - b)) overflow();
}
else if (a < 0) {
if (b < 0)
if (a < (MIN_INT - b)) overflow();
}
etc.

You are assuming just simple addition and subtraction. It is more complex
for other operations.

REH
 
W

websnarf

REH said:
You are assuming just simple addition and subtraction. It is more complex
for other operations.

Indeed it is. Testing for multiplication overflow cannot really be
done efficiently in C, except for lower sized integers.
 
R

REH

Indeed it is. Testing for multiplication overflow cannot really be
done efficiently in C, except for lower sized integers.

Well, efficiency is relative, but I am currently only concerned with it
being correct and standard compliant. For unsigned multiplication, I am
currently do something like (ignoring 0 for the example):

a = b * c;
if (c != a / b)
overflow();

I'm still deciding how to do signed multiplication, but I leaning towards
doing it as unsigned and fixing the sign afterwards. I would treat the
condition MIN_INT < -MAX_INT as a special case.

thanks,

REH
 
C

Christian Bau

"REH said:
I asked this on c.l.c++, but they suggested you folks may be better able to
answer.

Basically, I am trying to write code to detect overflows in signed integer
math. I am trying to make it as efficient as possible without resorting to
assembly language, and without causing undefined behavior. That, of course,
means catching the overflow before it happens.

What I asked was (stripping any relevance to C++):

If the range of an integer type is not symmetrical around zero
(i.e., 2's comp.), is it safe to assume that the extra value(s) is one
the negative side?

The reason is I am currently thinking it may be easiest to do the math as
unsigned, check for overflow, and then fixup the sign. I would handle the
fact that the range may not be symmetrical around zero as a corner case.

What I learned from the folks on the C++ group:

1) C and C++ Standards are equivalent on the treatment of signed/unsigned
values. I had thought (mistakenly, I guess) that they differed slightly.

2) That the Standard (should that always be capitalized?), C++ at least,
allows only three formats: 1's comp., 2's comp., and sign/magnitude. I
didn't realize this and thought that any format was allowed, and I was
worried about my code working correctly on some weird format I've never
heard of. If that is true, then my only "corner case" is with the maximum
(in magnitude) negative value in 2's complement.

C99 only allows 1's comp, 2's comp and sign/magnitude.

An unsigned integer type has N value bits and can represent numbers from
0 to 2^n - 1. A signed integer type has M value bits with M <= N and one
sign bit. It can represent positive numbers from 0 to 2^M - 1. The range
of negative values depends on whether the implementation uses 2's comp
(-2^M to -1) or 1's comp or sign/magnitude (-2^M + 1 to -1).

To be hundred percent portable, you must realise that M = N is possible.
An implementation could use 32 bit 2's complement int (31 bit + sign
bit) and 31 bit unsigned int (31 bit + padding bit). That will obviously
give you more problems. (You can't have 16 bit signed and 15 bit
unsigned int, or 32 bit signed and 31 bit unsigned long, because
unsigned int and unsigned long must have at least 16 and 32 bits).
 
F

Flash Gordon

REH wrote:

a = b * c;
if (c != a / b)
overflow();

I'm still deciding how to do signed multiplication, but I leaning towards
doing it as unsigned and fixing the sign afterwards. I would treat the
condition MIN_INT < -MAX_INT as a special case.

Do you actually need the full range, or would it be good enough and
simpler to assume MIN_INT==-MAX_INT and potentially flag negative
overflow early?
 
R

REH

Flash Gordon said:
REH wrote:



Do you actually need the full range, or would it be good enough and
simpler to assume MIN_INT==-MAX_INT and potentially flag negative overflow
early?
--

Hi Flash. Yes, I want to allow the full scale of any numerical type
(though, I am I only concentrating on integral types for now).

REH
 
C

CBFalconer

REH said:
You are assuming just simple addition and subtraction. It is
more complex for other operations.

Yes it is. But the information you need is there in <limits.h>.
You will find div(), ldiv(), and lldiv useful. Of course the
proper way to do it is for the actual object code to trap
overflows, which is ridiculously easy on the x86 at least, and most
grown up computer architectures, including the DS9000. On the x86
it only involves an INTO instruction. The standard only says that
the overflow action is implementation defined.
 
W

Walter Roberson

Of course the
proper way to do it is for the actual object code to trap
overflows, which is ridiculously easy on the x86 at least, and most
grown up computer architectures, including the DS9000.

I thought it was the other way around -- that on the DS9000,
arithmetic overflow triggered a Bengal Programmer Trap.
 
R

REH

CBFalconer said:
Yes it is. But the information you need is there in <limits.h>.
You will find div(), ldiv(), and lldiv useful. Of course the
proper way to do it is for the actual object code to trap
overflows, which is ridiculously easy on the x86 at least, and most
grown up computer architectures, including the DS9000. On the x86
it only involves an INTO instruction. The standard only says that
the overflow action is implementation defined.

Yes, but I am trying to do it without resorting to assembly language because
my code must run on various platforms and processors. So, I am trying to
avoid causing a trap condition or undefined behavior. Of course, now that I
know the possible formats is quite limited, and not open-ended as I had
feared, it's not that bad. Well, not that bad yet. In the future, I want
to expand the code to handle floating points. Which, I believe, could get
messy...

Thanks,

REH
 
W

websnarf

REH said:
Indeed it is. Testing for multiplication overflow cannot really be
done efficiently in C, except for lower sized integers.

Well, efficiency is relative, [...]

Well if you are willing to perform divisions, then indeed it most
certainly is!
[...] but I am currently only concerned with it
being correct and standard compliant. For unsigned multiplication, I am
currently do something like (ignoring 0 for the example):

a = b * c;
if (c != a / b)
overflow();

That's fine except for when b is zero. How about:

if (0 != b && a/b != c) overflow ();

If you want to skip the cost of the division in many cases, then:

#define HALF_WAY (1 << (sizeof (unsigned)+1)/2)
if ((b|c) >= HALF_WAY &&
((b >= HALF_WAY && c >= HALF_WAY) || (0 != b && a/b != c)))
overflow ();

And of course you can go further by simulating the multiply as 4
smaller multiplies and then checking the high multiply then the sum of
the other 3 for overflow.

All this for what can be done in basically 1 to 3 more instructions in
most assembly languages.
I'm still deciding how to do signed multiplication, but I leaning towards
doing it as unsigned and fixing the sign afterwards.

Probably best.
[...] I would treat the condition MIN_INT < -MAX_INT as a special case.

Right, you have no choice.
 
E

Eric Sosman

REH said:
Thank you Eric. Yours and Richard's posts were very helpful. Is there a
simple test to determine whether the minus zero or trap is used? Or, is
that unnecessary to know as long as the values do not overflow?

First question: I can't think of any 100% safe way to
test whether a data object holds a trap representation --
because if it does, simply trying to look at it may cause
the trap. (You could inspect the bytes as an array of
unsigned char and compare them to known trap representations,
but that begs the question.)

Second question: Ordinary arithmetic will never produce
a trap representation from valid operands unless something
like overflow or division by zero occurs.

As a practical matter, you can probably rely on integers
using two's complement with no trap representations. Other
schemes are allowed by the Standard and have been used in
real computers, but those designs have become as rare as the
ivory-billed woodpecker, if not the dodo. If you can write
your code without relying on such an assumption, fine -- but
you probably needn't bend over backwards to cater to what is
nowadays an awfully remote possibility. (Besides, where are
you going to find test systems of all six architectural flavors,
five exceedingly rare if they exist at all?) Go ahead and
assume asymmetrical two's complement, and insert

#include <limits.h>
#if INT_MIN + INT_MAX != -1
/* Not asymmetrical two's complement */
#error "This short-sighted code can't cope!"
#endif

.... so the code will kick up a ruckus instead of delivering
wrong answers if someone ever tries it on an exotic machine.

(I hope you realize the risk I'm taking to offer practical
advice: I'm likely to lose my Pedant's License over this breach
of orthodoxy! "Cardinal Fang, fetch ... the Comfy Chair!")
 
J

Joe Wright

Eric said:
Richard said:
[...] Think of (heretical!) three-bit ints, with the first of them
being the sign bit:

Bit pattern Unsigned value Signed value
000 0 0
001 1 1
010 2 2
011 3 3

All remaining values must have the high bit set, and thus must be
negative in a signed type (irrespective of whether it's ones'
complement, two's complement, or sign-and-mag).


100 could be zero, which is not negative:

if (minus_zero < 0 || minus_zero > 0 || minus_zero != 0) {
puts("This isn't C!");
puts("(Or else minus zero is a trap representation,\n"
"and you're only seeing this as a consequence\n"
"of undefined behavKUHYTDjn;lkUy97609i]*&^%$");
}

Although the example is flawed, the O.P.'s supposition is
correct: If the set of values is not symmetrical about zero,
the "extra" value must be negative:

- Signed magnitude: The "extra" encoding is 10...0, which
is either "minus zero" or a trap representation. Even if
"minus zero" is allowed, its value is zero so the range
is symmetrical.

- Ones' complement: The "extra" encoding is 11...1, which
is either "minus zero" or a trap. As before, the range is
symmetrical.

- Two's complement: The "extra" encoding is 10...0, which
is either minus two-to-the-Nth or a trap. If it's a trap
the range is symmetrical; otherwise, the range is
asymmetrical and the "extra" value is negative.
That would be "minus two-to-the-(N-1)th" surely. Two's complement 100 in
our three-bit model is -4.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,982
Messages
2,570,186
Members
46,744
Latest member
CortneyMcK

Latest Threads

Top