An overflow case that puzzles me

  • Thread starter Bertrand Mollinier Toublet
  • Start date
B

Bertrand Mollinier Toublet

All,

consider the following piece of code:

#include <stdio.h>

int main(int argc, char **argv) {
unsigned int foo = 2026363600u + argc - 1;
printf("off2 = %u\n", (838237499u * foo - 2137600414u) % 11u);
return 0;
}

Depending on the compiler used, the output varies, and I am not sure
whether a given output should be expected, or if the varying result is
actually to be expected: the "trick" here is that both the the
multiplicative and additive coefficients in the computation in the
printf() statement are multiples of 11.

Therefore, theoretically, the value passed to printf should be 0 in all
cases.

Having said that, on an implementation where unsigned int is 32 bit
(e.g. a x86 compatible), if (838237499u * foo - 2137600414u) is first
adjusted for overflow, then the result of the modulo operation is 2, not
zero.


So. Given all this, I have seen some compilers return 0 unconditionally,
some compilers returning 2 unconditionally, and I am worried about the
variance. Any insight?
 
S

Spiros Bousbouras

All,

consider the following piece of code:

#include <stdio.h>

int main(int argc, char **argv) {
unsigned int foo = 2026363600u + argc - 1;
printf("off2 = %u\n", (838237499u * foo - 2137600414u) % 11u);
return 0;

}

Depending on the compiler used, the output varies, and I am not sure
whether a given output should be expected, or if the varying result is
actually to be expected: the "trick" here is that both the the
multiplicative and additive coefficients in the computation in the
printf() statement are multiples of 11.

2137600414 is not a multiple of 11. The mathematically correct answer
for argc == 1 is 7.
SNIP

So. Given all this, I have seen some compilers return 0 unconditionally,
some compilers returning 2 unconditionally, and I am worried about the
variance. Any insight?

Sure. I've written a small GNU bc script:

for (i=16 ; 1 ; i++) {
p = 2^i
a = 2026363600 % p
b = 838237499 % p
c = 2137600414 % p
d = (a*b) % p - c
if ( d < 0 ) d = d + p
result = d % 11
print "Number of bits " , i , " Final result " , result , "\n"
if ( 2026363600*838237499 < p ) break
}

And here's the output:

Number of bits 16 Final result 1
Number of bits 17 Final result 10
Number of bits 18 Final result 10
Number of bits 19 Final result 2
Number of bits 20 Final result 8
Number of bits 21 Final result 9
Number of bits 22 Final result 9
Number of bits 23 Final result 2
Number of bits 24 Final result 2
Number of bits 25 Final result 2
Number of bits 26 Final result 2
Number of bits 27 Final result 2
Number of bits 28 Final result 2
Number of bits 29 Final result 2
Number of bits 30 Final result 2
Number of bits 31 Final result 2
Number of bits 32 Final result 2
Number of bits 33 Final result 6
Number of bits 34 Final result 3
Number of bits 35 Final result 3
Number of bits 36 Final result 3
Number of bits 37 Final result 3
Number of bits 38 Final result 3
Number of bits 39 Final result 3
Number of bits 40 Final result 3
Number of bits 41 Final result 3
Number of bits 42 Final result 3
Number of bits 43 Final result 7
Number of bits 44 Final result 4
Number of bits 45 Final result 4
Number of bits 46 Final result 4
Number of bits 47 Final result 4
Number of bits 48 Final result 0
Number of bits 49 Final result 0
Number of bits 50 Final result 6
Number of bits 51 Final result 6
Number of bits 52 Final result 6
Number of bits 53 Final result 10
Number of bits 54 Final result 10
Number of bits 55 Final result 10
Number of bits 56 Final result 9
Number of bits 57 Final result 7
Number of bits 58 Final result 3
Number of bits 59 Final result 6
Number of bits 60 Final result 6
Number of bits 61 Final result 7
 
B

Barry Schwarz

All,

consider the following piece of code:

#include <stdio.h>

int main(int argc, char **argv) {
unsigned int foo = 2026363600u + argc - 1;
printf("off2 = %u\n", (838237499u * foo - 2137600414u) % 11u);

What makes you think the second argument is of type unsigned int? If
it's not, you invoke undefined behavior.
return 0;
}

Depending on the compiler used, the output varies, and I am not sure
whether a given output should be expected, or if the varying result is
actually to be expected: the "trick" here is that both the the
multiplicative and additive coefficients in the computation in the
printf() statement are multiples of 11.

838237499 is a multiple of 11 but 2137600414 is not.
Therefore, theoretically, the value passed to printf should be 0 in all
cases.

Mathematically that may be true but it would only be true in C if
UNIT_MAX+1 was also a multiple of 11. Since UNIT_MAX+1 is almost
always a power of 2, it is probably not a multiple of 11.

On my 32 bit system, 838237499*foo computed as the product of two
unsigned int is 2981985067 which is not a multiple of 11. The
difference computed using unsigned int is 844384653 which is also not
a multiple of 11.

Simulating a 16 bit system (by using unsigned short):
838237499 evaluates to 32059 (not a multiple)
foo (2026363601) evaluates to 56017 (not a multiple)
838237499*foo evaluates to 31531 (not a multiple)
difference evaluates to 18829 (not a multiple)
Having said that, on an implementation where unsigned int is 32 bit
(e.g. a x86 compatible), if (838237499u * foo - 2137600414u) is first
adjusted for overflow, then the result of the modulo operation is 2, not
zero.

The product is adjusted for overflow first.
 
S

Spiros Bousbouras

What makes you think the second argument is of type unsigned int? If
it's not, you invoke undefined behavior.

What else could it be?
838237499 is a multiple of 11 but 2137600414 is not.


Mathematically that may be true but it would only be true in C if
UNIT_MAX+1 was also a multiple of 11. Since UNIT_MAX+1 is almost
always a power of 2, it is probably not a multiple of 11.

No, mathematically it's not true and UINT_MAX+1 is always
a power of 2.
 
B

Barry Schwarz

What else could it be?

On a system with 16 bit int, it is probably unsigned long.
No, mathematically it's not true and UINT_MAX+1 is always
a power of 2.

I can find no requirement that UINT_MAX+1 be a power of 2.

Mathematically: If one term of a product is a multiple of 11, the
product will be a multiple of 11. If each term in an addition or
subtraction is a multiple of 11, the sum or difference will also be a
multiple of 11. The remainder after dividing such a value by 11 will
always be 0.
 
S

Spiros Bousbouras

I can find no requirement that UINT_MAX+1 be a power of 2.

From paragraph 1 of 6.2.6.2:

For unsigned integer types other than
unsigned char, the bits of the object
representation shall be divided into
two groups: value bits and padding bits
(there need not be any of the latter). If
there are N value bits, each bit shall
represent a different power of 2 between 1
and 2^(N-1), so that objects of that type
shall be capable of representing values from
0 to 2^N - 1 using a pure binary representation;
this shall be known as the value representation.
Mathematically: If one term of a product is a multiple of 11, the
product will be a multiple of 11. If each term in an addition or
subtraction is a multiple of 11, the sum or difference will also be a
multiple of 11. The remainder after dividing such a value by 11 will
always be 0.

And as you pointed out yourself 2137600414 is not a multiple of 11.
 
P

Phil Carmody

Barry Schwarz said:
I can find no requirement that UINT_MAX+1 be a power of 2.

With N value bits, objects of that unsigned integer types
shall be capable of representing values from 0 to 2^N - 1
using a pure binary representation (and not capable of
representing 2^N). That implies to me that the maximum value
for an unsigned int will be 2^N-1 for some N, and therefore
that UINT_MAX will also be 1 less than a power of 2.

Phil
 
B

Barry Schwarz

From paragraph 1 of 6.2.6.2:

For unsigned integer types other than
unsigned char, the bits of the object
representation shall be divided into
two groups: value bits and padding bits
(there need not be any of the latter). If
there are N value bits, each bit shall
represent a different power of 2 between 1
and 2^(N-1), so that objects of that type
shall be capable of representing values from
0 to 2^N - 1 using a pure binary representation;
this shall be known as the value representation.

Thank you. I searched on max and obviously I should have searched on
unsigned (or maybe 2^).
And as you pointed out yourself 2137600414 is not a multiple of 11.

My original comment to which you objected addressed the OP's assertion
that multiplying and subtracting multiples of 11 should result is a
multiple of 11. While the assertion is mathematically correct, it
doesn't apply to the way C handles unsigned overflow.
 
B

Bertrand Mollinier Toublet

Barry said:
What makes you think the second argument is of type unsigned int? If
it's not, you invoke undefined behavior.


838237499 is a multiple of 11 but 2137600414 is not.
Barry, thanks for taking the time to consider this. You, and many
others, are correct :-/ Sorry about that. The point remains, though,
that the printed result varies depending whether you account for
overflow (modulo some power of 2) before applying the modulo 11
operation or not.
The product is adjusted for overflow first.
Right. This is the crux of the problem. Can you help me identify what
bits of the standard make it as you claim?



Others have pointed out that, depending on the actual size of unsigned
integers, theoretical values will differ. This is definitely correct,
but I should have mentioned that I am working in a context where I can
safely assume that unsigned integers are represented on 32 bits.
Moreover, the variance I observed was for compilers that all had the
same value of UINT_MAX at 2^32 - 1. Therefore, for all these compilers,
I was expecting to obtain the same result (2) consistently.
 
J

jameskuyper

Bertrand said:
Right. This is the crux of the problem. Can you help me identify what
bits of the standard make it as you claim?

6.2.5p9: "A computation involving unsigned operands can never
overflow, because a result that cannot be represented by the resulting
unsigned integer type is reduced modulo the number that is one greater
than the largest value that can be represented by the resulting type."

Now, that doesn't directly address the timing question you ask.
However, you have to understand that in the expression (838237499u *
foo - 2137600414u) % 11u, there are three different computations, and
6.2.5p9 applies separately to each one. It applies first to the result
of the multiplication computation. Then it applies to the result of
the subtraction. Finally, in principle it also applies to the result
from the modulus operation, though in that case it's not possible to
get "a result that cannot be represented".
Others have pointed out that, depending on the actual size of unsigned
integers, theoretical values will differ. This is definitely correct,
but I should have mentioned that I am working in a context where I can
safely assume that unsigned integers are represented on 32 bits.
Moreover, the variance I observed was for compilers that all had the
same value of UINT_MAX at 2^32 - 1. Therefore, for all these compilers,
I was expecting to obtain the same result (2) consistently.

I can't see any way for the results to vary if UINT_MAX is the same
for all the compilers you tried it on. On the other hand, I'm far from
perfect, and may have missed something. I'd recommend adding the
following line to your program:

printf("%u\t%d\t%u\t%u\n", UINT_MAX, argc, foo, 838237499u * foo,
838237499u * foo - 2137600414u);

That will help verify that UINT_MAX and argc are the same, and help
pinpoint where the intermediate results in the calculation start
differing.
 
S

Spiros Bousbouras

Right. This is the crux of the problem. Can you help me identify what
bits of the standard make it as you claim?

It's a chain of moderate length.

6.5.5p3: "The usual arithmetic conversions are performed on the
operands."

6.3.1.8 (paraphrase): The usual arithmetic conversions leave
`unsigned int' as `unsigned int'.

6.5.5p4: "The result of the binary * operator is the product of
the operands."

6.2.5p9: "[...] A computation involving unsigned operands can never
overflow, because a result that cannot be represented by the resulting
unsigned integer type is reduced modulo the number that is one greater
than the largest value that can be represented by the resulting type."

6.5.6 (paraphrase): + has lower precedence than *.

6.5.1 (paraphrase): () has higher precedence than %.

Don't you also need some reference on the "reading" of
integer constants? So for example if UINT_MAX == 65535
and you do unsigned int i = 65536 , will i get the value 0?
What will be the output of printf("%u\n",65536u) ?
 
J

jameskuyper

Spiros said:
Bertrand said:
Barry Schwarz wrote:
The product is adjusted for overflow first.
Right. This is the crux of the problem. Can you help me identify what
bits of the standard make it as you claim?

It's a chain of moderate length.

6.5.5p3: "The usual arithmetic conversions are performed on the
operands."

6.3.1.8 (paraphrase): The usual arithmetic conversions leave
`unsigned int' as `unsigned int'.

6.5.5p4: "The result of the binary * operator is the product of
the operands."

6.2.5p9: "[...] A computation involving unsigned operands can never
overflow, because a result that cannot be represented by the resulting
unsigned integer type is reduced modulo the number that is one greater
than the largest value that can be represented by the resulting type."

6.5.6 (paraphrase): + has lower precedence than *.

6.5.1 (paraphrase): () has higher precedence than %.

Don't you also need some reference on the "reading" of
integer constants? ...

I presume you mean 6.4.4.1p5? Assuming UINT_MAX = 4294967295, as
specifed by the OP, then what it says means that all of the integer
constants in the given code have the type "unsigned int". Perhaps that
needed to be stated, but I would have taken it as something already
assumed.
... So for example if UINT_MAX == 65535
and you do unsigned int i = 65536 , will i get the value 0?

Yes, but that's not because of 6.4.4.1p5. On such a system, 6.4.4.1p5
would indicate that 65536 will have the type 'long int'. The rules for
initialization (6.7.8p11) cross-reference the rules for assignment
(6.5.16.1p2) which call for conversion of that long value to unsigned
int, and that's what actually results in the value being 0.
What will be the output of printf("%u\n",65536u) ?

On such a system, 6.4.4.1p5 indicates that 65536u will have the type
unsigned long int. Since that's incompatible with the "%u" format
specifier, the behavior is undefined.
 
C

CBFalconer

Bertrand said:
Barry Schwarz wrote:
.... snip ...


Right. This is the crux of the problem. Can you help me identify
what bits of the standard make it as you claim?

3.18
[#1] undefined behavior
behavior, upon use of a nonportable or erroneous program
construct, of erroneous data, or of indeterminately valued
objects, for which this International Standard imposes no
requirements

[#2] NOTE Possible undefined behavior ranges from ignoring
the situation completely with unpredictable results, to
behaving during translation or program execution in a
documented manner characteristic of the environment (with or
without the issuance of a diagnostic message), to
terminating a translation or execution (with the issuance of
a diagnostic message).

[#3] EXAMPLE An example of undefined behavior is the
behavior on integer overflow.

So you have no reason to expect anything consistent.
 
K

Kaz Kylheku

Bertrand said:
Barry Schwarz wrote:
... snip ...


Right. This is the crux of the problem. Can you help me identify
what bits of the standard make it as you claim?

3.18
[#1] undefined behavior

A characteristically useless answer.

Since these are unsigned types (and consequently there is in fact no overflow,
except in the abstract sense) you are looking for paragraph 9 of 6.2.5 Types.
 
B

Bertrand Mollinier Toublet

CBFalconer said:
Bertrand said:
Barry Schwarz wrote:
... snip ...
Right. This is the crux of the problem. Can you help me identify
what bits of the standard make it as you claim?

3.18
[#1] undefined behavior
behavior, upon use of a nonportable or erroneous program
construct, of erroneous data, or of indeterminately valued
objects, for which this International Standard imposes no
requirements

[#2] NOTE Possible undefined behavior ranges from ignoring
the situation completely with unpredictable results, to
behaving during translation or program execution in a
documented manner characteristic of the environment (with or
without the issuance of a diagnostic message), to
terminating a translation or execution (with the issuance of
a diagnostic message).

[#3] EXAMPLE An example of undefined behavior is the
behavior on integer overflow.

So you have no reason to expect anything consistent.

Wait, I'm lost. I don't think Barry claimed undefined behavior. Instead
he claimed that "The product is adjusted for overflow first", which was
actually explained elsewhere in the thread with 6.2.5p9 (by James Kuyper).

Are *you* saying now that you think there is some undefined behavior in
the code I posted? (For reference here:

#include <stdio.h>

int main(int argc, char **argv) {
unsigned int foo = 2026363600u + argc - 1;
printf("off2 = %u\n", (838237499u * foo - 2137600414u) % 11u);
return 0;
}

end of reference)



FWIW, it turns out that I had indeed incorrectly claimed that the result
I was obtaining with one compiler (0) was the theoretical result to be
expected in the absence of module reduction (that result is 10 instead).
Therefore, the compiler giving consistently 0 as a result was either
illustrating for me that my code indeed was invoking undefined behavior
(and therefore giving unpredictable, unexpected results) or that the
compiler in question had a bug.

As it turns out, there was indeed a bug in the front-end of GCC that
turns out to have been corrected in 4.3.3 (see
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36548)

Compiler errors don't happen so often, I think this one deserves
celebrating (it wasn't out code! Woohoo!)
 
S

Spiros Bousbouras

Spiros said:
Bertrand Mollinier Toublet wrote:
Barry Schwarz wrote:
The product is adjusted for overflow first.
Right. This is the crux of the problem. Can you help me identify what
bits of the standard make it as you claim?
It's a chain of moderate length.
6.5.5p3: "The usual arithmetic conversions are performed on the
operands."
6.3.1.8 (paraphrase): The usual arithmetic conversions leave
`unsigned int' as `unsigned int'.
6.5.5p4: "The result of the binary * operator is the product of
the operands."
6.2.5p9: "[...] A computation involving unsigned operands can never
overflow, because a result that cannot be represented by the resulting
unsigned integer type is reduced modulo the number that is one greater
than the largest value that can be represented by the resulting type."
6.5.6 (paraphrase): + has lower precedence than *.
6.5.1 (paraphrase): () has higher precedence than %.
Don't you also need some reference on the "reading" of
integer constants? ...

I presume you mean 6.4.4.1p5? Assuming UINT_MAX = 4294967295, as
specifed by the OP, then what it says means that all of the integer
constants in the given code have the type "unsigned int". Perhaps that
needed to be stated, but I would have taken it as something already
assumed.

Yes, 6.4.4.1p5 has the information I wanted but I was too
lazy to look it up. If UINT_MAX = 4294967295 indeed everything
is unsigned int but during the course of the thread the
possibility of smaller UINT_MAX was investigated and some
mistaken claims were made.

So I'll try to set things straight. Here's the code again for
reference:

#include <stdio.h>

int main(int argc, char **argv) {
unsigned int foo = 2026363600u + argc - 1;
printf("off2 = %u\n", (838237499u * foo - 2137600414u) % 11u);
return 0;
}

I will assume that argc==1

If UINT_MAX < 2137600414 then the constant 2137600414u will
be promoted to long unsigned which means that the whole
expression (838237499u * foo - 2137600414u) % 11u is long
unsigned so the printf is undefined behaviour. This means that
for the C code to have defined behaviour and for the bc script I
posted earlier to provide reliable results the number of value
bits in unsigned int has to be at least 31 i.e.
UINT_MAX >= 2147483647
 
S

Spiros Bousbouras

On 13 Jan, 02:08, Bertrand Mollinier Toublet
Are *you* saying now that you think there is some undefined behavior in
the code I posted? (For reference here:

#include <stdio.h>

int main(int argc, char **argv) {
unsigned int foo = 2026363600u + argc - 1;
printf("off2 = %u\n", (838237499u * foo - 2137600414u) % 11u);
return 0;

}

end of reference)

If UINT_MAX >= 2137600414 then there's no undefined
behaviour.
FWIW, it turns out that I had indeed incorrectly claimed that the result
I was obtaining with one compiler (0) was the theoretical result to be
expected in the absence of module reduction (that result is 10 instead).

What result is 10? The result without modulo arithmetic
and for argc==1 is 7 i.e.
(2026363600*838237499 - 2137600414)%11 equals 7.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,982
Messages
2,570,186
Members
46,744
Latest member
CortneyMcK

Latest Threads

Top