Incrementing variables past limits

Eric Sosman

Chris said:
[...]
Preparing for every conceivable raising or lowering of
computers' hemlines carries a cost, and failing to be prepared
carries a risk. In the context of a given project you may
well decide that the risk is too small and the cost too large.
That's fine; that's part of what engineering is about. But an
implicit decision that all risks are zero is just as foolhardy
as a decision that all costs are justified.

I didn't say anything about such a decision. I'm looking at risk
assessment -- is it really worth writing code which will be inefficient
and hard to maintain in order to cope with a possible hole which the
standard allows but no one is likely to implement that way? Is the
probability of someone producing a system which breaks a lot of code
higher than that of the next C standard breaking code? (Anyone who used
a variable called 'restrict' or 'inline' will have run foul of that in
C99).

You are asking the right questions, but I doubt there is
a single universally correct answer to any of them. All must
be considered in the context of the project at hand; all are
meaningless without that context. This means that a careful
programmer (you, for example) will give different answers at
different times.
[1] Ternary digits ought to be called tits. If they aren't, someone was
slipping when they were named[2]...

They weren't keeping abreast of developments ...

If C adopts ternary, perhaps memmove() should be renamed
to mammove() ...

... and we'll all start worshipping a bust of Denise
Writchie's bust ...
 
Andrey Tarasevich

Peter said:
...
Unsigned short has scope for problems...

unsigned short us = -1;
us++; /* UB if USHRT_MAX == INT_MAX */

??? Why? I don't see any potential for UB here, regardless of the
USHRT_MAX value.
The paranoid can safely use...

us += 1u;

I don't see how it is different.
 
Keith Thompson

Andrey Tarasevich said:
??? Why? I don't see any potential for UB here, regardless of the
USHRT_MAX value.

us++;
is equivalent to
us = us + 1;
(except that the expression yields the value us had before the
increment).

In the expression (us + 1), the value of us (which is of type unsigned
short) is promoted to int (if int can hold all the values of type
unsigned short) or to unsigned int (otherwise).

Normally, if int is bigger than short, the value will be promoted to
int and the addition will not overflow; if int is the same size as
short, the value will be promoted to unsigned int and the addition
will wrap around to 0.

But if USHRT_MAX == INT_MAX, the value of us will be promoted to type
int, resulting in a value of INT_MAX, and INT_MAX + 1 will overflow
and cause undefined behavior.

I doubt that any real-world implementations are affected by this.
Type int would have to be effectively only 1 bit wider than type
short, which could only happen if there are padding bits.

Andrey also said:
I don't see how it is different.

In the expression us + 1u, us will be promoted to unsigned int even if
USHRT_MAX == INT_MAX, avoiding the undefined behavior.
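
To make this concrete, here is a small self-contained sketch (the
preprocessor test mirrors the promotion rule; the USHRT_MAX == INT_MAX
case is hypothetical on mainstream platforms):

#include <limits.h>
#include <stdio.h>

int main(void)
{
    unsigned short us = USHRT_MAX;
    unsigned int safe;

    /* In (us + 1), us is promoted to int if int can represent
     * every unsigned short value, and to unsigned int otherwise. */
#if USHRT_MAX <= INT_MAX
    printf("us promotes to int; us + 1 uses signed arithmetic\n");
    /* If USHRT_MAX == INT_MAX, evaluating us + 1 here would be
     * INT_MAX + 1: signed overflow, hence undefined behavior. */
#else
    printf("us promotes to unsigned int; us + 1 wraps, never overflows\n");
#endif

    /* us + 1u is always computed in unsigned int: the usual
     * arithmetic conversions convert the promoted us to the type
     * of the unsigned operand, so the addition can wrap to 0 but
     * can never overflow. */
    safe = us + 1u;
    printf("us + 1u == %u\n", safe);
    return 0;
}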
 
Andrey Tarasevich

Keith said:
us++;
is equivalent to
us = us + 1;
(except that the expression yields the value us had before the
increment).

In the expression (us + 1), the value of us (which is of type unsigned
short) is promoted to int (if int can hold all the values of type
unsigned short) or to unsigned int (otherwise).

Oh, I see. Thank you for the explanation.

As a side note, to me the whole thing feels like a defect: the committee
took a "shortcut" by defining the increment and compound assignment
operators through their "regular" binary counterparts (instead of
providing independent definitions for them) and ended up with this as an
unintentional side-effect. On the other hand, I could be wrong, since
this specification is consistent with C89/90.
 
Christian Bau

Andrey Tarasevich said:
Oh, I see. Thank you for the explanation.

As a side note, to me the whole thing feels like a defect: the committee
took a "shortcut" by defining the increment and compound assignment
operators through their "regular" binary counterparts (instead of
providing independent definitions for them) and ended up with this as an
unintentional side-effect. On the other hand, I could be wrong, since
this specification is consistent with C89/90.

On the other hand, would you have wanted "us++" to behave in any way
different from "us = us + 1"?
 
Andrey Tarasevich

Christian said:
On the other hand, would you have wanted "us++" to behave in any way
different from "us = us + 1"?

Well, probably not. But if I continued to insist that this is a
"defect", I'd have to come to the logical conclusion that the real root
of the problem is not the way the increment is defined, but rather the
very existence of implicit integral promotions and/or the way these
promotions behave. I have no desire to attack integral promotions at
the moment :), particularly because I don't remember the rationale that
led to them being defined the way they are.
 
Chris Torek

Given a situation in which USHRT_MAX may equal INT_MAX:

Now suppose "us" is initially set to USHRT_MAX, and USHRT_MAX
is indeed equal to INT_MAX.

And thus the signed int, whose value is INT_MAX, has 1 added
to it. The effect is undefined, and possibly a runtime trap,
which is quite undesirable.

Andrey Tarasevich said:
Oh, I see. Thank you for the explanation.

As a side note, to me the whole thing feels like a defect: the committee
took a "shortcut" by defining the increment and compound assignment
operators through their "regular" binary counterparts (instead of
providing independent definitions for them) and ended up with this as an
unintentional side-effect. On the other hand, I could be wrong, since
this specification is consistent with C89/90.

The real flaw, in my opinion, is that the original ANSI C (C89)
committee decided on this bizarre, inconsistent widening treatment of
types in the first place: when an unsigned type is widened, if the
wider signed type can represent all values of the narrower unsigned
type, the resulting type is signed, otherwise the resulting type
is unsigned.

In other words, on a 16-bit implementation, "unsigned short" widens
to "unsigned int", because USHRT_MAX (65535) exceeds INT_MAX (32767)
but not UINT_MAX (65535). On a 32-bit implementation on otherwise
similar (or even identical) hardware, "unsigned short" widens to
*signed* int, because USHRT_MAX (65535) is much less than INT_MAX
(2147483647). Sometimes this makes the code behave differently
on the two compilers, even if they use the same hardware:

unsigned short us = 0;
...
if ((us - 1) > 1)
...

Here, if unsigned short is 16 bits and plain int is also 16 bits
(Compiler A), "us - 1" is (unsigned int)65535, which is greater
than 1. But if unsigned short is 16 bits and plain int is 32 bits
(Compiler B, on the same hardware), "us - 1" is (signed int)-1,
which is less than 1.
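
The same point as a complete program (a sketch; which message prints
depends only on whether INT_MAX >= USHRT_MAX):

#include <limits.h>
#include <stdio.h>

int main(void)
{
    unsigned short us = 0;

    /* Under the value-preserving rules, the type of (us - 1)
     * depends on whether int can hold every unsigned short value. */
    if ((us - 1) > 1)
        printf("Compiler A: us - 1 is unsigned, == %u\n",
               (unsigned int)(us - 1));
    else
        printf("Compiler B: us - 1 is signed, == %d\n",
               (int)(us - 1));
    return 0;
}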

The C89 Rationale called such situations "questionably signed";
the theory is that it is hard to tell what the programmer intended
in the first place. So they came up with these so-called "value
preserving" rules. The problem is that, in the presence of any
kind of arithmetic, the value they preserve depends on the relative
values of USHRT_MAX vs INT_MAX (or UINT_MAX vs LONG_MAX, and so
on).

The alternative to "value-preserving" is the so-called "sign
preserving" or "unsigned preserving" rule. This is what Unix-based
systems actually did, and it is CLEARLY (note opinion :) ) the
better method BY FAR, because it does not require comparing USHRT_MAX
and INT_MAX at all. Instead, a narrow unsigned type *always* widens
to the wider unsigned type.

Note that this completely solves the issue at hand, because then
the fact that "++us" accomplishes the same thing as "us = us + 1"
is not a problem: "us" expands to unsigned int, which has the usual
clock-arithmetic semantics and either goes from 65535 to 65536 or
goes from 65535 to 0 (as appropriate), and then that value is put
back into "us", which always produces 0.
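
There is no compiler switch for unsigned-preserving semantics today,
but an explicit cast reproduces the rule; a sketch of the defensive
spelling:

#include <stdio.h>

int main(void)
{
    unsigned short us = 0;

    /* Casting first simulates the unsigned-preserving rule: the
     * narrow unsigned type always widens to unsigned int, so the
     * comparison comes out the same on every implementation. */
    if ((unsigned int)us - 1u > 1u)
        printf("taken everywhere: %u\n", (unsigned int)us - 1u);
    return 0;
}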

(As far as I can tell, there is only one drawback to "unsigned
preserving" behavior, and that is what happens if plain char is
unsigned. This problem can be solved by fiat: we already know that
the I/O library is problematic if UCHAR_MAX > INT_MAX, so we can
simply rule that UCHAR_MAX < INT_MAX and, if necessary [and I am
not sure whether it is], that plain char violates the "unsigned
preserving" behavior and widens to signed int.)
 
