James said:
[...]
Yes, but the '-' is not part of the integer literal. It's
an operator that is applied to it. Now the C++ standard
says:
"The type of an integer literal depends on its form, value,
and suffix. If it is decimal and has no suffix, it has the
first of these types in which its value can be represented:
int, long int; if the value cannot be represented as a
long int, the behavior is undefined."
Where does it say that?
I've quoted the above from C++ 98, 2.13.1/2.
Interesting. The standard contradicts itself in two successive
paragraphs. The latest draft fixes this, however, and there is
no undefined behavior.
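
To make the point about the '-' concrete, here is a minimal sketch, assuming a C++11 or later compiler (for decltype and static_assert). The function names are made up for this illustration. The literal gets its type first, and only then is the unary minus applied to it:

```cpp
#include <limits>
#include <type_traits>

// 1u is an unsigned int literal. The '-' is applied afterwards, in
// unsigned arithmetic, so the result wraps around to a large positive
// value; there is no such thing as "the literal -1u".
constexpr bool minus_applies_after_typing() {
    return std::is_same<decltype(-1u), unsigned int>::value && (-1u > 0u);
}

// Where int has 31 value bits, 2147483648 does not fit in an int, so
// the literal gets a wider type before the '-' is applied. The
// expression -2147483648 is then not an int, although its value fits.
// (On other int sizes this check is trivially true.)
constexpr bool int_min_spelling_is_not_int() {
    return std::numeric_limits<int>::digits != 31
        || !std::is_same<decltype(-2147483648), int>::value;
}

static_assert(minus_applies_after_typing(),
              "unary minus is applied to an already typed literal");
static_assert(int_min_spelling_is_not_int(),
              "-2147483648 does not have type int");
```

This is presumably why INT_MIN is usually spelled as something like (-2147483647 - 1) rather than as a single negated literal.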
I took the occasion last night to look up the text in some of my
older documents (K&R, first edition, and the ARM). The history
surrounding this seems somewhat curious, to put it mildly:
-- In K&R C, an integral literal is either int (if it fits) or
long (if it doesn't fit in an int). This holds for all
integral literals, regardless of base (but one could append
an L or an l to force long). K&R doesn't say what happens
if it doesn't fit in a long, but unsigned long isn't an
option, since unsigned didn't exist yet.
-- The ARM (published in 1990; I didn't think to check in
my original copy of TC++PL, ed. 1, which would definitely be
pre-ARM) and C90 give the list of types as int, long,
unsigned long; the type is, again, the first one that fits.
Neither says what happens if the value won't fit in an
unsigned long; at least in C90, if it isn't otherwise
specified (the case here), it is undefined behavior.
C90 and the ARM also introduce the U/u suffix, to force
unsigned (in addition to continuing to support L/l). C90,
at least (and I think the ARM as well) also uses a different
list (int, unsigned int, long, unsigned long) for octal and
hexadecimal literals (which means that 2147483648 and
0x80000000 will behave differently on a machine with 32 bit
ints but 64 bit longs). This distinction depending on the
base is maintained in all later documents.
-- The C++98 and the C++03 standards give the list for decimal
literals as simply int, long, without the unsigned long.
Both standards contradict themselves: in the paragraph that
specifies the list, they say that the behavior is undefined
if the value doesn't fit, but in the next paragraph they
state that the program is ill-formed in that case.
-- C99 adds long long and extended integer types, and also
drops unsigned types from the list for decimal literals,
resulting in int, long, long long as the list. It goes on
to say:
If an integer literal cannot be represented by any
type in its list and an extended integer type can
represent its value, it may have that extended
integer type. If all of the types in the list for
the literal are signed, the extended integer type
shall be signed. If all of the types in the list for
the literal are unsigned, the extended integer type
shall be unsigned. If the list contains both signed
and unsigned types, the extended integer type may be
signed or unsigned.
Since the standard doesn't say what happens if the given
value doesn't fit in any of the available types, it is
undefined behavior.
-- The latest draft of the C++ standard copies the C99 text
verbatim, but adds the sentence "A program is ill-formed if
one of its translation units contains an integer literal
that cannot be represented by any of the allowed types." to
the end of the preceding paragraph. (In C++98 and C++03,
that sentence is in a paragraph of its own.)
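
The dependence on the base can be checked directly. What follows is only a sketch, assuming a C++11 or later compiler that uses the lists from the latest draft (int, long, long long for unsuffixed decimal literals, and the same list interleaved with the unsigned types for octal and hexadecimal). The constant names are invented for the example, and the hexadecimal check is guarded so that it only says something when int has 31 value bits and unsigned int has 32:

```cpp
#include <limits>
#include <type_traits>

// True on the common platforms where int is a 32-bit type.
constexpr bool int_is_32_bits =
    std::numeric_limits<int>::digits == 31 &&
    std::numeric_limits<unsigned int>::digits == 32;

// Unsuffixed decimal literal: only signed types are candidates, so
// 2147483648 becomes long or long long, never an unsigned type.
constexpr bool decimal_is_signed =
    std::is_signed<decltype(2147483648)>::value;

// Unsuffixed hexadecimal literal with the same value: unsigned int
// comes right after int in the list, so it is picked first.
constexpr bool hex_is_unsigned_int =
    std::is_same<decltype(0x80000000), unsigned int>::value;

static_assert(decimal_is_signed,
              "decimal literals without a suffix stay signed");
static_assert(!int_is_32_bits || hex_is_unsigned_int,
              "0x80000000 is unsigned int when int has 32 bits");
```

Under the C90/ARM list the decimal literal could instead end up as unsigned long when long is 32 bits, which is the behavior described for g++ 4.1 below.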
The version of g++ that I have readily available here (4.1.)
implements long long, but still uses the type list int, long,
unsigned long (ignoring long long) from C90/ARM C++ (although it
supports the LL suffix). It does give a warning, at least with
the options I usually use. Since it documents that anything
the compiler outputs is a diagnostic in the sense of the
standard, it is conformant (at least if you specify -pedantic to
turn off long long): according to the standard, what happens
after the compiler has issued a diagnostic is undefined
behavior.
What exactly does it mean by "the allowed types"?
Those that are listed as possible types for the value. The list
of allowed types depends on the base and the suffixes (u/U and
l/L).
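
As a rough illustration of how the suffixes select the list, assuming C++11 or later (for decltype-style deduction, static_assert and the LL suffix). The helper is_type is made up for this sketch; it is not anything from the standard library:

```cpp
#include <type_traits>

// Hypothetical helper: deduces the type of its argument and compares
// it with the expected type.
template <typename Expected, typename Actual>
constexpr bool is_type(Actual) {
    return std::is_same<Expected, Actual>::value;
}

// Small values, so each literal gets the first type in its list.
static_assert(is_type<int>(1), "no suffix: list starts at int");
static_assert(is_type<unsigned int>(1u), "u/U: list starts at unsigned int");
static_assert(is_type<long>(1L), "l/L: list starts at long");
static_assert(is_type<unsigned long>(1ul), "both: list starts at unsigned long");
static_assert(is_type<long long>(1LL), "ll/LL: list starts at long long");

// The suffix applies the same way whatever the base.
static_assert(is_type<unsigned int>(0x1u), "hex with u is unsigned int too");
```

If the whole thing compiles, the suffixed literals have the types given in the messages.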