* James Kanze:
[...]
There is no value representation of a particular value.
"Value representation" in the standard's sense is not a code.
There is a set of bits that determines a value for the object.
That set of bits is required to be the same for corresponding
signed and unsigned types.
Except that they rather obviously can't be: the value
representation of a signed int requires a sign bit; the value
representation of an unsigned int cannot have a sign bit.
Uh, that word again...
Yep. It does seem rather obvious that if the sign bit is part
of the value representation, then the value representation of a
signed int cannot be the same as that of an unsigned int.
Well, "cannot": with both C++ compilers I use regularly (or
should I say semi-regularly) the value representation of a
signed int /is/ the same as that of an unsigned int, using the
standard's definition of "value representation".
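Concretely (a minimal sketch, assuming a 32 bit two's
complement implementation like those two compilers; nothing
here is guaranteed by the standard):

    #include <cstring>
    #include <iostream>

    int main()
    {
        int s = -1;
        unsigned int u = 0;
        // Copy the object representation of the signed int into
        // the unsigned int; on these implementations, both types
        // use all of their bits as value representation.
        std::memcpy( &u, &s, sizeof u );
        std::cout << u << std::endl;    // 4294967295: all bits set
    }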
Except that the standard contradicts (or extends?) its own
definition in the very next sentence: "the value representation
determines a value". It's hard to conceive of the word value
not having any relationship to the semantics.
And I doubt that there are more than one or two C++ compilers
other than the esoteric one you mentioned where that isn't the
case; probably zero.
This is the nice thing about "cannot": a single
counter-example suffices to disprove the notion, and here we
have a plethora, an abundance, a host of counter-examples,
where the problem isn't to find a counter-example but rather
to find an example!
You don't seem to have grasped that the issue is one of
language. The English language, the language in which the
standard is written. The standard does add or limit some
definitions, in specific context, and this is one. But even
its definitions are couched in English. In English, if a set
of bits "holds a value", it is implicit that somehow a specific
semantic is associated with different bit patterns. The word
"representation" also implies some sort of mapping.
Now, you can ignore the fact that words in English have
established meanings, and base your argument on the fact that
the C++ standard defines "value representation" as a "set of
bits that holds a value", considering this definition complete
and authoritative. And it is authoritative in the sense of the
standard. On the other hand, I don't think it can be considered
complete, and the fact that it is not complete (and the
standard doesn't really specify how to complete it) is a defect
in the standard.
Of course, this discussion is not new. The C++ standard, here,
is more or less identical to the C90 standard. And the same
problems which you raise were raised with regards to the C
standard; the standard clearly says something that doesn't make
sense, and isn't complete enough to ensure full understanding.
The discussion in the C committee resulted in a complete
rewording, in order to clarify the issues.
The standard's definition is clear, short and simple,
And incomplete.
so I'll
just repeat it: §3.9/4 "The /value representation/ of an
object is the set of bits in the object representation that
determines a /value/, which is one discrete element of an
implementation-defined set of values." -- the value
representation is a set of bits.
Note that the representation determines a value. Thus, the sign bit
in a signed representation is not the same bit (even if it
physically occupies the same place) as the 2^n bit of an
unsigned.
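To put that point in code (a sketch; the function names and the
32 bit formats are my assumptions, nothing from the standard):
the physically identical top bit is given a different weight by
the two mappings.

    #include <iostream>

    // Hypothetical decoders for a 32 bit word, assuming two's
    // complement for the signed mapping.
    long long decode_unsigned( unsigned long bits )
    {
        long long value = 0;
        for ( int i = 0; i < 32; ++ i )
            if ( bits & ( 1UL << i ) )
                value += 1LL << i;  // every bit has weight +2^i
        return value;
    }

    long long decode_twos_complement( unsigned long bits )
    {
        long long value = 0;
        for ( int i = 0; i < 31; ++ i )
            if ( bits & ( 1UL << i ) )
                value += 1LL << i;  // weight +2^i, as above...
        if ( bits & 0x80000000UL )
            value -= 1LL << 31;     // ...but the top bit has weight
                                    // -2^31, not +2^31
        return value;
    }

    int main()
    {
        std::cout << decode_unsigned( 0x80000000UL )
                  << std::endl;     //  2147483648
        std::cout << decode_twos_complement( 0x80000000UL )
                  << std::endl;     // -2147483648
    }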
Or whatever. It really doesn't matter much how you or I
interpret it, since the original authors (in the C committee)
have recognized that it is ambiguous, and that it isn't clear,
and have reworded it into something which is clear and
unambiguous.
I don't think so. It's a problem, for one compiler. But
then, at least something is a problem for just about any
compiler...
It's a problem with the definition of the language. The problem
may only appear with one particular compiler today (although I'm
far from sure---I'm not at all familiar with compilers for
things like DSP's, which also often have architectures which
would appear strange to those only familiar with general purpose
machines), but it places a certain number of constraints on
future implementations as well.
Do we have more than one?
Maybe, maybe not. I'm not familiar with all existing
implementations.
Depends what performance or value range hit you're willing to
take, i.e. what's meant by "practically".
Let's say the ENIAC-like hardware only directly supports
signed integers, and then using silly sign and magnitude
instead of reasonable two's complement.
There's actually nothing silly about it. It's very elegant, in
fact, and in many ways, much closer to high level languages than
to the classical assembler. (You only have one add instruction,
which does either floating point or integer arithmetic according
to the type of data it is working on.)
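For what it's worth, a sketch of how such a sign and magnitude
representation maps bits to values (assuming a 32 bit word with
the sign in the top bit; illustrative only, not the actual MCP
format):

    #include <iostream>

    long long decode_sign_magnitude( unsigned long bits )
    {
        // The top bit only flips the sign of the magnitude held
        // in the remaining 31 bits.
        long long magnitude = bits & 0x7FFFFFFFUL;
        return ( bits & 0x80000000UL ) != 0 ? -magnitude : magnitude;
    }

    int main()
    {
        std::cout << decode_sign_magnitude( 0x00000001UL )
                  << std::endl;     //  1
        std::cout << decode_sign_magnitude( 0x80000001UL )
                  << std::endl;     // -1
        std::cout << decode_sign_magnitude( 0x80000000UL )
                  << std::endl;     // "minus zero": prints 0
    }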
But that's not the point. The question is: do we take the
direction of Java, imposing a specific representation, or do we
follow the tradition of C, leaving a maximum of liberty to the
implementation, to do whatever is most effective on its
hardware? And of course, the answer isn't cast in stone, nor
always black and white: the C++ (like the C) standard already
requires binary representation, defines unsigned arithmetic to
be modulo 2^n, requires at least 8 bits in a char, and that you
can access all of the bits of any data type as an array of char.
(Which excludes the usual conventions on a PDP-10, which were
five seven bit bytes per 36 bit word. But the byte length on a
PDP-10 is programmable, and presumably, the implementations of
C or C++ used four nine bit bytes per 36 bit int.)
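Two of those guarantees, sketched in code (the wraparound is
required by the standard; the byte values printed for the
double are implementation-defined):

    #include <climits>
    #include <cstddef>
    #include <iostream>

    int main()
    {
        // Unsigned arithmetic is defined to be modulo 2^n:
        unsigned int u = UINT_MAX;
        ++ u;                       // guaranteed to wrap to 0
        std::cout << u << std::endl;

        // And the bits of any object can be accessed as an array
        // of (unsigned) char:
        double d = 1.0;
        unsigned char const* p
            = reinterpret_cast< unsigned char const* >( &d );
        for ( std::size_t i = 0; i != sizeof d; ++ i )
            std::cout << static_cast< unsigned >( p[ i ] ) << ' ';
        std::cout << std::endl;
    }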
If the ENIAC-like machine
I don't know where you get this ENIAC-like. I have no idea what
the architecture of the ENIAC was (but note that a lot of the
really old machines, like the IBM 1401, used decimal arithmetic,
and thus really can't support C).
The machine in question is the Unisys MCP architecture. It is a
rather distant descendant of the old Burroughs series A machines.
And in many ways, its architecture is a lot more "modern" than
that of the Intel processors; it certainly maps more closely to
high level languages.
supports (n-1)-bit unsigned arithmetic operations (where n is
the size in bits of the value representation of signed), then
one might instead choose another cost, namely that of limiting
the supported range of signed, instead of "extending" the
range of unsigned.
Requiring 2's complement would IMHO be a good idea. Requiring
32 bits would, on the other hand, break a lot of
implementations.
I only know of two (except for the older 16 bit implementations,
and they had 32 bit longs), and neither of them uses 2's
complement.
Anyway, I'm not really going to enter into a long discussion as
to whether it would be a good idea or not. There are valid
arguments on both sides, and in the end, it is really a matter
of personal opinion. I do think, however, that the issue should
be raised in more general terms: to what degree should C++
support "exotic" architectures. Recognizing the fact that they
are becoming rarer.
That seems to imply that requiring 2's complement would not
allow efficient implementations on as many relevant platforms
as possible, which is not the case.
If the platform doesn't support 2's complement, then requiring
it does cause more than a few problems, and has serious
performance implications.
[snip]
Anyway, I'll raise the issue with the C++ committee. Whatever
they decide, we'll know at least that it's intentional.
I hope they land on keeping current (literal) rules.
Because I rather like having a guarantee that any signed
value, when treating -0 and +0 as the same value (which is how
it is in C++), can be mapped to a unique unsigned one.
The current (literal) rules were considered ambiguous by the C
committee, when they discussed the issue. And the current rules
do not guarantee a bijection between signed and unsigned types
(which I think is what you want). For the guarantee that you
want, you'd also have to modify the text in [conv.integral],
which says that "If the destination type is signed, the value is
unchanged if it can be represented in the destination type (and
bit-field width); otherwise, the value is
implementation-defined." (This too has been clarified in C99.
Does "the value is implementation-defined" allow for a trap?
C99 says explicitly that "either the result is
implementation-defined or an implementation-defined signal is
raised." Again, according to the C committee, this is not a
real change, but a clarification of the intent of the original C
standard.)
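In code, the asymmetry looks like this (a sketch assuming a 32
bit int; the second result is whatever the implementation
defines):

    #include <iostream>

    int main()
    {
        // Signed to unsigned is fully defined: the conversion is
        // modulo 2^n, so -1 becomes UINT_MAX everywhere.
        int s = -1;
        unsigned int u = s;
        std::cout << u << std::endl;    // 4294967295 with 32 bit int

        // Unsigned to signed is defined only if the value fits in
        // the destination type; otherwise the result is
        // implementation-defined (and C99 also allows an
        // implementation-defined signal instead).
        unsigned int big = 3000000000U; // > INT_MAX with 32 bit int
        int t = big;                    // implementation-defined
        std::cout << t << std::endl;    // typically -1294967296
    }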
Anyway, I'll raise the entire issue with the C++ committee.
(Probably not before this weekend, because it will take some
time to write up all of the details, summarize our discussion
here, propose a correction based on the C standard, etc.) I
think it would be highly profitable for all people concerned if
you could participate in the discussions: if not as a formal
member of a national body, then at least informally. If you're
interested, let me know by email, and I'll contact the people
necessary to set it up.