* James Kanze:
[...]
There is no value representation of a particular value.
"Value representation" in the standard's sense is not a code.
There is a set of bits that determines a value for the object.
That set of bits is required to be the same for corresponding
signed and unsigned types.
Except that they rather obviously can't be: the value
representation of a signed int requires a sign bit; the value
representation of an unsigned int cannot have a sign bit.
Uh, that word again...
Yep. It does seem rather obvious that if the sign bit is part
of the value representation, then the value representation of a
signed int cannot be the same as that of an unsigned int.
Well, "cannot": with both C++ compilers I use regularly (or
should I say semi-regularly) the value representation of a
signed int /is/ the same as that of an unsigned int, using the
standard's definition of "value representation".
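Concretely (a minimal sketch, assuming a 32 bit two's
complement implementation like those two compilers; nothing
here is guaranteed by the standard):

    #include <cstring>
    #include <iostream>

    int main()
    {
        int s = -1;
        unsigned int u = 0;
        // Copy the object representation of the signed int into
        // the unsigned int; on these implementations, both types
        // use all of their bits as value representation.
        std::memcpy( &u, &s, sizeof u );
        std::cout << u << std::endl;    // 4294967295: all bits set
    }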
Except that the standard contradicts (or extends?) its own
definition in the very next sentence: "the value representation
determines a value". It's hard to conceive of the word value
not having any relationship to the semantics.
And I doubt that there are more than one or two C++ compilers
other than the esoteric one you mentioned where that isn't the
case; probably zero.
This is the nice thing about "cannot": a single
counter-example suffices to disprove the notion, and here we
have a plethora, an abundance, a host of counter-examples,
where the problem isn't to find a counter-example but rather
to find an example!
You don't seem to have grasped that the issue is one of
language. The English language, the language in which the
standard is written. The standard does add or limit some
definitions, in specific context, and this is one. But even
its definitions are couched in English. In English, if a set
of bits "holds a value", it is implicit that somehow a specific
semantic is associated with different bit patterns. The word
"representation" also implies some sort of mapping.
Now, you can ignore the fact that words in English have
established meanings, and base your argument on the fact that
the C++ standard defines "value representation" as a "set of
bits that holds a value", considering this definition complete
and authoritative. And it is authoritative in the sense of the
standard. On the other hand, I don't think it can be considered
complete, and the fact that it is not complete (and the
standard doesn't really specify how to complete it) is a defect
in the standard.
Of course, this discussion is not new. The C++ standard, here,
is more or less identical to the C90 standard. And the same
problems which you raise were raised with regards to the C
standard; the standard clearly says something that doesn't make
sense, and isn't complete enough to ensure full understanding.
The discussion in the C committee resulted in a complete
rewording, in order to clarify the issues.
The standard's definition is clear, short and simple,
And incomplete.
so I'll
just repeat it: §3.9/4 "The /value representation/ of an
object is the set of bits in the object representation that
determines a /value/, which is one discrete element of an
implementation-defined set of values." -- the value
representation is a set of bits.
Note that the representation determines a value. Thus, the sign bit
in a signed representation is not the same bit (even if it
physically occupies the same place) as the 2^n bit of an
unsigned.
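To put that point in code (a sketch; the function names and the
32 bit formats are my assumptions, nothing from the standard):
the physically identical top bit is given a different weight by
the two mappings.

    #include <iostream>

    // Hypothetical decoders for a 32 bit word, assuming two's
    // complement for the signed mapping.
    long long decode_unsigned( unsigned long bits )
    {
        long long value = 0;
        for ( int i = 0; i < 32; ++ i )
            if ( bits & ( 1UL << i ) )
                value += 1LL << i;  // every bit has weight +2^i
        return value;
    }

    long long decode_twos_complement( unsigned long bits )
    {
        long long value = 0;
        for ( int i = 0; i < 31; ++ i )
            if ( bits & ( 1UL << i ) )
                value += 1LL << i;  // weight +2^i, as above...
        if ( bits & 0x80000000UL )
            value -= 1LL << 31;     // ...but the top bit has weight
                                    // -2^31, not +2^31
        return value;
    }

    int main()
    {
        std::cout << decode_unsigned( 0x80000000UL )
                  << std::endl;     //  2147483648
        std::cout << decode_twos_complement( 0x80000000UL )
                  << std::endl;     // -2147483648
    }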
Or whatever. It really doesn't matter much how you or I
interpret it, since the original authors (in the C committee)
have recognized that it is ambiguous, and that it isn't clear,
and have reworded it into something which is clear and
unambiguous.
I don't think so. It's a problem, for one compiler. But
then, at least something is a problem for just about any
compiler...
It's a problem with the definition of the language. The problem
may only appear with one particular compiler today (although I'm
far from sure---I'm not at all familiar with compilers for
things like DSP's, which also often have architectures which
would appear strange to those only familiar with general purpose
machines), but it places a certain number of constraints on
future implementations as well.
Do we have more than one?
Maybe, maybe not. I'm not familiar with all existing
implementations.
Depends what performance or value range hit you're willing to
take, i.e. what's meant by "practically".
Let's say the ENIAC-like hardware only directly supports
signed integers, and then using silly sign and magnitude
instead of reasonable two's complement.
There's actually nothing silly about it. It's very elegant, in
fact, and in many ways, much closer to high level languages than
to the classical assembler. (You only have one add instruction,
which does either floating point or integer arithmetic according
to the type of data it is working on.)
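For what it's worth, a sketch of how such a sign and magnitude
representation maps bits to values (assuming a 32 bit word with
the sign in the top bit; illustrative only, not the actual MCP
format):

    #include <iostream>

    long long decode_sign_magnitude( unsigned long bits )
    {
        // The top bit only flips the sign of the magnitude held
        // in the remaining 31 bits.
        long long magnitude = bits & 0x7FFFFFFFUL;
        return ( bits & 0x80000000UL ) != 0 ? -magnitude : magnitude;
    }

    int main()
    {
        std::cout << decode_sign_magnitude( 0x00000001UL )
                  << std::endl;     //  1
        std::cout << decode_sign_magnitude( 0x80000001UL )
                  << std::endl;     // -1
        std::cout << decode_sign_magnitude( 0x80000000UL )
                  << std::endl;     // "minus zero": prints 0
    }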
But that's not the point. The question is: do we take the
direction of Java, imposing a specific representation, or do we
follow the tradition of C, leaving a maximum of liberty to the
implementation, to do whatever is most effective on its
hardware? And of course, the answer isn't cast in stone, nor
always black and white: the C++ (like the C) standard already
requires binary representation, defines unsigned arithmetic to
be modulo 2^n, requires at least 8 bits in a char, and that you
can access all of the bits of any data type as an array of char.
(Which excludes the usual conventions on a PDP-10, which were
five seven bit bytes per 36 bit word. But the byte length on a
PDP-10 is programmable, and presumably, the implementations of
C or C++ used four nine bit bytes per 36 bit int.)
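Two of those guarantees, sketched in code (the wraparound is
required by the standard; the byte values printed for the
double are implementation-defined):

    #include <climits>
    #include <cstddef>
    #include <iostream>

    int main()
    {
        // Unsigned arithmetic is defined to be modulo 2^n:
        unsigned int u = UINT_MAX;
        ++ u;                       // guaranteed to wrap to 0
        std::cout << u << std::endl;

        // And the bits of any object can be accessed as an array
        // of (unsigned) char:
        double d = 1.0;
        unsigned char const* p
            = reinterpret_cast< unsigned char const* >( &d );
        for ( std::size_t i = 0; i != sizeof d; ++ i )
            std::cout << static_cast< unsigned >( p[ i ] ) << ' ';
        std::cout << std::endl;
    }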
If the ENIAC-like machine
I don't know where you get this ENIAC-like. I have no idea what
the architecture of the ENIAC was (but note that a lot of the
really old machines, like the IBM 1401, used decimal arithmetic,
and thus really can't support C).
The machine in question is the Unisys MCP architecture. It is a
rather distant descendant of the old Burroughs series A machines.
And in many ways, its architecture is a lot more "modern" than
that of the Intel processors; it certainly maps more closely to
high level languages.
supports (n-1)-bit unsigned arithmetic operations (where n is
the size in bits of the value representation of signed), then
one might instead choose another cost, namely that of limiting
the supported range of signed, instead of "extending" the
range of unsigned.
Requiring 2's complement would IMHO be a good idea. Requiring
32 bits would, on the other hand, break a lot of
implementations.
I only know of two (except for the older 16 bit implementations,
and they had 32 bit longs), and neither of them uses 2's
complement.
Anyway, I'm not really going to enter into a long discussion as
to whether it would be a good idea or not. There are valid
arguments on both sides, and in the end, it is really a matter
of personal opinion. I do think, however, that the issue should
be raised in more general terms: to what degree should C++
support "exotic" architectures. Recognizing the fact that they
are becoming rarer.
That seems to imply that requiring 2's complement would not
allow efficient implementations on as many relevant platforms
as possible, which is not the case.
If the platform doesn't support 2's complement, then requiring
it does cause more than a few problems, and has serious
performance implications.
[snip]
Anyway, I'll raise the issue with the C++ committee. Whatever
they decide, we'll know at least that it's intentional.
I hope they land on keeping current (literal) rules.
Because I rather like having a guarantee that any signed
value, when treating -0 and +0 as the same value (which is how
it is in C++), can be mapped to a unique unsigned one.
The current (literal) rules were considered ambiguous by the C
committee, when they discussed the issue. And the current rules
do not guarantee a bijection between signed and unsigned types
(which I think is what you want). For the guarantee that you
want, you'd also have to modify the text in [conv.integral],
which says that "If the destination type is signed, the value is
unchanged if it can be represented in the destination type (and
bit-field width); otherwise, the value is
implementation-defined." (This too has been clarified in C99.
Does "the value is implementation-defined" allow for a trap?
C99 says explicitly that "either the result is
implementation-defined or an implementation-defined signal is
raised." Again, according to the C committee, this is not a
real change, but a clarification of the intent of the original C
standard.)
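In code, the asymmetry looks like this (a sketch assuming a 32
bit int; the second result is whatever the implementation
defines):

    #include <iostream>

    int main()
    {
        // Signed to unsigned is fully defined: the conversion is
        // modulo 2^n, so -1 becomes UINT_MAX everywhere.
        int s = -1;
        unsigned int u = s;
        std::cout << u << std::endl;    // 4294967295 with 32 bit int

        // Unsigned to signed is defined only if the value fits in
        // the destination type; otherwise the result is
        // implementation-defined (and C99 also allows an
        // implementation-defined signal instead).
        unsigned int big = 3000000000U; // > INT_MAX with 32 bit int
        int t = big;                    // implementation-defined
        std::cout << t << std::endl;    // typically -1294967296
    }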
Anyway, I'll raise the entire issue with the C++ committee.
(Probably not before this weekend, because it will take some
time to write up all of the details, summarize our discussion
here, propose a correction based on the C standard, etc.) I
think it would be highly profitable for all people concerned if
you could participate in the discussions: if not as a formal
member of a national body, then at least informally. If you're
interested, let me know by email, and I'll contact the people
necessary to set it up.