internal integer representation

  • Thread starter Mantorok Redgormor

Mantorok Redgormor

are all integers represented internally as just bit vectors?

-- nethlek
 

Joona I Palaste

Mantorok Redgormor said:
are all integers represented internally as just bit vectors?

ISO C does not define how integers are represented internally.
 

Hallvard B Furuseth

Mantorok said:
are all integers represented internally as just bit vectors?

Yes. Computers represent everything as bit vectors.

Well, someone might have made a trinary computer for a lark, but C
specifies that a byte consists of a sequence of bits, so its C
implementation must do the same.
 

Sheldon Simms

are all integers represented internally as just bit vectors?

-- nethlek

Not necessarily, apart from bitfields and variables of type
unsigned char. However, the required semantics of unsigned integer
types make them act as if they were so represented.
 

Malcolm

Mantorok Redgormor said:
are all integers represented internally as just bit vectors?

Basically yes. You could build a computer that doesn't use binary, but then
C wouldn't be a good language to use with it.
The vast majority of systems use two's complement notation for negative
numbers, eg -1 = all bits set. C doesn't actually require this but most real
C programs would probably break if ported to a non-two's complement machine.
 

Jack Klein

ISO C does not define how integers are represented internally.

That's an overstatement of the situation.

C requires a pure binary notation, and allows exactly three
specifically defined types of representations of signed integer types
with negative values.

While some details are left implementation-defined, there is quite a
lot mandated by the standard.

--
Jack Klein
Home: http://JK-Technology.Com
FAQs for
comp.lang.c http://www.eskimo.com/~scs/C-faq/top.html
comp.lang.c++ http://www.parashift.com/c++-faq-lite/
alt.comp.lang.learn.c-c++ ftp://snurse-l.org/pub/acllc-c++/faq
 

Glen Herrmannsfeldt

Malcolm said:
Basically yes. You could build a computer that doesn't use binary, but then
C wouldn't be a good language to use with it.
The vast majority of systems use two's complement notation for negative
numbers, eg -1 = all bits set. C doesn't actually require this but most real
C programs would probably break if ported to a non-two's complement
machine.

They might, but it commonly comes up in discussions in the newsgroup when
posts don't consider number representation.

-- glen
 

Peter Nilsson

Joona I Palaste said:
ISO C does not define how integers are represented internally.

Could you please clarify what you mean by your statement?

ISO C does define how they are represented 'externally' (i.e. from the
view of the programmer).

C90 defined them implicitly through an external specification ('pure
binary form') and C99 went so far as to name three classes of
representation. [Pete recently supplied simple methods of
distinguishing the three at runtime (at least).]

In any case, *all* objects in C can be viewed as raw (unsigned char)
bytes, which can in turn be viewed as bit vectors (of CHAR_BIT order).
 

James Dow Allen

Malcolm said:
The vast majority of systems use two's complement notation for negative
numbers, eg -1 = all bits set. C doesn't actually require this but most real
C programs would probably break if ported to a non-two's complement machine.

I strongly doubt this is true (though it's been over 3 decades since I worked
regularly on one's-complement machines). Perhaps you can post a suspicious
*common* code example.

I realize some code will fail, and perhaps even most programs with 100,000+
LOC will have some routine that fails, but will "most real C programs" break?

James
 

Dan Pop

Joona I Palaste said:
ISO C does not define how integers are represented internally.

If you were right, a Gray code-based implementation would be perfectly
conforming. Let's see if you're right:

6.2.6.2 Integer types

1 For unsigned integer types other than unsigned char, the bits
of the object representation shall be divided into two groups:
value bits and padding bits (there need not be any of the
latter). If there are N value bits, each bit shall represent
a different power of 2 between 1 and 2**(N-1), so that objects of
that type shall be capable of representing values from 0 to 2**N - 1
using a pure binary representation; this shall be known as
the value representation. The values of any padding bits are
unspecified.44)

2 For signed integer types, the bits of the object representation
shall be divided into three groups: value bits, padding bits,
and the sign bit. There need not be any padding bits; there shall
be exactly one sign bit. Each bit that is a value bit shall have
the same value as the same bit in the object representation
of the corresponding unsigned type (if there are M value bits
in the signed type and N in the unsigned type, then M <= N).
If the sign bit is zero, it shall not affect the resulting value.
If the sign bit is one, the value shall be modified in one of
the following ways:

- the corresponding value with sign bit 0 is negated (sign
and magnitude);

- the sign bit has the value -(2**N) (two's complement);

- the sign bit has the value -(2**N - 1) (one's complement).

Which of these applies is implementation-defined, as is
whether the value with sign bit 1 and all value bits zero
(for the first two), or with sign bit and all value bits 1 (for
one's complement), is a trap representation or a normal value.
In the case of sign and magnitude and one's complement, if this
representation is a normal value it is called a negative zero.

3 If the implementation supports negative zeros, they shall be
generated only by:

- the &, |, ^, ~, <<, and >> operators with arguments that
produce such a value;

- the +, -, *, /, and % operators where one argument is a negative
zero and the result is zero;

- compound assignment operators based on the above cases.

It is unspecified whether these cases actually generate a
negative zero or a normal zero, and whether a negative zero
becomes a normal zero when stored in an object.

4 If the implementation does not support negative zeros, the
behavior of the &, |, ^, ~, <<, and >> operators with arguments
that would produce such a value is undefined.

5 The values of any padding bits are unspecified.45) A valid
(non-trap) object representation of a signed integer type where
the sign bit is zero is a valid object representation of the
corresponding unsigned type, and shall represent the same value.

6 The precision of an integer type is the number of bits it uses
to represent values, excluding any sign and padding bits. The
width of an integer type is the same but including any sign bit;
thus for unsigned integer types the two values are the same,
while for signed integer types the width is one greater than
the precision.

____________________

44) Some combinations of padding bits might generate trap
representations, for example, if one padding bit is a parity
bit. Regardless, no arithmetic operation on valid values
can generate a trap representation other than as part of an
exceptional condition such as an overflow, and this cannot
occur with unsigned types. All other combinations of padding
bits are alternative object representations of the value
specified by the value bits.

45) Some combinations of padding bits might generate trap
representations, for example, if one padding bit is a parity
bit. Regardless, no arithmetic operation on valid values
can generate a trap representation other than as part of
an exceptional condition such as an overflow. All other
combinations of padding bits are alternative object
representations of the value specified by the value bits.

That's quite a lot of text about how integers are represented and,
incidentally, it rules out Gray coded integers...

Dan
 

Glen Herrmannsfeldt

Dan Pop said:
In <[email protected]> Joona I Palaste writes:

If you were right, a Gray code-based implementation would be perfectly
conforming. Let's see if you're right:

(snip of quotes from some version of the C standard.)
That's quite a lot of text about how integers are represented and,
incidentally, it rules out Gray coded integers...

I think there was a discussion some time ago on the use of BCD
representation. Following the "as if" rule, I believe others said it was
possible. It won't be easy, for example, to make shift operators, or bit
and/or/xor work right.

I would say that the "as if" rule applies here, too. One could, for
example, store numbers in gray code in memory but convert to one of the
standard forms in registers. (More popular is to use ECC coding in memory
and not in registers.) If you didn't do that, arithmetic operations would
be complicated. (I haven't thought about doing addition in gray code, and
multiplication would be even worse.)

I believe also byte and bit order in memory is not defined by the standard.
While big and little endian are popular, others are possible. If a machine
did store integers of different sizes in memory using gray code, the results
of accessing parts of such numbers would be very different than the normal
big/little endian results, but I would say legal within the standard.

But yes, all the C operators must supply results following the standard,
which mostly covers the possibilities of twos complement, ones complement
(CDC and Univac that I know of), and sign magnitude (the last I know of is
the 7090).

If the "internally" in the OP question means "in memory" then I would say
that it could be done.

-- glen
 

Simon Biber

Glen Herrmannsfeldt said:
I think there was a discussion some time ago on the use of BCD
representation. Following the "as if" rule, I believe others said it was
possible. It won't be easy, for example, to make shift operators, or bit
and/or/xor work right.

I would say that the "as if" rule applies here, too. One could, for
example, store numbers in gray code in memory but convert to one of the
standard forms in registers. (More popular is to use ECC coding in memory
and not in registers.) If you didn't do that, arithmetic operations would
be complicated. (I haven't thought about doing addition in gray code, and
multiplication would be even worse.)

The standard does not differentiate between register and memory storage, so
the as if rule could work there. However, it does allow you to inspect the
object representation using, for example, an array of unsigned chars. If you
inspect the representation of an integer in that way, it MUST conform to the
binary representation specified.

For example:

#include <stdio.h>   /* printf */
#include <limits.h>  /* CHAR_BIT */

/* 'type' and 'TYPE_MAX' must be defined, e.g. as int and INT_MAX */

int main(void)
{
    type a;
    size_t i, j;
    unsigned char *p = (unsigned char *)&a;

    for (a = TYPE_MAX; a; a >>= 1)
    {
        for (i = 0; i < sizeof a; i++)
        {
            for (j = 0; j < CHAR_BIT; j++)
            {
                printf("%d", (p[i] & (1u << j)) != 0);
            }
            printf(" ");
        }
        printf("\n");
    }
    return 0;
}

This should output the object representation in binary of various
values of an integer type given 'type' and 'TYPE_MAX' are defined.

It would be non-conforming if this output represented anything
other than value bits, padding bits and sign bit. (In any order.)
I believe also byte and bit order in memory is not defined by
the standard. While big and little endian are popular, others
are possible.

This is true, the order of the various bits is not specified, which
allows big and little endian and other orderings. However, there
is not enough latitude to allow a gray code encoding.
If a machine did store integers of different sizes in memory
using gray code, the results of accessing parts of such numbers
would be very different than the normal big/little endian
results, but I would say legal within the standard.

I would say illegal.
But yes, all the C operators must supply results following
the standard, which mostly covers the possibilities of twos
complement, ones complement (CDC and Univac that I know of),
and sign magnitude (the last I know of is the 7090).

Signed integers must use one of those three encodings.
If the "internally" in the OP question means "in memory" then
I would say that it could be done.

Only if it was made completely transparent to the memory access
tricks -- including memcpy -- so probably too much bother.
 

Malcolm

James Dow Allen said:
Perhaps you can post a suspicious *common* code example.

We tend to develop games using a PC game editor, for a console target. The
editor spits out a binary file, and the console reads it. I often provide
two functions, fput16() and fget16() to write and read 16-bit integers.
fget16() sign-extends the number it reads, so will break on a non-two's
complement machine.
Now strictly we should be using text to exchange data between platforms, but
that's not the real world. Strictly I suppose I could write fget16() so that
it didn't break on a one's complement machine, but that's hardly worth
doing.
 

Glen Herrmannsfeldt

Malcolm said:
We tend to develop games using a PC game editor, for a console target. The
editor spits out a binary file, and the console reads it. I often provide
two functions, fput16() and fget16() to write and read 16-bit integers.
fget16() sign-extends the number it reads, so will break on a non-two's
complement machine.
Now strictly we should be using text to exchange data between platforms, but
that's not the real world. Strictly I suppose I could write fget16() so that
it didn't break on a one's complement machine, but that's hardly worth
doing.

Unless Univac gets into the game console market, I think you are safe.

Many cross compilers assume that the target is either big or little endian
two's complement, as is the source machine. That is a reasonable, but not
completely portable, implementation choice.

-- glen
 

Glen Herrmannsfeldt

Simon Biber said:
The standard does not differentiate between register and memory storage, so
the as if rule could work there. However, it does allow you to inspect the
object representation using, for example, an array of unsigned chars. If you
inspect the representation of an integer in that way, it MUST conform to the
binary representation specified.

(snip of program to print a binary representation of memory for a variable,
in an implementation dependent order.)
This should output the object representation in binary of various
values of an integer type given 'type' and 'TYPE_MAX' are defined.

It would be non-conforming if this output represented anything
other than value bits, padding bits and sign bit. (In any order.)

So you will allow the 32! (about 2.6e35) bit orderings, but you won't allow gray code.
This is true, the order of the various bits is not specified, which
allows big and little endian and other orderings. However, there
is not enough latitude to allow a gray code encoding.
I would say illegal.

I will see if anyone else wants to comment. Note, though, that if they
were stored in memory as gray code, the unsigned chars would be converted
back to normal binary code. I haven't tried to figure out what that does to
the result.

-- glen
 

Chris Torek

Malcolm said:
... I suppose I could write fget16() so that it didn't break on a
one's complement machine, but that's hardly worth doing.

It probably does not gain anything -- as someone else noted, it
seems unlikely anyone will produce a ones' complement game-console
system -- but on the other hand, it is easy, and in general loses
little to nothing, to write a portable version of "get two 8-bit
bytes and combine them to make a signed value that is interpreted
as if it were two's complement":

long fget16(FILE *fp) {
    unsigned char hi, lo;
    unsigned int composite;
    int c;

    if ((c = getc(fp)) == EOF)
        ... handle EOF ...
    hi = c;  /* or lo = c, if little-endian */
    if ((c = getc(fp)) == EOF)
        ... handle EOF ...
    lo = c;
    composite = ((unsigned int)hi << 8) | lo;
    /* or: composite = hi * 256U + lo; */
    return (long)(composite ^ 0x8000U) - 32768L;
}

The conversion of the 16-bit value in "composite" to a signed long
is guaranteed to work regardless of the underlying representation of
(signed) long. The reason is simple enough:

- (composite ^ 0x8000U) has the corresponding value in
[32768..65535] when composite itself is in [0..0x7fff].
Converting that to long preserves its value, and
subtracting 32768L produces a value in [0..32767] as
desired.

- (composite ^ 0x8000U) has the corresponding value in
[0..32767] when composite itself is in [0x8000..0xffff].
Converting that to long preserves its value, and
subtracting 32768L produces a value in [-32768..-1] as
desired.

This sign-extension method works for arbitrary bit-sizes as long
as the intermediate values fit in "long" or smaller. This means
you can store (say) 5 two's complement 5-bit fields and one 7-bit
field in a total of 32 bits, read those values back, and convert
them to signed integer values completely portably. If you need 32
or more bit values, however, you must either restrict yourself to
C99 ("long long" will handle 63 bit integers portably) or stray
outside the C standard.
 

Dan Pop

Glen Herrmannsfeldt said:
(snip of quotes from some version of the C standard.)


I think there was a discussion some time ago on the use of BCD
representation. Following the "as if" rule, I believe others said it was
possible. It won't be easy, for example, to make shift operators, or bit
and/or/xor work right.

It won't be possible and the as-if rule doesn't help: it is *always*
possible to examine the representation of an object and check whether
it conforms with what the implementation must document. It would be
trivial to see that the BCD representation used by the implementation
doesn't match any of the three representations allowed by the standard.

4 Values stored in non-bit-field objects of any other object type
consist of n × CHAR_BIT bits, where n is the size of an object
of that type, in bytes. The value may be copied into an object
of type unsigned char [n] (e.g., by memcpy); the resulting
set of bytes is called the object representation of the value.
Values stored in bit-fields consist of m bits, where m is the
size specified for the bit-field. The object representation is
the set of m bits the bit-field comprises in the addressable
storage unit holding it. Two values (other than NaNs) with
the same object representation compare equal, but values that
compare equal may have different object representations.

Dan
 

James Dow Allen

Malcolm said:
I often provide
two functions, fput16() and fget16() to write and read 16-bit integers.
fget16() sign-extends the number it reads, so will break on a non-two's
complement machine.

I think I win! :)
Properly written C code will convert shorts to longs correctly, etc.
Indeed sign extension is correct on one's-complement machines just as for
two's-complement. Some comments concerned endianness, but that is
irrelevant to my question.

Chris posted code to read arbitrary data on arbitrary machines. Obviously
any program will have to be careful to read numbers in non-native
format but that seems beside the point for "most programs."

One "gotcha" for porting code to a one's-complement code is left shifts
on bit masks but this should work if *either*
(a) the bitmask is declared properly as unsigned, or
(b) bits shifted off to the left are always zero, which is almost always
the case for programmed bitmasks.

The phrase "almost always" may affect our confidence, but my question wasn't
whether "some" programs need fixing, but whether "most" programs need fixing.

James
 
