C
Chris Torek
candide said:
The meaning is fuzzy, because -- even after correcting this to "the
value of an array" (or "the result of the `decay' of an array" or
however you want to express the idea of computing &array[0]) --
the types differ.
What about (void *) &array == (void *) &array[0]?
What about it? Consider:
    #include <stdio.h>

    int main(void) {
        int equal;

        equal = (int)3.1 == (int)3.2;
        printf("%d %s %d\n", (int)3.1, equal ? "==" : "!=", (int)3.2);
        return 0;
    }
When compiled and run, this prints "3 == 3". And of course, three
*is* equal to three. That means 3.1 must be equal to 3.2, right?
The fundamental problem here is that we are comparing values of
different types, and to do that, we have to start by picking some
"common" type, convert both values to that common type, and then
compare the result. Each conversion *could* -- in this case, does
-- change the value in some subtle or not-so-subtle way. (On a
Lisp machine, changes may be fairly subtle, as they can involve
changing the type-tag bits without changing the rest of the
value-representation bits.)
When we compare the modified values, we can tell whether the modified
values are equal. Alas, we cannot tell whether this is because
the modification changed the values in the process. The only way
to find out is to observe the modification, which Standard C tells
us not to do. (We can do it anyway, by stepping outside the bounds
of Standard C, but then all we can say for certain is that *this*
implementation does whatever we just observed. Another implementation,
or even this one after it is modified someday in the future, need
not do the same thing.)
Sometimes, knowing what "this implementation" does is sufficient.
Quite often, knowing what "this implementation, that implementation,
and those 27 other implementations" do is sufficient. But none of
them tell us what *every* implementation, past or future, will
*have* to do. The "every implementation, past or future" is the
domain of the Standard (though even then, the Standard may do a
lousy job of that, since a future Standard could change things).
Can you then tell me what the standard means when
it says...
"The range of nonnegative values of a signed
integer type is a subrange of the corresponding
unsigned integer type, and the representation
of the same value in each type is the same."
       ^^^^^^^^^^
Either 'same value' is independent of type, or
above line makes no sense. Which is it?
You will have to ask the people who wrote the text of the Standard,
but I think this is a matter of slightly-sloppy phrasing.
We *do* know (from the Standard) that <type,value> pairs have
"representations": bit-patterns in what are, or at least look like,
binary, stored in memory, that we can take apart one "unsigned
char" at a time by doing:
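    T some_variable = some_value;
    /* the cast is required: T * does not convert implicitly to unsigned char * */
    unsigned char *cp = (unsigned char *)&some_variable;
    const char *sep = "";
    for (size_t i = 0; i < sizeof some_variable; i++) {
        printf("%s%#x", sep, cp[i]); /* print byte i, not the pointer itself */
        sep = " ";
    }
    putchar('\n');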
which is likely to print things like:
    0xc 0x40 0 0x1
for some value(s) of some type(s). The Standard does not promise
that there is a single unique representation for every value
(floating-point values in particular may see situations where
several different representations all mean the same "value",
especially for zero), but the phrase you have quoted above
tells us that for *integral* types (char, short, int, long,
long long, and their signed and unsigned variants), *nonnegative*
values in a signed type have a single representation, and it is
the same representation as the same value of an unsigned type.
More specifically, then:
    #include <limits.h>
    #include <stdio.h>

    /* print the size bytes starting at cp as one hexadecimal word, then s */
    void print_the_representation(unsigned char *cp, size_t size, char *s) {
        size_t i;
        char *sep = ""; /* empty separator: the bytes run together into one word */

        /* assumes CHAR_BIT is 8 (if not, still works but output is ugly) */
        for (i = 0; i < size; i++)
            printf("%s%2.2x", sep, cp[i]);
        printf("%s", s);
    }

    int main(void) {
        short s;
        unsigned short us;

        for (s = 0; s < SHRT_MAX;) {
            s++; /* so that range is [1..SHRT_MAX] inclusive */
            print_the_representation((unsigned char *)&s, sizeof s, ", ");
            us = s;
            print_the_representation((unsigned char *)&us, sizeof us, "\n");
        }
        return 0;
    }
Compile and run this program, and observe that each line of output
is a pair of comma-separated words: the first word is the hexadecimal
rendering of the representation of a "short" value in the range
[1..SHRT_MAX], and the second is that of the same value stored in an
"unsigned short". The Standard tells us that the two words in each
such pair are identical.