Les Cargill said:
I am not sure I follow your reasoning here, but that's okay. By
"ambiguity" I mean "is part of a non-one-to-one-and-onto map from
strings in source code to binary values in memory."
But expressions map to binary values in memory only when they're *used*,
and the way they're used can matter.
If we use those three r-values (-2147483648, 2147483648 and 0x80000000)
in 'C' source code, then use objdump to view how code was generated for
them, we'd see the same representation (assuming certain compilers,
machine word sizes, yadda yadda).
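For instance, a minimal sketch along those lines (assuming the 32-bit,
two's-complement setup spelled out below), storing each constant into a
32-bit unsigned object, prints the same bits for all three:

#include <stdio.h>

int main(void) {
    /* On a 32-bit, two's-complement target, all three constants end up
       stored with the same bits, 80000000. */
    unsigned int a = -2147483648; /* negation of a wider constant, then converted */
    unsigned int b = 2147483648;  /* the value 2^31, converted to unsigned int    */
    unsigned int c = 0x80000000;  /* already unsigned int on this target          */

    printf("a = %08x\n", a);
    printf("b = %08x\n", b);
    printf("c = %08x\n", c);
    return 0;
}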
If I may, you seem to be holding that the mapping is much more
inscrutable than I think it is. It was pretty clear to me
that the OP was on a 32-bit machine that worked like GNU
on this subject.
I've been assuming (explicitly enough, I hope) the same thing: 32-bit
int and long, 2's-complement, and either 64-bit long long or no long
long at all for C90.
I'll use hexadecimal in square brackets to denote bits in memory, to
avoid confusion with C syntax.
Even a simple constant like 7 can, depending on the context in
which it's used, result in a number of different in-memory bit
patterns: [07], [0007], [00000007], [0000000000000007]. Given the
assumptions we're making, it's always of type int and therefore
always [00000007], but any of the others can show up (with trivial
optimization) if you use 7 to initialize an object of type char,
short, int, or long long, respectively.
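A quick way to see this (a minimal sketch under the same assumptions,
and assuming long long is available) is to initialize objects of each
of those types with 7 and look at their sizes; the constant itself is
an int in every case:

#include <stdio.h>

int main(void) {
    /* The constant 7 has type int in every line below; what differs is
       the type, and therefore the stored size, of the object it
       initializes: [07], [0007], [00000007], [0000000000000007]. */
    char      c  = 7;
    short     s  = 7;
    int       i  = 7;
    long long ll = 7;

    printf("sizeof c  = %d\n", (int)sizeof c);
    printf("sizeof s  = %d\n", (int)sizeof s);
    printf("sizeof i  = %d\n", (int)sizeof i);
    printf("sizeof ll = %d\n", (int)sizeof ll);
    printf("sizeof 7  = %d\n", (int)sizeof 7); /* the constant itself: 4 here */
    return 0;
}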
[...]
The point I was trying to make is that hex notation is a strategy to
use when the other conventions break down. But as you note,
"you gotta know the territory."
Agreed -- except that hex notation doesn't solve as many problems
as you might expect. It's easy to think of hexadecimal notation as
denoting an actual bit-level representation, but in fact it merely
represents C values of C types, and the rules are as complex as
(and different from) the rules for decimal notation.
I am thinking that hex notation fully specifies values, while decimal
has one more degree of freedom. There is still the problem of sign
extension, but since we happen to be in the "int is 32 bits, and so is
long" case, for *that* bit pattern it washes out.
I suggest that this is where you're going a bit astray.
C hexadecimal constants are no more or less ambiguous than C
decimal constants. They do map more directly to bit patterns,
but a hex literal can be of *more* different types than a decimal
literal can (unsuffixed decimal literals, as of C99, are always of
some signed type). But now that I think about it, the rules for
hex literals can lead to fewer surprising results.
An example: This program:
#include <stdio.h>
int main(void) {
    printf("sizeof 0x80000000 = %d\n", (int)sizeof 0x80000000);
    printf("sizeof 2147483647 = %d\n", (int)sizeof 2147483647);
    printf("sizeof 2147483648 = %d\n", (int)sizeof 2147483648);
    return 0;
}
produces this output in C90:
sizeof 0x80000000 = 4
sizeof 2147483647 = 4
sizeof 2147483648 = 4
and this output in C99:
sizeof 0x80000000 = 4
sizeof 2147483647 = 4
sizeof 2147483648 = 8
Hex constants are more likely to behave the way most people expect,
resulting in a stored representation whose bits correspond directly to
the hex digits. On the other hand, if you're trying to initialize an
int object to the value -268,435,456 with the representation
[f0000000], then this:
int x = 0xf0000000;
*probably* works, but if you think of a hexadecimal constant as a
direct portrayal of a bit pattern (rather than of a *value* of a
specific *type*), then it's easy to miss the fact that the implicit
unsigned-to-signed conversion has an implementation-defined result.
(And in case anyone was wondering, adding a cast to int doesn't help.)
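To make that concrete, here is a minimal sketch (assuming 32-bit int as
above); the exact values printed are implementation-defined, so the
comments only describe the common two's-complement outcome:

#include <stdio.h>

int main(void) {
    /* 0xf0000000 doesn't fit in int, so (given 32-bit int) it has type
       unsigned int.  Initializing an int from it converts an out-of-range
       unsigned value to a signed type, and that conversion has an
       implementation-defined result (C99 also allows a signal). */
    int x = 0xf0000000;      /* implicit unsigned-to-signed conversion  */
    int y = (int)0xf0000000; /* the cast performs the same conversion   */

    printf("x = %d\n", x);   /* commonly -268435456, but not guaranteed */
    printf("y = %d\n", y);
    return 0;
}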
Regardless of that, thanks for the comments. It's easy to get into
a rut on these things and use shortcuts that are not always correct.
After a couple of decades of 32-bit targets, you get used to certain
things...
Indeed. "All the world's a VAX^H^H^H x86."