(Actually #3 is an overstatement: they can have different sizes as
long as the "bigger" pointers can be stored in the "smaller" ones
temporarily without loss of information when converted back to the
"bigger" ones.)
> Hmm, a lot of code I've seen makes the assumption that 1==2.
I assume "1==2" is shorthand for some kind of reference to items 1
and 2 in Jack Klein's list, although I am not quite sure what kind. Still:
> Especially TCP/IP libraries in Linux, Windows, BSD etc. that cast
> a char* into structs to extract data from a packet.
It is worth noting two things about the BSD code and everything
derived from it (whether directly, as in the case of a number of
other TCP/IP implementations, or indirectly, as in the case of,
well, almost every remaining TCP/IP implementation).
First, most of the code was written in the early 1980s, before
there was any C standard at all. So the overall design does not
conform well to the standards, for historical reasons. There have
been some attempts to bring the code into the 1990s (including some
on my part, back *in* the 1990s), but with limited success.
Second, the code is in fact *not* portable. Some of this was done
deliberately: the use of ntohl() and htonl(), for instance, is
often turned into a no-op on big-endian machines in the name of
performance. Remember the machines of the day: the VAX-11/780 on
which much of the early BSD work was done was, famously, about a
one-MIPS machine, so every instruction counted.
(In other words, a lot of portability was sacrificed to the Little
Tin God of Efficiency. This may well have been the right decision
in 1983. It was, I think, wrong by 2003, if not earlier.)
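For the curious, the no-op trick looks something like this (a
sketch, not the actual BSD headers; WORDS_BIGENDIAN here is just an
invented configuration macro):

#include <stdint.h>

#ifdef WORDS_BIGENDIAN            /* invented macro, for illustration */
#define ntohl(x) (x)              /* network order == host order: free! */
#else
static uint32_t ntohl(uint32_t x) /* little-endian host: swap the bytes */
{
    return ((x & 0xff000000u) >> 24) | ((x & 0x00ff0000u) >>  8) |
           ((x & 0x0000ff00u) <<  8) | ((x & 0x000000ffu) << 24);
}
#endif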
> If 1!=2, then you're saying that a void pointer should not ever point
> to a struct. But this is how polymorphism is usually emulated in C.
> Why, even malloc() returns a void pointer which can be used to point
> to an area of memory to store a struct.
This is OK! It may help to use a little "ESP": Exaggeration of System
Parameters. Imagine for a moment that "void *" (and thus "char *" as
well, due to the "same representation" requirement for void* and char*)
each occupy one megabyte of data, while "struct S *" (for any S) uses
four bytes.
Will a four-byte "struct S *" always fit in a one-megabyte buffer?
(Sure!)
Will a one-megabyte "char *" or "void *" fit in a "struct S *"?
(Nope.) But if the four "important" bytes match up, that limited
subset of assignments *does* work:
struct S *orig;
void *tmp;
struct S *new;
orig = <some expression>;
tmp = orig; /* always works: copies 4 bytes into 1-MB buffer */
new = tmp; /* this works too: it copies the 4 "important" bytes back */
To make malloc() work, we just have to make sure that most of the
one-megabyte value is "unused", so that all the "good parts" fit:
tmp = malloc(sizeof(struct S)); /* sets all 1-MB bytes */
orig = tmp; /* copies 4 "important" bytes; works only if
the rest of the bytes really were unimportant */
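So, back in the real world, the pattern from the quoted article is
fine (a minimal sketch; the type and function names are invented
for illustration):

#include <stdio.h>
#include <stdlib.h>

struct point { int x, y; };

/* A "generic" routine: it takes void *, and converts back to the
   real type inside -- exactly the round trip described above. */
static void print_obj(void *obj)
{
    struct point *p = obj;              /* void * -> struct point * */
    printf("(%d, %d)\n", p->x, p->y);
}

int main(void)
{
    struct point *p = malloc(sizeof *p); /* void * -> struct point * */
    if (p == NULL)
        return 1;
    p->x = 3;
    p->y = 4;
    print_obj(p);                        /* struct point * -> void * */
    free(p);                             /* and to void * once more  */
    return 0;
}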
> Besides, aren't pointers just addresses into memory?
Pointers are typed, and have sizes. "Addresses" may or may not
have some other type and size.
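In particular, each pointer type is a type of its own, with a size
you can inspect. On most current desktop machines the following all
print the same number, but nothing in the standard requires that,
as the machines below demonstrate:

#include <stdio.h>

struct S { int x; };

int main(void)
{
    printf("sizeof(char *)     = %lu\n", (unsigned long)sizeof(char *));
    printf("sizeof(void *)     = %lu\n", (unsigned long)sizeof(void *));
    printf("sizeof(struct S *) = %lu\n", (unsigned long)sizeof(struct S *));
    printf("sizeof(double *)   = %lu\n", (unsigned long)sizeof(double *));
    return 0;
}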
> Why would a CPU differentiate between structured and unstructured
> bytes?
Consider the Cray or the Data General Eclipse (MV/10000). These
machines do not have 8-bit bytes as a "machine level" entity (with
a caveat for the Eclipse); instead, their CPUs address "words" --
64-bit words on the Cray, 16-bit words on the Eclipse. Word 0
contains 8 8-bit bytes (on the Cray) or 2 8-bit bytes (on the
Eclipse).
Most pointer types -- "int *", "struct S *", "double *", and so
on -- point to machine words. They need only the regular old
"machine word pointer" value that the hardware handles directly.
This points to 64 bits (on the Cray) or 16 bits (on the Eclipse).
The compiler-writers could have chosen to make CHAR_BIT 64 (on the
Cray) or 16 (on the Eclipse; the Eclipse has some extra features
that make this a particularly bad idea) -- but they chose instead
to make CHAR_BIT be 8. To do this, they had to make "char *"
(and thus "void *") "smuggle in" some extra bits, above and beyond
the "machine word" address.
On the Cray, with at least some C compilers, the extra bits are
stored near the high end of the machine address word. That is,
machine addresses 0, 1, 2, ... are represented as "byte address"
0, 1, 2, ..., so in order to identify "byte 3 of machine address
0", the needed three bits -- the ones that tell which 8-bit group
is to be used out of the 64-bit word -- are near the high end
of the address. Thus, bytes are addressed as: 0x000...0, 0x100...0,
0x200...0, 0x300...0, 0x400...0, 0x500...0, 0x600...0, 0x700...0,
0x000...1, 0x100...1, 0x200...1, and so on.
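As a simulation (the bit positions follow the layout just described;
actual Cray compilers differed in the details):

#include <stdint.h>

#define BYTE_SHIFT 60  /* three offset bits near the top of the word */

/* Hypothetical Cray-style "char *": 3-bit byte offset near the high
   end, machine word address in the low bits. */
static uint64_t make_byte_ptr(uint64_t word_addr, unsigned byte_in_word)
{
    return ((uint64_t)byte_in_word << BYTE_SHIFT) | word_addr;
}

static uint64_t word_part(uint64_t bp)  /* what the hardware dereferences */
{
    return bp & ~((uint64_t)7 << BYTE_SHIFT);
}

static unsigned byte_part(uint64_t bp)  /* which 8-bit group of the word */
{
    return (unsigned)(bp >> BYTE_SHIFT) & 7;
}

/* Note that "cp++" is no longer a simple integer increment: */
static uint64_t byte_ptr_inc(uint64_t bp)
{
    unsigned b = byte_part(bp);
    return b == 7 ? make_byte_ptr(word_part(bp) + 1, 0)
                  : make_byte_ptr(word_part(bp), b + 1);
}

That last routine suggests why byte-at-a-time code tends to be
comparatively slow on word-addressed machines.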
On the Eclipse, the hardware actually has support for "byte pointers".
The CPU has special instructions for converting "word pointers" to
"byte pointers" and vice versa. This "special instruction" is
actually just a one-bit shift, discarding the "I" -- indirect --
bit from the word address and introducing a byte-offset bit. So
on the Eclipse:
int *ip;
char *cp;
...
cp = malloc(N * sizeof(int));
ip = (int *)cp;
compiles to the same machine code as:
uint32_t ip, cp;
...
cp = malloc_equivalent(N * sizeof(int));
ip = cp >> 1;
Since malloc() actually returns "void *", malloc() on the Eclipse must
always return a "word-aligned" byte-pointer, whose low bit is 0, so
that "ip = cp >> 1" always discards a zero bit. That way, a later
call like:
cp = ip << 1;
free_equivalent(cp);
passes the correct number -- when ip is converted from a word
pointer to a byte pointer, the result is always word-aligned.
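To spell out the asymmetry in the same "compiled" terms (the helper
names are invented, and the "I" bit is ignored for simplicity):

#include <assert.h>
#include <stdint.h>

/* Hypothetical Eclipse-style conversions: a one-bit shift each way. */
static uint32_t word_to_byte(uint32_t wp) { return wp << 1; }
static uint32_t byte_to_word(uint32_t bp) { return bp >> 1; }

int main(void)
{
    uint32_t wp = 0x1234, bp;

    /* word -> byte -> word always round-trips... */
    assert(byte_to_word(word_to_byte(wp)) == wp);

    /* ...but byte -> word -> byte drops the low (byte-offset) bit,
       so it round-trips only for word-aligned byte pointers.  This
       is why malloc() must return a word-aligned value. */
    bp = word_to_byte(wp) | 1;              /* byte 1 of the word */
    assert(word_to_byte(byte_to_word(bp)) != bp);
    return 0;
}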
The old PR1ME (which may never have had an ANSI C compiler) had
32-bit "int *" machine-level pointers, and 48-bit "char *" pointers:
all 32 bits of the word pointer were already in use, and -- as with
the Cray and Eclipse -- the compiler-writers chose to make CHAR_BIT
8 anyway. So the compiler needed extra bits in a "char *" to pick
out which byte you wanted, and the smallest workable increment was
another 16 bits.
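One way to picture such a "fat" character pointer (purely
illustrative; I would not vouch for the PR1ME's actual bit layout):

#include <stdint.h>

/* An "int *" is just the 32-bit machine address... */
typedef uint32_t prime_word_ptr;

/* ...while a "char *" must also carry a byte selector, padding the
   whole thing out to 48 bits.  (Invented layout, for illustration.) */
struct prime_byte_ptr {
    uint32_t word_addr;  /* which machine word          */
    uint16_t byte_sel;   /* which byte within that word */
};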