> I see. What it really comes down to is that C pointers
> are/have evolved. The pointers `void *' and `char *'
> should be thought of as unique and new C types having very
> special qualities beyond pointer to type.
Actually, it is more that "char" itself is special, and as a result,
so is "char *". That "specialness" need not propagate any further,
but of course "void *" and "char *" are really "the same thing"
underneath -- in particular, they have the same representation.
I saved this:
In article said:
>As far as I understand it now, thanks to the excellent explanations
>from Chris, the C system of object storage can be expressed in three
>layers.
>
> Layer 3: Interpretation - Used values
> Layer 2: Representation - Pure binary, unsigned char
> Layer 1: Storage - Hardware
The "hardware" layer (Layer 1) is simply dictated by the hardware.
Someone built a machine, and it does whatever it does. The "Layer
2" part is part of Standard C, and imposes special features on
(unsigned) char, and therefore pointers-to-unsigned-char: these
things get at a "pure binary" set of bits that are used to represent
every object in C. More precisely, they represent addressable
objects -- ones where &x works -- since things stored in registers
can "cheat".
The topmost Layer 3 items normally go right back to whatever the
hardware provides: if an "int" uses 42 bits scattered across eight
9-bit "C bytes", and ignores the remaining 30 bits entirely or uses
them as a checksum of the other 42 or whatever, this is not a
problem. Here sizeof(int) is 8 and CHAR_BIT is 9, and 8 x 9 = 72,
and using "unsigned char *" you can access all 72 bits, but regular
old "int" only uses 42 of them to determine the int's value.
So now, proceeding on to one of these word-addressed machines:
>     typedef struct {
>         char ch;
>     } ch_t;
>     ch_t *a_ch;
>
> On an architecture with 48-bit "char *" pointers
> and 32-bit pointers for other data types, what would
>     sizeof (a_ch)
>     sizeof (int *)
> be (I'm not trying to be funny)?
Here the fundamental "word size" of the machine is 16 bits, and
regular "machine level" pointers -- "int *" for instance -- use
two machine-words to point to a third machine word. (I am assuming
16-bit "int"s here, and that a 32-bit long is an integral type that
happens to be the right size to hold a machine-level pointer, on
this machine.)
The machine's word size is 16 bits, but the implementor chose to
provide 8-bit "unsigned char"s. Hence, a "char *" requires a pair
of entities: <32-bit machine-level pointer, 16-bit auxiliary word
to select even/odd byte>.
The unnamed struct above has an 8-bit "char" and 8 bits of padding,
so that sizeof(struct unnamed) -- which is of course the same as
sizeof(ch_t) -- is 2.
Since a "ch_t" always occupies an entire machine word, only one
32-bit machine-level pointer is required to address it. This
means that sizeof(a_ch) -- or sizeof(struct unnamed *) -- is
4 (because CHAR_BIT is 8 and machine-level pointers use two 16-bit
machine-level words). Likewise, an "int" always occupies a complete
16-bit machine-level word, and "sizeof(int *)" is also 4. But
sizeof(char *), sizeof(unsigned char *), and sizeof(void *) are
all 6 -- they use two machine-level words to hold a machine-level
pointer, plus a third machine-level word to hold a single bit
to select "even/odd byte within word".
Note that the malloc() function must locate a contiguous chunk of
16-bit machine-level words, giving a 32-bit value that points to
whole words. It must then convert its return value from "machine-level
pointer" to "C-level byte-pointer", by tacking on a third 16-bit
word saying "use the first C-byte of this word". Conversions
from "char *" to "T *", for "normal" machine-level types and for
structure types, simply discard the byte-offset value, and
conversions from "T *" to "char *" always add a zero byte-offset.
(And, of course, "void *" and "char *" are really "the same"
underneath, so T here never stands for "void".)
These kinds of machines are rare (perhaps even nonexistent) today.
They were pretty much wiped out by the 8-bit-byte "killer
microprocessors", which ate the traditional "minicomputer" market
entirely, and have taken a bite out of the traditional mainframe
markets. The IBM AS/400 architecture still preserves some of this
sort of ornamentation, but the AS/400 is, as far as I know, a
byte-addressed (virtual) machine.