... I thought it's [much] the same with structures. Their
elements are addressed by adding whatever is needed to the pointer
which points to its first element. Of course, because a structure
doesn't consist of equally large members, as an array does, it
matters which element you want to access. But the idea is the same. Is
this really totally wrong?
It is not *completely* wrong, but it begins with a wrong assumption:
that there is a pointer to the structure in the first place.
Consider the following "struct"s, and an 80x86-CPU implementation:

    struct S1 { int s1_i; };
    struct S2 { short s2_s1, s2_s2; };

If we write a function that uses two such objects:

    void f(void) {
        struct S1 a;
        struct S2 b;

        a.s1_i = 42;
        b.s2_s1 = 12;
        b.s2_s2 = 0;
        /* ... more code that happens not to use b.s2_s2 ... */
    }

how surprised would you be to find that a compiler put "a" in a
CPU register like %edx? Since "a.s1_i" is a single 32-bit value,
it fits just fine in that register.
Would you be further surprised if the compiler put "b" in another
CPU register? While it contains two "short"s, if b were in, say,
%ecx, the code that uses b.s2_s1 could use the %cx "register half"
on the CPU to access the desired sub-half. (It is important that
this code not use b.s2_s2 because that would, on this compiler,
reside in the upper half of %ecx, where it is difficult to extract
or insert without disturbing the %cx half.)
More generally, structure members must be "remembered" in a C
compiler (at compile-time) as a pair of values: an offset, and a
type. The type is the same as that used for any ordinary variable
-- e.g., if the compiler happens to use numbers internally, and
uses "7" for "int", the member s1_i in "struct S1" might be
represented as the pair <0,7>. If "short" is type-code 5, the two
members in struct S2 might be <0,5> and <2,5> respectively.
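
To make the <offset,type> idea concrete, here is a rough sketch of
what such a member table might look like if it were written out in C.
The names "member_info", "type_code" and "find_member" are invented
for illustration (only the codes 7 and 5 come from the example
above), and no real compiler is claimed to store things this way:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct S1 { int s1_i; };
struct S2 { short s2_s1, s2_s2; };

/* Hypothetical type codes; 5 and 7 are just the numbers used in the text. */
enum type_code { T_SHORT = 5, T_INT = 7 };

/* One entry in a hypothetical compiler's member table: <offset,type>. */
struct member_info {
    const char *name;
    size_t offset;
    enum type_code type;
};

/* offsetof asks the real compiler for the offsets it actually chose. */
const struct member_info s1_members[] = {
    { "s1_i", offsetof(struct S1, s1_i), T_INT },
};
const struct member_info s2_members[] = {
    { "s2_s1", offsetof(struct S2, s2_s1), T_SHORT },
    { "s2_s2", offsetof(struct S2, s2_s2), T_SHORT },
};

/* Look up a member by name; returns NULL if the struct has no such member. */
const struct member_info *find_member(const struct member_info *tab,
                                      size_t n, const char *name)
{
    size_t i;

    assert(tab != NULL && name != NULL);
    for (i = 0; i < n; i++)
        if (strcmp(tab[i].name, name) == 0)
            return &tab[i];
    return NULL;
}
```

On the usual 80x86 compilers the table for struct S2 comes out as
<0,5> and <2,5>, but only the zero offset of the first member is
actually guaranteed.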
Before the kind of optimization that might put struct values into
CPU registers, then, the compiler does indeed use the kind of
address-offsetting you are considering. If one has a structure
object in memory, one need only use the offset:

    mov.l #42,mem_for_a+0   // a.s1_i = 42
    mov.w #12,mem_for_b+0   // b.s2_s1 = 12
    mov.w #0,mem_for_b+2    // b.s2_s2 = 0

while if one has a pointer, one must follow the pointer and add
the offset:

    mov.l #42,0(a_ptr)      // a->s1_i = 42
    mov.w #12,0(b_ptr)      // b->s2_s1 = 12
    mov.w #0,2(b_ptr)       // b->s2_s2 = 0

(all in some hypothetical assembly language).
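
The same "follow the pointer and add the offset" step can be spelled
out in C itself. This is only a sketch: "store_short_at" and
"offset_stores_match" are made-up helper names, and memcpy stands in
for the hypothetical mov.w instruction:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct S2 { short s2_s1, s2_s2; };

/* Store a short at byte offset off from base, roughly "mov.w #v,off(base)". */
void store_short_at(void *base, size_t off, short v)
{
    assert(base != NULL);
    memcpy((char *)base + off, &v, sizeof v);
}

/* Returns 1 if the offset-based stores are visible through ordinary
   member access, 0 otherwise. */
int offset_stores_match(void)
{
    struct S2 b;
    struct S2 *b_ptr = &b;

    store_short_at(b_ptr, offsetof(struct S2, s2_s1), 12);  /* b_ptr->s2_s1 = 12 */
    store_short_at(b_ptr, offsetof(struct S2, s2_s2), 0);   /* b_ptr->s2_s2 = 0 */
    return b.s2_s1 == 12 && b_ptr->s2_s2 == 0;
}
```

Using offsetof rather than hard-coding 0 and 2 keeps the sketch
honest on implementations that lay the members out differently.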
The C standard requires that structure member offsets be allocated
in ascending order, and that the first one be zero; but between
any two structure members, there may be an arbitrary amount of
padding. Typically, C compilers use this "padding license" to make
sure that structure members are aligned on "natural" boundaries
for the underlying machine -- e.g., a struct containing a char
followed by an int may have (sizeof(int)-1) bytes of padding between
them. Padding may also be added at the end of a structure, so that
arrays of structures are similarly aligned. (Bitfields are a
special case that is somewhat difficult to describe, and two
compilers for the same machine may well use different bitfield
allocation orders. C compilers targeting the Motorola 680x0 CPUs
tended to disagree in exactly this way, for reasons that are more
obvious if you were writing 68010 assembly code when the 68020
first came out.)
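
The padding rules for the char-then-int case can be checked on any
given compiler with offsetof and sizeof. A small sketch, in which the
struct name "CI" and the three function names are invented for
illustration; the actual amounts of padding are up to the
implementation, so the code computes them rather than assuming them:

```c
#include <assert.h>
#include <stddef.h>

struct CI { char c; int i; };   /* a char followed by an int */

/* Bytes of padding the compiler inserted between c and i. */
size_t padding_before_i(void)
{
    return offsetof(struct CI, i) - sizeof(char);
}

/* Bytes of padding the compiler added after i, at the end of the struct. */
size_t padding_at_end(void)
{
    return sizeof(struct CI) - (offsetof(struct CI, i) + sizeof(int));
}

/* The guarantees that hold everywhere: the first member is at offset
   zero and offsets ascend. */
void check_layout(void)
{
    assert(offsetof(struct CI, c) == 0);
    assert(offsetof(struct CI, i) >= sizeof(char));
}
```

On a typical machine with 4-byte, 4-byte-aligned ints one would
expect padding_before_i() to return sizeof(int)-1, but nothing in
the standard promises that.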
Very early C compilers actually permitted almost any lvalue on the
left of a "." operator. I once rewrote a program that contained
code like this:

    struct { char lo, hi; };
    short w;

    w = somefunc();
    printf("the bytes in octal are: %o, %o\n", w.lo, w.hi); /* GAK */

Even in the late 1980s, this code still compiled (with very loud
warnings) on the VAX "portable C compiler" (PCC). This was already
dodgy, and was not going to be part of ANSI C (this was part of the
reason to rewrite the program) -- but it shows how the concept
evolved, at least.
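
For comparison, one way to get at the bytes of a short in Standard C
is to copy them out as unsigned chars. This is a sketch only, not the
rewrite I actually did; "byte_pair" and "split_short" are made-up
names, and which byte is "lo" and which is "hi" still depends on the
machine, so the members are deliberately called first and last:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical holder for the first and last bytes of a short, in
   memory order. */
struct byte_pair { unsigned char first, last; };

struct byte_pair split_short(short w)
{
    unsigned char bytes[sizeof w];
    struct byte_pair p;

    memcpy(bytes, &w, sizeof w);   /* inspect the object representation */
    p.first = bytes[0];
    p.last = bytes[sizeof w - 1];
    return p;
}
```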