I have heard it said, but not confirmed, that the only guarantee that
the standard gives with regard to structs is that the first element is
aligned with the structure's first byte and that the order of the members
will not be changed.
That's it, mostly. We know that members are properly aligned
for their types and there's some special language pertaining to
bit-fields, but you're essentially correct.
Does that mean that code like that below should
print "Hello" but after that anything would be possible?
Hello, World
struct words: 14
char[] str: 13
/***********************************************/
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
struct words {
char hello[5];
char comma;
char space;
char world[5];
char exclaim;
char term;
};
We know that the "hello" member begins at the struct's first
byte, and that the later members appear in order, not overlapping:
offsetof(struct words, hello) == 0
offsetof(struct words, comma) >= 0 + 5
offsetof(struct words, space) >=
offsetof(struct words, comma) + 1
offsetof(struct words, world) >=
offsetof(struct words, space) + 1
offsetof(struct words, exclaim) >=
offsetof(struct words, world) + 5
offsetof(struct words, term) >=
offsetof(struct words, exclaim) + 1
Finally, we know that the struct is at least as large as the
sum of its element sizes and any padding between them:
sizeof(struct words) >= offsetof(struct words, term) + 1
... hence sizeof(struct words) >= 14 (== 5 + 1 + 1 + 5 + 1 + 1).
Since none of the members requires any special alignment, it's
quite likely that sizeof(struct words) will in fact be 14 exactly.
Perhaps the next most likely value is 16, if a compiler decides to
put two padding bytes at the end to make the whole thing fit in two
8-byte units. Descending even further on the likelihood scale, a
compiler might insert one padding byte before `world' and one more
at the end, so each array would be contained in a single 8-byte
unit. Other padding arrangements seem extremely unlikely -- though
as you observe, they're permitted.
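If you'd rather see what your own compiler does than take my guesses about likelihood, something along these lines should tell you. It is only a sketch: `layout_is_ordered' and `print_layout' are names I made up for illustration, and "%zu" assumes a C99 library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct words {
    char hello[5];
    char comma;
    char space;
    char world[5];
    char exclaim;
    char term;
};

/* Hypothetical helper: returns 1 if every ordering inequality listed
   above holds for this implementation, 0 otherwise. */
int layout_is_ordered(void)
{
    return offsetof(struct words, hello) == 0
        && offsetof(struct words, comma) >= 0 + 5
        && offsetof(struct words, space) >= offsetof(struct words, comma) + 1
        && offsetof(struct words, world) >= offsetof(struct words, space) + 1
        && offsetof(struct words, exclaim) >= offsetof(struct words, world) + 5
        && offsetof(struct words, term) >= offsetof(struct words, exclaim) + 1
        && sizeof(struct words) >= offsetof(struct words, term) + 1;
}

/* Hypothetical helper: prints where this compiler actually put each
   member, and the total size including any trailing padding. */
void print_layout(void)
{
    assert(layout_is_ordered());
    printf("hello   at %zu\n", offsetof(struct words, hello));
    printf("comma   at %zu\n", offsetof(struct words, comma));
    printf("space   at %zu\n", offsetof(struct words, space));
    printf("world   at %zu\n", offsetof(struct words, world));
    printf("exclaim at %zu\n", offsetof(struct words, exclaim));
    printf("term    at %zu\n", offsetof(struct words, term));
    printf("sizeof     %zu\n", sizeof(struct words));
}
```

Call print_layout() from main(). Any gap between one member's end and the next member's offset is padding, and so is any difference between the `term' offset plus one and the sizeof result.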
int main (int argc, char* argv[]) {
char str[] = "Hello, World";
struct words w;
Okay, `w' occupies >=14 bytes of storage.
memcpy (&w, str, sizeof (str));
This fills the first 13 bytes of `w' with a copy of the string.
The 14th byte (and any others) remain uninitialized. Since `w'
has sufficient space for everything that's being copied into it,
there's no problem up to this point.
Note that memcpy() makes no use of the "struct-ness" of
the target. In C, any addressable object can be viewed as an
array of bytes, without regard to the object's actual type.
That's what memcpy() does: It just copies bytes, and doesn't
care what type the bytes represent.
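To make the "bag of bytes" view concrete, here is a rough sketch of what a byte copy amounts to. The name `copy_bytes' is my own invention, and a real memcpy() is certainly implemented more cleverly; the point is only that nothing in it knows or cares about the types of the objects involved.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for memcpy(): copies n bytes from src to dst,
   one unsigned char at a time. The objects must not overlap. */
void *copy_bytes(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;          /* view the target as plain bytes */
    const unsigned char *s = src;    /* likewise for the source */

    assert(dst != NULL && src != NULL);
    while (n-- > 0)
        *d++ = *s++;                 /* no knowledge of members or padding */
    return dst;
}
```

Called as copy_bytes(&w, str, sizeof str), it fills the first 13 bytes of `w' just as the memcpy() call does, whether those bytes happen to belong to members or to padding.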
char *cp = (char*) &w;
while (*cp != '\0') {
printf ("%c", *cp);
++cp;
}
Here, you're doing much the same thing as memcpy() did: You
are not using `w' as a struct, but only as a bag of bytes. If
there are padding bytes, you're using them on exactly the same
basis as you use member bytes: They're all just bytes. The
output *will* be "Hello, World" whether there's padding or not.
Using the "struct-ness" might (in principle) have produced
some surprises:
printf("%.5s", w.hello); // fine so far
printf("%c", w.comma); // BZZT!
printf("%c", w.space); // BZZT!
printf("%.5s", w.world); // BZZT!
There's no telling (in principle) what the final three lines
would have done.
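If you do want to read the members afterwards, one way to stay out of trouble is to store into each member by name and let the compiler worry about where it lives. A minimal sketch, assuming the source string has at least 13 readable bytes (as "Hello, World" with its terminator does); `fill_words' is a hypothetical helper, not anything from your program:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct words {
    char hello[5];
    char comma;
    char space;
    char world[5];
    char exclaim;
    char term;
};

/* Hypothetical helper: distributes the first 13 bytes of s across the
   members by name, so the result does not depend on padding. */
void fill_words(struct words *w, const char *s)
{
    assert(w != NULL && s != NULL);
    memcpy(w->hello, s, 5);       /* "Hello" */
    w->comma = s[5];              /* ',' */
    w->space = s[6];              /* ' ' */
    memcpy(w->world, s + 7, 5);   /* "World" */
    w->exclaim = s[12];           /* the string's '\0' lands here */
    w->term = '\0';               /* 14th member byte, set explicitly */
}
```

After fill_words(&w, str), all four printf() calls above are well-defined no matter how the compiler pads the struct. The price is that the bytes of `w' taken as a whole need no longer spell out the string.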
printf ("\n");
printf ("struct words: %d\nchar[] str: %d\n",
sizeof (w), sizeof (str));
Nit-pick: "%d" is for signed integers, which `size_t' is
not. I've used systems where this would have printed the two
sizes as 14 and 0 thanks to the mismatch; in principle, worse
things could happen.
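The usual cure is "%zu", which C99 provides for exactly this purpose; with an older library you can cast to unsigned long and use "%lu" instead. A small sketch, where `format_sizes' is a hypothetical helper I've written around snprintf() so the result can be inspected:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: formats the two sizes with the conversion that
   matches size_t. Returns what snprintf() returns, i.e. the number of
   characters the full output needs, not counting the terminator. */
int format_sizes(char *buf, size_t buflen,
                 size_t struct_size, size_t str_size)
{
    assert(buf != NULL);
    return snprintf(buf, buflen,
                    "struct words: %zu\nchar[] str: %zu\n",
                    struct_size, str_size);
}
```

In your program the direct equivalent would be printf ("struct words: %zu\nchar[] str: %zu\n", sizeof (w), sizeof (str));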