It depends to some extent on which standard you're looking at: C99
provides significantly more guarantees than C90 did.
I'll try to restore the context here, because when I read what Eric wrote,
I had to stop and think a bit too -- I had C89 in mind:
Another is to put the buffer and its "overlay" into a union, which is a
sanctioned way of telling C you intend to access it via multiple type
aliases. In this case, it's probably better to make the two-byte items
`unsigned short' (unless you like negative lengths ...):
    union {
        unsigned char buffer[SIZE];
        unsigned short words[2];
    } both;
    ...
    length = ntohs(both.words[0]);
    type = ntohs(both.words[1]);
The idea presumably being, one fills in "both.buffer" first, in network
byte order, then accesses those parts of storage through "both.words".
I didn't have doubts about "buffer" and "words" starting at the same
correctly aligned address, but I did doubt whether the technique would be
allowed "everywhere" (especially wrt. SUSv[12], which are based on C89).
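To make the fill-then-read sequence concrete, here is a minimal sketch of how I read Eric's fragment. SIZE = 64, "struct header" and parse_header() are my own hypothetical names, memcpy() merely stands in for whatever read()/recv() call would fill the buffer, and I assume "unsigned short" is exactly 16 bits wide:

```c
#include <assert.h>
#include <string.h>     /* memcpy(), size_t */
#include <arpa/inet.h>  /* ntohs() */

#define SIZE 64         /* hypothetical buffer size, not from the post */

struct header {         /* hypothetical holder for the two decoded fields */
    unsigned short length;
    unsigned short type;
};

/*
 * Fill the byte view of the union with n bytes in network byte order,
 * then read the first two 16-bit fields back through the "words" view.
 * Assumes unsigned short is exactly 16 bits wide.
 */
static struct header parse_header(const unsigned char *wire, size_t n)
{
    union {
        unsigned char buffer[SIZE];
        unsigned short words[2];
    } both;
    struct header h;

    assert(n >= 4 && n <= SIZE);
    memcpy(both.buffer, wire, n);     /* stands in for read()/recv() */

    h.length = ntohs(both.words[0]);  /* network to host byte order */
    h.type = ntohs(both.words[1]);
    return h;
}
```

The union guarantees that "buffer" is suitably aligned for the "words" access, which a bare "unsigned char" array would not.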
IMVHO Eric is right, and I thank him for the idea (I used memcpy() before
for such purposes); C89 6.3.2.3 "Structure and union members" says:
"[...] With one exception, if a member of a union object is accessed after
a value has been stored in a different member of the object, the behavior is
implementation-defined. ^{41} One special guarantee is made in order to
simplify the use of unions: If a union contains several structures that
share a common initial sequence (see below), and if the union object
currently contains one of these structures, it is permitted to inspect the
common initial part of any of them. [...]"
The relevant part here is exactly *not* the special exception of the
common initial sequence, but the *implementation-defined* nature of the
access pattern in question. That is, if one can ensure that on a given
implementation the result of the access cannot later lead to a trap
representation or some other undefined behavior, everything's OK.
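For comparison, this is roughly the memcpy() approach I mentioned above, which never reads a union member other than the one last stored. It is only a sketch: get_be16() is a hypothetical helper name, and I again assume "unsigned short" is exactly 16 bits wide.

```c
#include <assert.h>
#include <string.h>     /* memcpy(), size_t */
#include <arpa/inet.h>  /* ntohs() */

/*
 * Decode the 16-bit network-order field that starts at wire[off] in a
 * buffer of n bytes. The bytes are copied into a properly aligned
 * temporary, so "wire + off" needs no particular alignment.
 * Assumes unsigned short is exactly 16 bits wide.
 */
static unsigned short get_be16(const unsigned char *wire, size_t off,
                               size_t n)
{
    unsigned short tmp;

    assert(off + sizeof tmp <= n);
    memcpy(&tmp, wire + off, sizeof tmp);
    return ntohs(tmp);
}
```

Unlike the union, this also works at odd offsets, at the cost of one copy per field.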
Footnote 41: "The `byte orders' for scalar types are invisible to isolated
programs that do not indulge in type punning (for example, by assigning to
one member of a union and inspecting the storage by accessing another
member that is an appropriately sized array of character type), but must
be accounted for when conforming to externally imposed storage layouts".
The parts that are important to me now are: "The `byte orders' for scalar
types [...] must be accounted for when conforming to externally imposed
storage layouts".
Thus the normative part calls the access implementation-defined, and the
informative part actually endorses caring about the byte order when doing
network I/O.
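A small sketch of what the footnote describes, under the assumption that "unsigned short" occupies exactly two 8-bit bytes; "union pun16" and both function names are hypothetical. The first function exposes the host's byte order, the second shows how htons() pins the storage layout down:

```c
#include <assert.h>
#include <arpa/inet.h>  /* htons() */

union pun16 {               /* hypothetical name */
    unsigned short word;    /* assumed to be exactly 16 bits wide */
    unsigned char bytes[2];
};

/*
 * First byte in storage of v as the host lays it out. Which half of v
 * this is depends on the implementation's byte order.
 */
static unsigned char first_byte_host(unsigned short v)
{
    union pun16 u;

    assert(sizeof u.word == 2);
    u.word = v;
    return u.bytes[0];
}

/*
 * First byte in storage of v after conversion to network byte order.
 * This is the most significant byte of v on every host.
 */
static unsigned char first_byte_network(unsigned short v)
{
    union pun16 u;

    assert(sizeof u.word == 2);
    u.word = htons(v);
    return u.bytes[0];
}
```

For v = 0x0102 the first function yields 0x02 on a little-endian host and 0x01 on a big-endian one, while the second yields 0x01 on both.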
[going OT] If only we had some C89-based standard, offering a way to make
the further result of that implementation-defined access safe for some
bigger types, on all implementations conforming to that standard!
Oh wait, we have. I'll modify the code a bit:
For SUSv1:
    #define _XOPEN_SOURCE            /* SUSv1 */
    #define _XOPEN_SOURCE_EXTENDED 1 /* X/Open UNIX Extension */
    #include <limits.h>              /* CHAR_BIT */
    #include <arpa/inet.h>           /* in_port_t, ntohs() */

    #if CHAR_BIT != 8
    #  error "CHAR_BIT != 8, sorry"
    #endif

    union {
        in_port_t words[2];
        char unsigned buffer[4];
    } both;

    length = ntohs(both.words[0]);
    type = ntohs(both.words[1]);
"in_port_t" is defined in <netinet/in.h>, and also made visible by
<arpa/inet.h>:
----v----
The <netinet/in.h> header defines the following types through typedef:
in_port_t An unsigned integral type of exactly 16 bits.
----^----
And
----v----
htonl, htons, ntohl, ntohs -- convert values between host and network byte
order
[...]
in_port_t ntohs(in_port_t netshort);
[...]
These functions convert 16-bit and 32-bit quantities between network byte
order and host byte order.
[...]
----^----
That is, "in_port_t" ensures that no trap representation is possible, and
ntohs() covers the implementation-defined part.
For SUSv2:
    #define _XOPEN_SOURCE 500 /* SUSv2 */
    #include <arpa/inet.h>    /* uint16_t, ntohs() */

    union {
        uint16_t words[2];
        char unsigned buffer[4];
        /* might as well use uint8_t */
    } both;

    length = ntohs(both.words[0]);
    type = ntohs(both.words[1]);
I removed the CHAR_BIT check, because C requires CHAR_BIT to be at least 8
(CHAR_BIT >= 8), and uint8_t is a required type in SUSv2, which can't be
smaller than a char (8 >= CHAR_BIT). Taken together, CHAR_BIT == 8.
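If one would rather have the compiler confirm those assumptions than trust my reasoning, a C89-style compile-time check could look like the sketch below. The typedef names, "union hdr" and read_word() are hypothetical, and memcpy() again stands in for the real input call:

```c
#include <assert.h>
#include <limits.h>     /* CHAR_BIT */
#include <string.h>     /* memcpy() */
#include <arpa/inet.h>  /* uint16_t, ntohs() */

/*
 * Poor man's static assertions: the array size becomes -1, which the
 * compiler must reject, whenever the condition is false.
 */
typedef char char_bit_must_be_8[(CHAR_BIT == 8) ? 1 : -1];
typedef char two_words_must_be_4_bytes[(sizeof(uint16_t[2]) == 4) ? 1 : -1];

union hdr {             /* hypothetical name for the SUSv2 union */
    uint16_t words[2];
    unsigned char buffer[4];
};

/* Return 16-bit field i (0 or 1) of a 4-byte network-order header. */
static uint16_t read_word(const unsigned char *wire, int i)
{
    union hdr both;

    assert(i == 0 || i == 1);
    memcpy(both.buffer, wire, sizeof both.buffer);
    return ntohs(both.words[i]);
}
```

On an implementation where either condition failed, the translation unit would not compile at all, which is the same effect the #error had in the SUSv1 variant.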
Sorry for this jumble again
Cheers,
lacos