Michael Henry
All,
(Note: This is a repost from comp.lang.c.moderated, in the hopes
that a wider audience might provide more information. That
thread may be found here:
http://groups.google.com/group/comp.lang.c.moderated/browse_frm/thread/4b422712201764c6
)
I've got a design for protocol processing that uses structs
containing only uint8_t and arrays of uint8_t, along with nested
structs that are similarly constructed. These structs are
overlaid onto a buffer containing a protocol header, providing
convenient access to the protocol fields using struct syntax.
Endian conversions are handled via macro accessors for
multi-byte fields. The technique applies to unions as well as
structs.
For example:
/* Represents a 16-bit Big-Endian field. */
typedef struct BigU16
{
    uint8_t raw_BigU16[2];
} BigU16;

/* Overlaid onto a buffer containing a protocol header. */
typedef struct ProtocolHeader
{
    uint8_t payloadType;
    BigU16 payloadSize;
} ProtocolHeader;

uint8_t *buf = ...buffer of raw protocol bytes...;
ProtocolHeader *p = (ProtocolHeader *) buf;
uint8_t type = p->payloadType;
uint16_t size = GetBigU16(p->payloadSize);
Byte-sized fields may be manipulated directly (e.g., the
payloadType field above). Macros such as GetBigU16() take care
of accessing misaligned multi-byte fields and performing Endian
conversions. Because multi-byte fields are stored as structs of
arrays, it's difficult to accidentally bypass the necessary
accessor macros and touch the raw data directly. It's also
impossible to extract a field using the wrong size or
Endianness.
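The post does not show GetBigU16() itself. As a minimal sketch of how
such an accessor might look, assuming the struct definitions above:
the macro body is a guess at the intent, and SetBigU16() and
HeaderPayloadSize() are hypothetical names added only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Represents a 16-bit Big-Endian field (as in the post). */
typedef struct BigU16
{
    uint8_t raw_BigU16[2];
} BigU16;

/* Overlaid onto a buffer containing a protocol header (as in the post). */
typedef struct ProtocolHeader
{
    uint8_t payloadType;
    BigU16  payloadSize;
} ProtocolHeader;

/* Assumed definition: build the value one byte at a time, most
   significant byte first.  No multi-byte load is performed, so neither
   the field's alignment nor the host byte order matters. */
#define GetBigU16(field) \
    ((uint16_t) ((((unsigned) (field).raw_BigU16[0]) << 8) | \
                  ((unsigned) (field).raw_BigU16[1])))

/* Hypothetical companion for writing a field.  Note that `value` is
   evaluated twice, so it should not have side effects. */
#define SetBigU16(field, value) \
    do { \
        (field).raw_BigU16[0] = (uint8_t) (((value) >> 8) & 0xFF); \
        (field).raw_BigU16[1] = (uint8_t) ((value) & 0xFF); \
    } while (0)

/* Hypothetical helper: reads the payload size from raw header bytes.
   It depends on ProtocolHeader having no padding, which is exactly the
   property the rest of this post asks about. */
static uint16_t HeaderPayloadSize(const uint8_t *buf)
{
    const ProtocolHeader *p = (const ProtocolHeader *) buf;
    assert(sizeof(ProtocolHeader) == 3);
    return GetBigU16(p->payloadSize);
}
```

Because the macros name the raw_BigU16 member explicitly, passing a
field of some other wrapper type (say, a little-endian one with a
differently named member) fails to compile rather than silently
misreading the data.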
For this idea to work, the compiler must not introduce padding
in the struct or place additional alignment constraints on the
structs. I recognize that standard C sadly does not provide a
uniform way to request "packed" structs for this situation;
however, in many practical cases, the above design can be made
to work with real-world compilers. For example:
- A #pragma may be available to pack individual structs
(e.g., Microsoft's Visual C++ compiler family).
- A declaration modifier may be available to pack individual
structs (e.g., gcc's __attribute__((packed)) feature).
- The compiler may avoid excess alignment and padding based on
"natural" field alignment requirements without the need for
any special directives, because the structs are built only of
bytes.
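Whichever of these mechanisms is in play, the no-padding assumption can
be checked at compile time rather than trusted. The following is only a
sketch: LAYOUT_ASSERT is a hypothetical macro name, the expected sizes
(2 and 3 bytes) follow from the example structs above, and the C11
static_assert branch is an addition that postdates many of the
compilers under discussion.

```c
#include <assert.h>   /* provides static_assert when compiling as C11 */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

typedef struct BigU16
{
    uint8_t raw_BigU16[2];
} BigU16;

typedef struct ProtocolHeader
{
    uint8_t payloadType;
    BigU16  payloadSize;
} ProtocolHeader;

/* Hypothetical macro: fails the build when `cond` is false.  Older
   compilers get the classic negative-array-size trick instead of
   static_assert. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
#define LAYOUT_ASSERT(name, cond) static_assert(cond, #name)
#else
#define LAYOUT_ASSERT(name, cond) \
    typedef char layout_assert_##name[(cond) ? 1 : -1]
#endif

/* The overlay is only valid if the struct matches the wire format
   byte for byte: no padding inside or after the members. */
LAYOUT_ASSERT(BigU16_size, sizeof(BigU16) == 2);
LAYOUT_ASSERT(ProtocolHeader_size, sizeof(ProtocolHeader) == 3);
LAYOUT_ASSERT(ProtocolHeader_payloadSize_offset,
              offsetof(ProtocolHeader, payloadSize) == 1);
```

A compiler that pads these structs would then stop the build at the
assertion instead of producing code that misparses headers.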
Though a bit ugly, I can declare the structs as follows:
#include "pack_1.h"
typedef PACKED_STRUCT(ProtocolHeader)
{
    uint8_t payloadType;
    BigU16 payloadSize;
} ProtocolHeader;
#include "pack_def.h"
where the PACKED_STRUCT() macro provides a compiler-dependent
decoration such as __attribute__((packed)), and the pack_xxx.h
headers supply compiler-dependent #pragmas if needed, switching
between one-byte packing and the system default.
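The contents of PACKED_STRUCT() and the pack_xxx.h headers are not
shown, so the following single-file sketch is an assumption about how
they might be arranged. The compiler tests (__GNUC__, _MSC_VER) and the
CheckHeaderLayout() helper are illustrative choices; the #pragma lines
stand in for what pack_1.h and pack_def.h would contain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed definition of PACKED_STRUCT(): use the gcc attribute where
   available, otherwise fall back to a plain struct and rely on the
   pragmas (or on natural alignment). */
#if defined(__GNUC__)
#define PACKED_STRUCT(name) struct __attribute__((packed)) name
#else
#define PACKED_STRUCT(name) struct name
#endif

/* Stand-in for: #include "pack_1.h" (switch to one-byte packing). */
#if defined(_MSC_VER)
#pragma pack(push, 1)
#endif

typedef PACKED_STRUCT(BigU16)
{
    uint8_t raw_BigU16[2];
} BigU16;

typedef PACKED_STRUCT(ProtocolHeader)
{
    uint8_t payloadType;
    BigU16  payloadSize;
} ProtocolHeader;

/* Stand-in for: #include "pack_def.h" (restore the previous packing). */
#if defined(_MSC_VER)
#pragma pack(pop)
#endif

/* Hypothetical sanity check, e.g. called once at startup. */
static void CheckHeaderLayout(void)
{
    assert(sizeof(BigU16) == 2);
    assert(sizeof(ProtocolHeader) == 3);
    assert(offsetof(ProtocolHeader, payloadSize) == 1);
}
```

Splitting the pragmas into separate headers, as described above, keeps
the compiler-specific conditionals out of every file that declares a
protocol struct.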
Ultimately, I'd like to know whether the above hacks are
sufficient to avoid problems with practical compilers I'm likely
to encounter, now and in the foreseeable future. We have a
large degree of control over the choice of compilers in my
immediate office, but I'd like our code to be usable by sister
organizations that have less control over their tools. While
it's difficult to say exactly which compilers we'll encounter, I
expect them all to be "practical" compilers meant to serve
real-world needs of customers, not the intentionally perverse
theoretical compilers meant to test the bounds of the standard
(since I know that the standard alone does not guarantee the
struct layout my design depends on).
My questions are:
- What existing compilers and associated environments would not
work with the above scheme?
- Are there trends either toward or away from the current level
of support for controlling struct packing (such as future ISO
standard support for it, or a likelihood that compilers would
*stop* providing such support, or stop providing "natural"
alignment)?
- Is there another way to provide the elegance of the struct
technique while remaining truly portable?
From reading the gcc source code, I understand that Linux
internally relies on the "natural" alignment properties of
structs, so compilers for Linux probably have to support that
feature, which might be a reason to think that compilers would
continue to provide the padding-related support my design
requires.
Thanks,
Michael Henry
The above question generated the following reply from lacos in the
comp.lang.c.moderated thread:
Oracle Solaris Studio has "#pragma pack(n)" and "-xmemalign=ab":
http://docs.sun.com/app/docs/doc/821-1384/bjacr?l=en&a=view
As a matter of interest, I checked the installed documentation of an
"HP C V7.1-015 on OpenVMS Alpha V8.3" instance, and sure enough, it says

CC
  Language_topics
    Preprocessor
      #pragma
        #pragma [no]member_alignment[_m|_nm]

          Tells the compiler to align structure members on the next
          boundary appropriate to the type of the member rather than
          the next byte. For example, a long variable is aligned on
          the next longword boundary; a short variable on the next
          word boundary.

          Syntax:

            #pragma nomember_alignment [base_alignment]
            #pragma member_alignment [save | restore]

          The optional base_alignment parameter can be used with
          #pragma nomember_alignment to specify the base alignment
          of the structure. Use one of the following keywords to
          specify the base_alignment:

            o BYTE (1 byte)
            o WORD (2 bytes)
            o LONGWORD (4 bytes)
            o QUADWORD (8 bytes)
            o OCTAWORD (16 bytes)

          The optional save and restore keywords can be used to save
          the current state of the member_alignment and to restore
          the previous state, respectively. This feature is
          necessary for writing header files that require
          member_alignment or nomember_alignment, or that require
          inclusion in a member_alignment that is already set.
lacos