struct mapping problem (gcc on Linux)

barcaroller

I have a block of memory that looks as follows:

uint8 value1 16
uint16 value2 0301
uint16 value3 008c


However, when I use a C struct to map this block of memory, like this:

typedef struct
{
uint8 value1;
uint16 value2;
uint16 value3;
} mytype;

mytype *p = (mytype *) block_of_memory;


I get the following results:

sizeof(mytype) is being reported as 6, and not 5

value1 shows as 0x16 (correctly)
value2 shows as 0x01 (not 0x0301)
value3 shows as 0x018c (not 0x008c;
the 01 is the out-of-bounds sixth byte)


Obviously, the compiler has its own idea of how to map this struct but I've
seen this kind of mapping all over the Linux header files (e.g. TCP and IP
header files). I've used similar mappings in my own code in the past, and I
*think* it always worked. What am I missing? And how should I go about
fixing this?


Side note
=========
If I change the struct to break value2 into two halves, like this:

typedef struct
{
uint8 value1;
uint8 value2_1;
uint8 value2_2;
uint16 value3;
} mytype;

I get the following result:

sizeof(mytype) still being reported as 6, and not 5

value1 shows as 0x16 (correctly)
value2_1 shows as 0x03 (correctly)
value2_2 shows as 0x01 (correctly)
value3 shows as 0x018c (not 0x008c;
the 01 is the out-of-bounds sixth byte)
 
jameskuyper

barcaroller said:
I have a block of memory that looks as follows:

uint8 value1 16
uint16 value2 0301
uint16 value3 008c


However, when I use a C struct to map this block of memory, like this:

typedef struct
{
uint8 value1;
uint16 value2;
uint16 value3;
} mytype;

mytype *p = (mytype *) block_of_memory;


I get the following results:

sizeof(mytype) is being reported as 6, and not 5

value1 shows as 0x16 (correctly)
value2 shows as 0x01 (not 0x0301)
value3 shows as 0x018c (not 0x008c;
the 01 is the out-of-bounds sixth byte)


Obviously, the compiler has its own idea of how to map this struct but I've
seen this kind of mapping all over the Linux header files (e.g. TCP and IP
header files). I've used similar mappings in my own code in the past, and I
*think* it always worked. What am I missing? And how should I go about
fixing this?

Use of structures for this purpose is highly non-portable, for many
reasons, the most important of which is that the standard gives
implementations permission to insert padding between and after members
of a struct (but not before the first member). Note: it would be even
less portable if bit-fields were involved.

To extract the data you're looking for, you need to memcpy() it from
the buffer into a variable of the appropriate type.
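
A minimal sketch of that memcpy() approach, assuming the block starts with the five bytes 16 03 01 00 8c and that the 16-bit fields are stored most-significant byte first (an inference from the values shown in the question, not something the poster states). The names record and decode_record are hypothetical; ntohs() is the POSIX byte-order helper from <arpa/inet.h>.

```c
#include <arpa/inet.h>  /* ntohs(): big-endian 16-bit value to host order */
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory form of the record; its padding no longer matters. */
struct record {
    uint8_t  value1;
    uint16_t value2;
    uint16_t value3;
};

/* Copy each field out of the raw block at its offset within the block. */
static struct record decode_record(const unsigned char *block)
{
    struct record r;
    uint16_t tmp;

    assert(block != NULL);

    r.value1 = block[0];

    memcpy(&tmp, block + 1, sizeof tmp);  /* bytes 1..2, possibly misaligned */
    r.value2 = ntohs(tmp);                /* assumes big-endian storage */

    memcpy(&tmp, block + 3, sizeof tmp);  /* bytes 3..4 */
    r.value3 = ntohs(tmp);

    return r;
}
```

Because the copy goes through memcpy(), neither the padding of struct record nor the alignment of the source address matters. With optimization enabled, gcc will typically turn such small fixed-size copies into ordinary loads where the target allows it.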
 
barcaroller

jameskuyper said:
Use of structures for this purpose is highly non-portable, for many
reasons, the most important of which is that the standard gives
implementations permission to insert padding between and after members
of a struct (but not before the first member). Note: it would be even
less portable if bit-fields were involved.

To extract the data you're looking for, you need to memcpy() it from
the buffer into a variable of the appropriate type.

I agree but a lot of the Linux source code is based on exactly such
constructs (with bit-fields too). For example, it would be too expensive to
memcpy() the fields of a packet buffer that is being read via libpcap in
real-time. Linux just maps the structs. I suspect that there are some
obscure rules regarding mapping structs to memory, that I'm not aware of.
 
bartc

barcaroller said:
I agree but a lot of the Linux source code is based on exactly such
constructs (with bit-fields too). For example, it would be too expensive
to memcpy() the fields of a packet buffer that is being read via libpcap
in real-time. Linux just maps the structs. I suspect that there are some
obscure rules regarding mapping structs to memory, that I'm not aware of.

I'd guess that something like #pragma pack(1), or an equivalent, needs to
be in effect so that no padding is inserted in the struct:

typedef struct
{
uint8 value1;
uint16 value2;
uint16 value3;
} mytype;

(Total size 5 bytes instead of 6)

So value1 occupies 1 byte and value2 follows immediately. However, value2
is then likely to be misaligned, which may cause problems, unless the whole
struct starts at an odd address.
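
A sketch of what that could look like with gcc, assuming the <stdint.h> types stand in for the poster's uint8 and uint16. The name mytype_packed and the helper layout_is_packed are hypothetical; the push/pop form is used so that the packing applies to this one struct only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#pragma pack(push, 1)       /* save the current packing, then pack to 1 byte */
typedef struct {
    uint8_t  value1;
    uint16_t value2;        /* now at offset 1, so possibly misaligned */
    uint16_t value3;        /* now at offset 3 */
} mytype_packed;
#pragma pack(pop)           /* restore the previous packing for later structs */

/* Fail the build if the pragma did not take effect. */
static_assert(sizeof(mytype_packed) == 5, "packing was not applied");

/* Hypothetical helper: checks the member offsets at run time. */
static int layout_is_packed(void)
{
    return offsetof(mytype_packed, value2) == 1
        && offsetof(mytype_packed, value3) == 3;
}
```

Packing only removes the padding. It does not change the byte order of the 16-bit members, so values stored most-significant byte first would still need converting on a little-endian machine.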
 
Beej Jorgensen

barcaroller said:
I agree but a lot of the Linux source code is based on exactly such
constructs (with bit-fields too).

Which examples, out of curiosity?

With gcc, you can force structure packing a few ways, including the
-fpack-struct command line switch, #pragma pack, and
__attribute__((packed)).

-Beej says without ever having tried any of it
 
Ben Pfaff

Beej Jorgensen said:
Which examples, out of curiosity?

linux/if_ether.h uses __attribute__((packed)):

struct ethhdr {
unsigned char h_dest[ETH_ALEN]; /* destination eth addr */
unsigned char h_source[ETH_ALEN]; /* source ether addr */
__be16 h_proto; /* packet type ID field */
} __attribute__((packed));

linux/ip.h doesn't, because everything gets packed the same way
on the architectures that Linux supports. Note the use of #if's
to deal with the different endiannesses that Linux supports:

struct iphdr {
#if defined(__LITTLE_ENDIAN_BITFIELD)
__u8 ihl:4,
version:4;
#elif defined (__BIG_ENDIAN_BITFIELD)
__u8 version:4,
ihl:4;
#else
#error "Please fix <asm/byteorder.h>"
#endif
__u8 tos;
__be16 tot_len;
__be16 id;
__be16 frag_off;
__u8 ttl;
__u8 protocol;
__sum16 check;
__be32 saddr;
__be32 daddr;
/*The options start here. */
};

Other examples are easy to find.
 
Nobody

I agree but a lot of the Linux source code is based on exactly such
constructs (with bit-fields too). For example, it would be too expensive
to memcpy() the fields of a packet buffer that is being read via libpcap
in real-time. Linux just maps the structs. I suspect that there are some
obscure rules regarding mapping structs to memory, that I'm not aware of.

There's nothing particularly obscure about the rules.

"short"s are aligned to 2-byte boundaries, "int"s and up are
aligned to 4-byte boundaries.

As Beej says, you can eliminate the padding using __attribute__((packed)).
However, using -fpack-struct won't generally work, as it will cause your
code to assume that all structures are packed, even those which aren't.

Caveat: unaligned reads (e.g. reading an "int" from an address which isn't
a multiple of 4) don't work on some architectures. They either cause an
exception (typically manifested as SIGBUS) or, in the worst case (e.g.
some ARM chips), silently return "rotated" data.
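
A sketch of the per-struct form, assuming gcc and assuming the 16-bit fields are stored most-significant byte first. The names wire_record, load_record and the two accessors are hypothetical. The block is copied into a packed struct rather than cast, so the source address needs no particular alignment.

```c
#include <arpa/inet.h>  /* ntohs() */
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* gcc extension: pack only this struct, leaving every other struct alone. */
struct wire_record {
    uint8_t  value1;
    uint16_t value2;   /* big-endian in the block (assumed) */
    uint16_t value3;   /* big-endian in the block (assumed) */
} __attribute__((packed));

static_assert(sizeof(struct wire_record) == 5, "struct is not packed");

/* Hypothetical helper: one 5-byte copy, then ordinary member access. */
static struct wire_record load_record(const unsigned char *block)
{
    struct wire_record r;
    assert(block != NULL);
    memcpy(&r, block, sizeof r);
    return r;
}

/* Accessors convert to host byte order on the way out. */
static uint16_t record_value2(const struct wire_record *r) { return ntohs(r->value2); }
static uint16_t record_value3(const struct wire_record *r) { return ntohs(r->value3); }
```

Accessing the members through the packed struct type lets the compiler know they may be misaligned. Taking the address of a packed member and passing it around as a plain uint16_t * loses that information, which is where the caveat above starts to bite.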
 
Keith Thompson

Nobody said:
There's nothing particularly obscure about the rules.

"short"s are aligned to 2-byte boundaries, "int"s and up are
aligned to 4-byte boundaries.

As Beej says, you can eliminate the padding using __attribute__((packed)).
However, using -fpack-struct won't generally work, as it will cause your
code to assume that all structures are packed, even those which aren't.

Caveat: unaligned reads (e.g. reading an "int" from an address which isn't
a multiple of 4) don't work on some architectures. They either cause an
exception (typically manifested as SIGBUS) or, in the worst case (e.g.
some ARM chips), silently return "rotated" data.

Of course this is all implementation-specific. The C standard
recognizes the existence of alignment constraints, but it doesn't say
what they are.
 
Ben Pfaff

Nobody said:
There's nothing particularly obscure about the rules.

"short"s are aligned to 2-byte boundaries, "int"s and up are
aligned to 4-byte boundaries.

The rules are more obscure than that.

64-bit integers are aligned on 64-bit boundaries, except on
architectures where they're only aligned on 32-bit boundaries.

64-bit floating point values aren't necessarily aligned on the
same boundary as 64-bit integers.

The end of a struct is padded out to the largest alignment of any
member of the struct. Except that on some architectures, in some
configurations, structs are always padded up to a multiple of 32
bits, whether the struct contains 32- or 64-bit quantities at
all.

And those are just the rules that I know about, which apply to
relatively common architectures. There are undoubtedly other
architectures with other rules.

The only definitive rule about alignment is that it's difficult
to generalize about alignment.
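
Rather than guessing at the rules, one can ask the implementation what it chose. A minimal sketch, assuming a C11 compiler and using the <stdint.h> types in place of the original poster's uint8 and uint16; print_layout is a hypothetical helper.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint8_t  value1;
    uint16_t value2;
    uint16_t value3;
} mytype;

/* Hypothetical helper: report what this particular implementation chose. */
static void print_layout(void)
{
    printf("alignof(uint16_t) = %zu\n", alignof(uint16_t));
    printf("offsetof(value2)  = %zu\n", offsetof(mytype, value2));
    printf("offsetof(value3)  = %zu\n", offsetof(mytype, value3));
    printf("sizeof(mytype)    = %zu\n", sizeof(mytype));
}

/* These hold whatever the actual numbers turn out to be. */
static_assert(offsetof(mytype, value2) % alignof(uint16_t) == 0,
              "member is misaligned");
static_assert(sizeof(mytype) % alignof(mytype) == 0,
              "size is not a multiple of the alignment");
```

A size of 6, as reported at the top of the thread, would be consistent with value2 at offset 2 and value3 at offset 4, but the numbers printed depend entirely on the compiler and target.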
 
barcaroller

Jack Klein said:
Yes, I'm sure it was too expensive when the platform was a 16 MHz 386.
And it was still too expensive when the platform was a 90 MHz Pentium.
How fast does the processor have to get to make bad, non-portable code
the only option to "too expensive".

I don't think you know what you're talking about. With a gigabit interface
on a router, if I were to do a memcpy() for every incoming packet, over 80%
of them would get dropped. And that's on a system with 16 3.2GHz Xeon
processors and 64GB of RAM.

In any case, if you are interested in non-portable gcc options
commonly used in Linux, the proper place to ask is a Linux group.

I never said I was interested in non-portable gcc options.

As far as C is concerned, the only problem is in your expectations,
which the language standard defines no help for.

And what would those expectations be?
 
Mark

Ben Pfaff said:
linux/ip.h doesn't, because everything gets packed the same way
on the architectures that Linux supports.

For this case, where does the compiler take information on how to pack this
structure?
 
Nick Keighley

I agree but a lot of the Linux source code is based on exactly such
constructs (with bit-fields too).  

and hence Linux isn't portable. To the best of my knowledge Linux is
always compiled with GCC, which has extensions and oddities specifically
for the benefit of Linux.

For example, it would be too expensive to
memcpy() the fields of a packet buffer that is being read via libpcap in
real-time.  Linux just maps the structs.  I suspect that there are some
obscure rules regarding mapping structs to memory, that I'm not aware of.

not portable ones there aren't. Your example

uint8 value1 16
uint16 value2 0301
uint16 value3 008c

cannot exist on some architectures because a uint16 *must* start at an
even address. E.g. some 68k architectures, a bit dated I know, but I
believe some RISC machines are equally picky.
 
Nobody

For this case, where does the compiler take information on how to pack this
structure?

None of those structures contain unaligned fields, so there is no reason
for padding on any architecture.

You only need to tell the compiler to pack the structure if it would
otherwise add padding to maintain alignment.
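
That claim can be checked at compile time. A sketch, assuming a C11 compiler; struct ipv4_header is a hypothetical stand-in for struct iphdr that uses <stdint.h> types and folds the two 4-bit fields into one byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct iphdr; every member already sits at a
   multiple of its own size, so no padding should be needed. */
struct ipv4_header {
    uint8_t  version_ihl;
    uint8_t  tos;
    uint16_t tot_len;
    uint16_t id;
    uint16_t frag_off;
    uint8_t  ttl;
    uint8_t  protocol;
    uint16_t check;
    uint32_t saddr;
    uint32_t daddr;
};

/* Fail the build, rather than misread packets, if padding ever appears. */
static_assert(offsetof(struct ipv4_header, tot_len) == 2,  "unexpected padding");
static_assert(offsetof(struct ipv4_header, check)   == 10, "unexpected padding");
static_assert(offsetof(struct ipv4_header, saddr)   == 12, "unexpected padding");
static_assert(sizeof(struct ipv4_header)            == 20, "unexpected padding");
```

The assertions document the assumption instead of relying on it silently. The standard would still allow an implementation to pad here, so this is a guard, not a guarantee.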
 
Nobody

Yes, I'm sure it was too expensive when the platform was a 16 MHz 386.
And it was still too expensive when the platform was a 90 MHz Pentium.
How fast does the processor have to get to make bad, non-portable code
the only option to "too expensive".

In any case, if you are interested in non-portable gcc options
commonly used in Linux, the proper place to ask is a Linux group.

As far as C is concerned, the only problem is in your expectations,
which the language standard defines no help for.

This post is a good example of certain regulars' obsession with the
standard in preference to reality.

Lots of real-world code gets written which isn't 100%-portable, and it
usually works.

The language *standard* may not specify this behaviour. Fortunately, most
of the C compilers in existence were written by people who are more
interested in practical use than in language-lawyering, so they
make the effort to let you do all this "invalid" stuff.
 
Nobody

not portable ones there aren't. Your example

uint8 value1 16
uint16 value2 0301
uint16 value3 008c

cannot exist on some architectures because a uint16 *must* start at an
even address. E.g. some 68k architectures, a bit dated I know, but I
believe some RISC machines are equally picky.

This is an issue of the platform rather than the architecture.

Most ARM C compilers will allow you to pack the above into a 5-byte
structure. The compiler will generate multiple reads and use shift/and/or
to re-construct the value. Sometimes this may require either adding
platform-specific qualifiers to pointers, or telling the compiler to
assume unaligned access (and eating the substantial performance hit which
results).
 
jacob navia

Nobody said:
This post is a good example of certain regulars' obsession with the
standard in preference to reality.

The C standard is not accepted by most of those.
C99 is declared "non portable" and the preferred solution
is to go back to 1989.

Lots of real-world code gets written which isn't 100%-portable, and it
usually works.

Do not speak about the real world to those people. Please. They
just can't accept anything coming from there.

:)
 
James Kuyper

Nobody said:
None of those structures contain unaligned fields, so there is no reason
for padding on any architecture.

Most of the members of both structs described by Ben Pfaff have data
types specified by names starting with two underscores; presumably
typedefs, though they could also be non-standard keywords. Whether any
of those names identifies a type at all depends upon the implementation;
the behavior of code that contains such identifiers is undefined by the
C standard (7.1.3p1). On implementations where they do represent valid
types, the alignment requirements of those types are also
implementation-dependent.

Therefore, whether or not struct iphdr has any unaligned fields is
implementation-specific. Presumably, it wouldn't actually be used on any
implementation where that would be a problem, but this newsgroup isn't
the best place to discuss issues that are specific to the
implementations where it is not problematic.
 
