struct mapping problem (gcc on Linux)

James Kuyper · Jul 29, 2009

Nobody wrote:
....

Lots of real-world code gets written which isn't 100%-portable, and it
usually works.

Yes, and such code is more appropriately discussed in forums specific to
the types of systems where it does work. You'll get better, more
informed discussions there.

Stephen Sprunk · Jul 29, 2009

barcaroller said:
I have a block of memory that looks as follows:

uint8 value1 16
uint16 value2 0301
uint16 value3 008c

However, when I use a C struct to map this block of memory, like this:

typedef struct
{
uint8 value1;
uint16 value2;
uint16 value3;
} mytype;

mytype *p = (mytype *) block_of_memory;

I get the following results:

sizeof(mytype) is being reported as 6, and not 5

value1 shows as 0x16 (correctly)
value2 shows as 0x01 (not 0x0301)
value3 shows as 0x018c (not 0x008c;
the 01 is the out-of bounds sixth byte)

Obviously, the compiler has its own idea of how to map this struct but I've
seen this kind of mapping all over the Linux header files (e.g. TCP and IP
header files). I've used similar mappings in my own code in the past, and I
*think* it always worked. What am I missing? And how should I go about
fixing this?

http://www.c-faq.com/struct/padding.html

S

Stephen Sprunk · Jul 29, 2009

Nick said:
To the best of my knowledge Linux is always compiled with GCC which has
extensions and oddities specifically for the benefit of Linux.

The Linux kernel can be (and sometimes is) compiled with ICC. In fact,
Intel recognized the importance of those "extensions and oddities" by
not shipping ICC for Linux until it could function as a drop-in
replacement for GCC, including compiling the kernel. (Similarly, ICC
for Windows is a drop-in replacement for MSVC's compiler, with all its
respective "extensions and oddities".)

Whether those "extensions and oddities" are specifically for the benefit
of Linux is debatable. GCC clearly existed first and the first few versions of Linux
were up and running, using GCC's existing "extensions and oddities",
long before anyone on the GCC team had heard of Linux. Since then,
though, the development and growth of the two has been symbiotic.

S

Stephen Sprunk · Jul 29, 2009

Nobody said:
There's nothing particularly obscure about the rules.

"short"s are aligned to 2-byte boundaries, "int"s and up are
aligned to 4-byte boundaries.

It is simpler to say that each type generally must be aligned to a
multiple of its size. IIRC, this is true for all integer types on all
platforms that Linux supports.

However, as Ben Pfaff notes, this isn't always true. There are lots of
different systems out there, and each has its own (often bizarre) rules.

S

Keith Thompson · Jul 29, 2009

jacob navia said:
Nobody wrote: [...]

This post is a good example of certain regulars' obsession with the
standard in preference to reality.

The C standard is not accepted by most of those.
C99 is declared "non portable" and the preferred solution
is to go back to 1989.

C99, unfortunately, *is* non-portable. I wish that were not the
case, but it is, and since I'm not a compiler implementer I'm not
in a position to change that. Going back to the 1989 standard is
not preferred, but it's often necessary.

Do you dispute the fact that it's more difficult to find fully
conforming C99 compilers than fully conforming C90 compilers?

If I write code that depends on C99-specific features, what do I
do later when the code needs to be ported to a platform that has
no compiler that supports all those features?

By producing a compiler that, as far as I can tell, nearly conforms
to the C99 standard, you've done more than most of us have done,
or can do, to improve the situation.

(I say "nearly" because you've never clearly stated that lcc-win
*fully* conforms to the C99 standard, and you've referred to
partially conforming compilers as "C99 compilers", so I'm unwilling
to make any assumptions.)

Keith Thompson · Jul 29, 2009

Stephen Sprunk said:
It is simpler to say that each type generally must be aligned to a
multiple of its size. IIRC, this is true for all integer types on all
platforms that Linux supports.

Not quite. On one system I use (Linux on x86), long long is 8 bytes,
double is 8 bytes, and long double is 12 bytes, but each only requires
4-byte alignment.

And the alignment of a type cannot be greater than its size; arrays
can't have gaps.

However, as Ben Pfaff notes, this isn't always true. There are lots of
different systems out there, and each has its own (often bizarre) rules.

It is universally true that each type's size is an integer multiple of
its alignment.

Stephen Sprunk · Jul 29, 2009

Keith said:
Not quite. On one system I use (Linux on x86), long long is 8 bytes,
double is 8 bytes, and long double is 12 bytes, but each only requires
4-byte alignment.

I wasn't aware that long long int had only 4-byte alignment on
Linux/x86; thanks for the correction.

double and long double aren't integer types.

(Perhaps I should modify the rule to say a multiple of the type's size
or of the CPU's GPR width, whichever is less?)

S

Keith Thompson · Jul 29, 2009

Stephen Sprunk said:
I wasn't aware that long long int had only 4-byte alignment on
Linux/x86; thanks for the correction.

double and long double aren't integer types.

Good point -- but I think similar considerations apply.

(Perhaps I should modify the rule to say a multiple of the type's size
or of the CPU's GPR width, whichever is less?)

It seems silly to say the alignment is a *multiple* of the size,
since the multiplier can't be greater than 1.

As I said, it's always the case that the size is a multiple of
the alignment.

It may *almost* always be the case that the alignment of an integer
type is either the size of the type or the size of a general-purpose
register, whichever is smaller. If so, that could be a good rule
of thumb.

On the other hand, of course, the standard imposes no such
requirements. One *could* have an architecture with, say, 32-bit or
64-bit registers where there's no penalty for "misaligned" accesses,
so everything has 1-byte alignment.

Most of the time you don't need a rule of thumb; the alignment of
a type is the alignment of the type, and you write code that works
whatever that alignment may be. But yes, sometimes it's necessary
to dig deeper.

Ben Pfaff · Jul 29, 2009

Keith Thompson said:
One *could* have an architecture with, say, 32-bit or 64-bit
registers where there's no penalty for "misaligned" accesses,
so everything has 1-byte alignment.

The Intel 8088 was one such architecture (though with 16-bit
registers), since all memory accesses went through an 8-bit bus.

Nobody · Jul 29, 2009

Yes, and such code is more appropriately discussed in forums specific to
the types of systems where it does work. You'll get better, more
informed discussions there.

There is no such forum for the general case.

At least, I can't find
comp.os.linux-and-windows-and-mac-but-i-dont-give-a-shit-about-crays.

99.99% of the "C" code out there (stuff written by people whose job
description is "C" programmer and is compiled with a "C" compiler) uses
implementation-dependent behaviour.

I don't accept that this group is "reserved" for discussing the other 0.01%.

The original question was specifically about Linux, but issues with
alignment and packing apply to all platforms (e.g. most of Microsoft's
file formats boil down to fwrite()ing C structs). Pointing out that
something is "not guaranteed by the standard" may be useful, but that
doesn't automatically end the discussion.

jameskuyper · Jul 29, 2009

Nobody said:
There is no such forum for the general case.

The general case, by definition, isn't implementation-specific.

At least, I can't find
comp.os.linux-and-windows-and-mac-but-i-dont-give-a-shit-about-crays.

I think that you'll find that the overwhelming majority of the
messages posted to this newsgroup that discuss implementation-dependent
issues refer to code which is actually targeted to only one
of those types of systems. "All the world uses Windows" is the most
popular assumption, with "All the world uses Linux" in second place,
by my unofficial count. The actual Windows/Linux/Mac feature that their
code relies upon might be shared by all three types of systems, but
they aren't actually interested in the other two.

99.99% of the "C" code out there (stuff written by people whose job
description is "C" programmer and is compiled with a "C" compiler) uses
implementation-dependent behaviour.

And the parts that are implementation-dependent should be discussed in
forums specialized for the particular platforms where the

I don't accept that this group is "reserved" for discussing the other 0.01%.

It isn't. It's for discussing the portable portion of all of those non-
portable programs. I won't invent any statistics about how big that
portion is, but it's a large fraction of the code in most of the
programs that I'm responsible for. I imagine that it constitutes a
smaller fraction of typical GUI or embedded applications, but for
precisely that reason, discussions of the code that constitutes the
bulk of such applications should be posted to a forum specific to the
particular GUI environment or embedded platform that it's intended to
target.

The original question was specifically about Linux, but issues with
alignment and packing apply to all platforms (e.g. most of Microsoft's
file formats boil down to fwrite()ing C structs). Pointing out that
something is "not guaranteed by the standard" may be useful, but that
doesn't automatically end the discussion.

While there are alignment issues and packing options in all of those
environments, they are different issues with different resolutions.
You couldn't really say much that's both accurate and useful about
them in comp.os.linux-and-windows-and-mac-but-i-dont-give-a-shit-about-
crays, because the scope of that group would be too large.

Phil Carmody · Jul 29, 2009

Nobody said:
None of those structures contain unaligned fields, so there is no reason
for padding on any architecture.

Wow, I see someone who's never programmed telecomms software for 24-bit
processors. You should open your eyes a little.

Phil

Phil Carmody · Jul 29, 2009

Ben Pfaff said:
The rules are more obscure than that.

64-bit integers are aligned on 64-bit boundaries, except on
architectures where they're only aligned on 32-bit boundaries.

64-bit floating point values aren't necessarily aligned on the
same boundary as 64-bit integers.

And 80-bit floating point values are aligned to 32-bit boundaries,
except when they're aligned to 64-bit boundaries.

The end of a struct is padded out to the largest alignment of any
member of the struct. Except that on some architectures, in some
configurations, structs are always padded up to a multiple of 32
bits, whether the struct contains 32- or 64-bit quantities at
all.

And those are just the rules that I know about, which apply to
relatively common architectures. There are undoubtedly other
architectures with other rules.

The only definitive rule about alignment is that it's difficult
to generalize about alignment.

Indeedy.

Phil

Ben Pfaff · Jul 29, 2009

Phil Carmody said:
Wow, I see someone who's never programmed telecomms software for 24-bit
processors. You should open your eyes a little.

What kind of alignment rules do 24-bit processors have?

Keith Thompson · Jul 29, 2009

Phil Carmody said:
And 80-bit floating point values are aligned to 32-bit boundaries,
except when they're aligned to 64-bit boundaries.

[...]

That can be the case only if the 80-bit floating point values are
padded to at least 96 or 128 bits, where the padding is actually
part of the object. C arrays cannot have gaps between elements.

Beej Jorgensen · Jul 30, 2009

Mark McIntyre said:
In other words, it's nonportable and would have to be reworked on a
different architecture. [...] Rule #1 of optimisation applies.

Yes, but... it's the _kernel_--it gets a lot of leeway for being not
strictly portable and having extreme optimizations.

-Beej

Phil Carmody · Jul 30, 2009

Ben Pfaff said:
What kind of alignment rules do 24-bit processors have?

Various, as one should have expected.
There's the "everything on 24-bit boundaries" variety.
There's the "bigger things on 48-bit boundaries" twist.
There's sometimes the "chars can be packed at 8-bit boundaries" option
too, if you (or your compiler, should I say) are prepared to jump
through enough hoops. (Pointers becoming not numerically equal, bit-
representation-wise, to the addresses of the object are then a must,
which means that all non-portable assumptions about pointer arithmetic
break instantly.)

Phil

Phil Carmody · Jul 30, 2009

Keith Thompson said:
Phil Carmody said:

And 80-bit floating point values are aligned to 32-bit boundaries,
except when they're aligned to 64-bit boundaries.

[...]

That can be the case only if the 80-bit floating point values are
padded to at least 96 or 128 bits, where the padding is actually
part of the object. C arrays cannot have gaps between elements.

As someone who uses 80-bit floats a lot, this is one thing that
narks me. It's a clear weakness in the C standard if it has to
force the implementation to lie about the size of an object when
I use sizeof, by telling me the size of the tile in which it fits
instead.

I'd like struct { long double f; char c[2]; } to fit in 12 bytes on
an x86, because in reality it does.

Phil

Phil Carmody · Jul 30, 2009

Beej Jorgensen said:
Mark McIntyre said:

In other words, it's nonportable and would have to be reworked on a
different architecture. [...] Rule #1 of optimisation applies.

Yes, but... it's the _kernel_--it gets a lot of leeway for being not
strictly portable and having extreme optimizations.

In that case it's broken. A kernel should no less isolate non-portable
parts into one little corner of the code (perhaps as macros, so that
optimisations can still be done more easily) than any other application.

Fortunately, it's not all that broken, and such attitudes do prevail.
I've not needed to worry about the endian-ness of the architecture I'm
currently using, despite getting larger-than-byte-sized data from an
external data source, purely because such things are already taken care
of.

Phil

Beej Jorgensen · Jul 30, 2009

Phil Carmody said:
A kernel should no less isolate non-portable parts into one little
corner of the code (perhaps as macros, so that optimisations can still
be done more easily) than any other application.

I agree, unless it means architectures have to suffer unreasonable
performance hits to make it so.

But I'm going to have to generally defer, because I'm not a kernel guy.
However, I'm going to guess by the sheer number of ports that Linux has
its non-portable parts as isolated as they need to be.

-Beej
