Data alignment question, structures

James Kuyper

Now, what happens when this struct is embedded in a binary file stored
in a character array "buffer[]" such that its position "i" is not
guaranteed to be aligned on the same boundary that

Mystruct *instance2 = malloc(sizeof(Mystruct));

would have used, but will always be on a multiple of 2.

1. Will this always work?

function1((Mystruct *) &(buffer[i]));
(Eric Sosman wrote:)
Not "always," no. 6.3.2.3p7: "A pointer to an object type
may be converted to a pointer to a different object type. If the
resulting pointer is not correctly aligned for the referenced
type, the behavior is undefined. [...]" There's no guarantee
that the alignment of `buffer' suffices for `Mystruct'.


On further consideration, how does one know what "correctly aligned"
means for a given struct? Consider these two examples:

typedef struct {
uint16_t one;
uint16_t two;
uint16_t three;
uint16_t four;
} Mystruct4;
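
One way to ask the implementation what "correctly aligned" means for a given struct is C11's `_Alignof`. The following is only a minimal sketch: the helper name `is_aligned_for_mystruct4` is made up for illustration, and the test relies on the conversion from a pointer to `uintptr_t` behaving like a flat address, which is implementation-defined (it holds on mainstream platforms, but the standard does not promise it).

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint16_t one;
    uint16_t two;
    uint16_t three;
    uint16_t four;
} Mystruct4;

/* The alignment the implementation requires for Mystruct4 (C11). */
size_t mystruct4_alignment(void)
{
    return _Alignof(Mystruct4);
}

/* Hypothetical helper: returns nonzero if p looks suitably aligned for
 * Mystruct4. Assumes uintptr_t values behave like flat byte addresses. */
int is_aligned_for_mystruct4(const void *p)
{
    return (uintptr_t)p % _Alignof(Mystruct4) == 0;
}
```

A cast such as `(Mystruct4 *)&buffer[i]` would only be worth considering when this check passes, and even then the effective-type rules still make `memcpy()` the safer route.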



I've noticed that you routinely fail to provide citations for who wrote
the things that you quoted. That's rather rude to the persons who said
it, and also inconvenient for the rest of us trying to figure out who
said what. It can also get you into legal trouble in some contexts,
though Usenet posts are so easily forged that it would be hard to
imagine that happening in this context.

I deny permission to any and all authors to quote any text that I write
without appropriate citation. Don't bother responding if you're not
willing to provide it.
Taking into account the various answers in this thread to the preceding
question it would seem that there is still no safe (safe meaning here
"code that will work on any platform") method to use a C struct
to extract binary data directly from memory, since the compiler can
throw in padding and alignment requirements whenever it feels like it.

Not quite - for instance, padding is not allowed at the beginning, and
the alignment requirement for an aggregate object (a struct or an array)
must be at least as strict as it is for any of the members of the
aggregate. Still, the basic point remains - the standard does not
provide enough guarantees to allow the use of such a struct to read data
directly from a file.
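
A minimal sketch of what is and is not guaranteed, using a hypothetical struct named `Sample` with the member types discussed in this thread. The two `_Static_assert` lines (C11) check the guarantees mentioned above; everything `print_layout()` reports is up to the implementation.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical struct; the name and helpers are for illustration only. */
typedef struct {
    uint16_t one;
    uint32_t two;
    int16_t  three;
    uint8_t  four;
} Sample;

/* Guaranteed: no padding before the first member. */
_Static_assert(offsetof(Sample, one) == 0, "no leading padding");

/* Guaranteed: the struct is at least as strictly aligned as its members. */
_Static_assert(_Alignof(Sample) >= _Alignof(uint32_t), "struct alignment");

/* Number of padding bytes this implementation added to Sample. */
size_t sample_padding(void)
{
    return sizeof(Sample)
         - (sizeof(uint16_t) + sizeof(uint32_t) + sizeof(int16_t) + sizeof(uint8_t));
}

/* Not guaranteed: these values may differ between compilers and targets. */
void print_layout(void)
{
    printf("sizeof(Sample) = %zu, padding = %zu\n",
           sizeof(Sample), sample_padding());
    printf("offsets: one=%zu two=%zu three=%zu four=%zu\n",
           offsetof(Sample, one), offsetof(Sample, two),
           offsetof(Sample, three), offsetof(Sample, four));
}
```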
Even if the data is reduced to byte representation, something like this
(I know it isn't the same as Mystruct4 above):

typedef struct {
uint8_t one[2]; /* actually uint16_t */
uint8_t two[4]; /* actually uint32_t */
int8_t three[2]; /* etc. */
int8_t four;
} Mystruct4b;

it seems not to be safe to pass a "Mystruct4b *" pointer to a function
which references this data at an arbitrary location in memory. Instead
the only safe method is to pass "char *" pointers and take the data
apart with memcpy() at a very low level, moving it from memory to the
structure, or vice versa.

Correct. Because of these things that are not guaranteed, C structs
should be considered as suitable only for use for in-memory objects, not
for reading or writing from files.
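
The `memcpy()` approach can be sketched as follows. The function names are hypothetical, and the sketch assumes the stored value uses the host's byte order and a 32-bit representation without padding bits (which `uint32_t` guarantees where it exists).

```c
#include <stdint.h>
#include <string.h>

/* Read a uint32_t stored at buf + pos; pos need not be aligned.
 * Assumes the bytes are in host byte order. */
uint32_t read_u32_host(const unsigned char *buf, size_t pos)
{
    uint32_t value;
    memcpy(&value, buf + pos, sizeof value);
    return value;
}

/* Store a uint32_t at buf + pos in host byte order; pos need not be aligned. */
void write_u32_host(unsigned char *buf, size_t pos, uint32_t value)
{
    memcpy(buf + pos, &value, sizeof value);
}
```

Because the copy goes through a properly declared `uint32_t` object, no misaligned pointer to `uint32_t` is ever formed.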
What an odd situation. I would have thought with all of the data that
does show up in C programming as "a series of bytes in a buffer arranged
in some particular manner", that is, pretty much any data which is
passed from machine to machine, C would have developed a method for
simplifying this sort of code. Perhaps something along these lines

I agree that some such feature would be useful. But there are problems
with the various possible alternatives.
/* declare memory organization, no padding, no alignment requirement */
typedef memstruct {
uint16_t one;
uint32_t two;
int16_t three;
uint8_t four;
} Mystruct4c;

Specifically so that one could pass a "Mystruct4c *" pointer to a
function, like so:

myfunction3a(Mystruct4c *ptr){
ptr->two = 5;
printf("value of three:%d\n",ptr->three);
}


The fundamental problem with such a feature is that &ptr->two is not
necessarily correctly aligned for a uint32_t object. This is not a
serious problem on some architectures, where alignment is a mere
efficiency issue. However, there are other platforms where the hardware
imposes very strict alignment requirements.
In code where Mystruct4c is in scope, and where the derivation of the
pointer from an object of that type is perfectly clear, the compiler
could write special code equivalent to using memcpy() to copy from
ptr->two to correctly aligned memory, and then passing the value stored
in that temporary to printf(); for writing to such an object, a pointer
to correctly aligned memory could be used, and then the equivalent of
memcpy() could be used to copy from the temporary object back to the
incorrectly aligned memory. However, consider code like the following:

uint32_t fallback = 42; /* "default" is a keyword, so use another name */
uint32_t *p = fallback < ptr->two ? &fallback : &ptr->two;
myfunc(p);

How can myfunc know how to use 'p', when it might be either a pointer to
memory which is correctly aligned for a uint32_t, or a misaligned
pointer? Also, using pointers to temporary correctly-aligned pieces of
memory would mess up comparisons of pointers for equality or relative order.
 
mathog

James said:
I've noticed that you routinely fail to provide citations for who wrote
the things that you quoted.
Sorry. When the number of replies is more than 1 or 2 I can't ever
remember who is >>>> and who is >>>>>> in any case.

That's rather rude to the persons who said
it, and also inconvenient for the rest of us trying to figure out who
said what. It can also get you into legal trouble in some contexts,
though Usenet posts are so easily forged that it would be hard to
imagine that happening in this context.

I deny permission to any and all authors to quote any text that I write
without appropriate citation. Don't bother responding if you're not
willing to provide it.
Taking into account the various answers in this thread to the preceding
question it would seem that there is still no safe (safe meaning here
"code that will work on any platform") method to use a C struct
to extract binary data directly from memory, since the compiler can
throw in padding and alignment requirements whenever it feels like it.

Not quite - for instance, padding is not allowed at the beginning, and
the alignment requirement for an aggregate object (a struct or an array)
must be at least as strict as it is for any of the members of the
aggregate. Still, the basic point remains - the standard does not
provide enough guarantees to allow the use of such a struct to read data
directly from a file.
Even if the data is reduced to byte representation, something like this
(I know it isn't the same as Mystruct4 above):

typedef struct {
uint8_t one[2]; /* actually uint16_t */
uint8_t two[4]; /* actually uint32_t */
int8_t three[2]; /* etc. */
int8_t four;
} Mystruct4b;

it seems not to be safe to pass a "Mystruct4b *" pointer to a function
which references this data at an arbitrary location in memory. Instead
the only safe method is to pass "char *" pointers and take the data
apart with memcpy() at a very low level, moving it from memory to the
structure, or vice versa.

Correct. Because of these things that are not guaranteed, C structs
should be considered as suitable only for use for in-memory objects, not
for reading or writing from files.
What an odd situation. I would have thought with all of the data that
does show up in C programming as "a series of bytes in a buffer arranged
in some particular manner", that is, pretty much any data which is
passed from machine to machine, C would have developed a method for
simplifying this sort of code. Perhaps something along these lines

I agree that some such feature would be useful. But there are problems
with the various possible alternatives.
/* declare memory organization, no padding, no alignment requirement */
typedef memstruct {
uint16_t one;
uint32_t two;
int16_t three;
uint8_t four;
} Mystruct4c;

Specifically so that one could pass a "Mystruct4c *" pointer to a
function, like so:

myfunction3a(Mystruct4c *ptr){
ptr->two = 5;
printf("value of three:%d\n",ptr->three);
}


The fundamental problem with such a feature is that &ptr->two is not
necessarily correctly aligned for a uint32_t object.

Exactly. memstruct != struct. A memstruct is a memory structure
designed for accessing nonaligned data, the field organization is
verbatim and cannot be distorted by the compiler to add padding or
require alignments. In a memstruct "ptr->two" means some equivalent of:

uint32_t tmp4;
memcpy(&tmp4,(char *)ptr + offsetof(Mystruct4c,two),4);

especially including the cases where tmp4 is in a register, "memcpy" is
some assembler operation(s) to copy from a random point in memory into
that register, so that

&(ptr->two)

would not be allowed. Just like this:

register uint32_t tmp4;
printf("Pointer to:%p\n",&tmp4);

is not valid. Presumably casting a memstruct to a struct and vice versa
would also be forbidden. That was the point of:
Mystruct4c native;
Mystruct4b *foreign;
/* field names/sizes must match for the following statement */
alternate_representations {Mystruct4b, Mystruct4c}

which tells the compiler that it needs to know how to copy data in
either direction between a specific memstruct and a specific struct.
(And that would be illegal if the fields didn't match exactly.)
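
Nothing like `alternate_representations` exists in standard C, so today that copy has to be written by hand. A minimal sketch of such a pair of functions follows. The names `Native`, `WIRE_SIZE`, `unpack_record` and `pack_record` are hypothetical, and the sketch assumes the external layout is 2 + 4 + 2 + 1 octets with little-endian fields and two's complement for the signed one; a real format would dictate those details.

```c
#include <stdint.h>

enum { WIRE_SIZE = 9 }; /* 2 + 4 + 2 + 1 octets, no padding */

/* Ordinary struct; the compiler may pad and align it as it likes. */
typedef struct {
    uint16_t one;
    uint32_t two;
    int16_t  three;
    uint8_t  four;
} Native;

/* Copy from the unaligned external layout into the native struct. */
void unpack_record(const unsigned char *src, Native *dst)
{
    uint16_t raw3 = (uint16_t)(src[6] | (src[7] << 8));

    dst->one = (uint16_t)(src[0] | (src[1] << 8));
    dst->two = (uint32_t)src[2]
             | ((uint32_t)src[3] << 8)
             | ((uint32_t)src[4] << 16)
             | ((uint32_t)src[5] << 24);
    /* Map the two's complement bit pattern to a signed value without
     * relying on an implementation-defined conversion. */
    dst->three = raw3 < 0x8000u ? (int16_t)raw3
                                : (int16_t)((int32_t)raw3 - 0x10000);
    dst->four = src[8];
}

/* Copy from the native struct into the unaligned external layout. */
void pack_record(const Native *src, unsigned char *dst)
{
    uint16_t raw3 = (uint16_t)src->three; /* modulo 2^16, well defined */

    dst[0] = (unsigned char)(src->one & 0xFFu);
    dst[1] = (unsigned char)(src->one >> 8);
    dst[2] = (unsigned char)(src->two & 0xFFu);
    dst[3] = (unsigned char)((src->two >> 8) & 0xFFu);
    dst[4] = (unsigned char)((src->two >> 16) & 0xFFu);
    dst[5] = (unsigned char)((src->two >> 24) & 0xFFu);
    dst[6] = (unsigned char)(raw3 & 0xFFu);
    dst[7] = (unsigned char)(raw3 >> 8);
    dst[8] = src->four;
}
```

Since every access goes through `unsigned char`, the buffer can start at any address, and the byte order is fixed by the code rather than by the host.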

I am not worried about endian issues or other floating point
representations. Both of these can be dealt with easily enough once the
data has been mapped from nonaligned to aligned data structures. As for
bytes that are not 8 bits, does any modern hardware exist where that is
not true? Must we keep the requirement to run on archaic hardware
forever going forward with newer versions of the language?

Regards,

David Mathog
 
88888 Dihedral

On Sunday, 20 January 2013 at 11:35:29 PM UTC+8, Keith Thompson wrote:
[...]
What an odd situation. I would have thought with all of the data that
does show up in C programming as "a series of bytes in a buffer arranged
in some particular manner", that is, pretty much any data which is
passed from machine to machine, C would have developed a method for
simplifying this sort of code. Perhaps something along these lines

/* declare memory organization, no padding, no alignment requirement */
typedef memstruct {
uint16_t one;
uint32_t two;
int16_t three;
uint8_t four;
} Mystruct4c;

Specifically so that one could pass a "Mystruct4c *" pointer to a
function, like so:

myfunction3a(Mystruct4c *ptr){
ptr->two = 5;
printf("value of three:%d\n",ptr->three);
}


and the compiler would do the "right thing", using memcpy or whatever,
to hide all of the cruft that is platform specific, up to and including
loading the magic N bytes of memory (as in the Alpha issue Glen
mentioned), shifting, and masking and so forth, to handle data types
smaller than is native for the CPU, without the programmer having to
ever care about the details.

[...]



Consider this:

Mystruct4c obj;
void func(uint32_t *arg);
func(&obj.two);

Given your definition of what a "memstruct" is, the code that implements
func() would have to allow for the possibility that its argument points
to an unaligned uint32_t object.

(Note that gcc's "__attribute__((packed))" doesn't solve this; see my
discussion here: http://stackoverflow.com/q/8568432/827263 and here:
http://stackoverflow.com/a/8568441/827263.)

The alternative, I suppose, would be to forbid taking the address of a
member of a "memstruct", treating its members much like bit fields.

--
Keith Thompson (The_Other_Keith) (e-mail address removed) <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Endianness does matter in C when programming against a file format
that specifies its byte order independently of the in-memory layout,
whose byte order is determined by the machine or the emulator.

But that is not hard to solve if one is paid well.
 
Shao Miller

Exactly. memstruct != struct. A memstruct is a memory structure
designed for accessing nonaligned data, the field organization is
verbatim and cannot be distorted by the compiler to add padding or
require alignments. In a memstruct "ptr->two" means some equivalent of:

uint32_t tmp4;
memcpy(&tmp4,(char *)ptr + offsetof(Mystruct4c,two),4);

especially including the cases where tmp4 is in a register, "memcpy" is
some assembler operation(s) to copy from a random point in memory into
that register, so that

&(ptr->two)

would not be allowed. Just like this:

register uint32_t tmp4;
printf("Pointer to:%p\n",&tmp4);

is not valid. Presumably casting a memstruct to a struct and vice versa
would also be forbidden.

You can't cast non-scalar types except for 'void', so it's always a
constraint violation to try '(somestructtype) otherstruct'.

If you were talking about casting pointers to structs and your new
memstructs, there is not a lot of discussion in the Standard that
forbids casting pointers, so I'm not sure:

1. Why you'd make that forbidden
2. How it would be worded so as to be a special case (constraint
violation or undefined behaviour?)
That was the point of:


which tells the compiler that it needs to know how to copy data in
either direction between a specific memstruct and a specific struct.
(And that would be illegal if the fields didn't match exactly.)

Ok, that would need some thought towards how that would be worded. You
couldn't use "compatible type," but you could reproduce the Standard's
explanation of when two structure types in two different translation
units are considered compatible types, but use it in your definition of
whatever you'd call the relationship between a struct and a memstruct.
I am not worried about endian issues or other floating point
representations. Both of these can be dealt with easily enough once the
data has been mapped from nonaligned to aligned data structures.

Actually, I'm not sure what makes you say that. If a member has a trap
representation, the act of reading the stored value as the type of the
member yields undefined behaviour. You might be used to
[non-C-Standard] 'htonl', but you can't really do that sort of thing if
there's a trap representation, in general.
As for
bytes that are not 8 bits, does any modern hardware exist where that is
not true? Must we keep the requirement to run on archaic hardware
forever going forward with newer versions of the language?

You can use limits.h's 'CHAR_BIT' to determine how many bits in a byte.

But again, why do you want to use this 'memstruct', which looks similar
to a 'struct'? Is it just because you like using the '.' and '->'
operators?
 
BartC

mathog said:
James Kuyper wrote:
As for bytes that are not 8 bits, does any modern hardware exist where
that is not true? Must we keep the requirement to run on archaic hardware
forever going forward with newer versions of the language?

There are still processors around that are not byte-addressable, so a C
implementation might have a char width that is not 8 bits (but will likely
be 16, 32 or 64). Sometimes an 8-bit char type might be emulated in such a
case.

But these tend to be specialist processors and you might just ignore them if
you know exactly what hardware you're going to run on.
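
If you do decide to ignore such processors, it is cheap to make that assumption explicit so the build fails on them instead of misbehaving. A minimal sketch (the helper name `bits_in_bytes` is made up for illustration):

```c
#include <limits.h>

/* Refuse to compile on implementations where a byte is not 8 bits. */
#if CHAR_BIT != 8
#error "This code assumes 8-bit bytes (CHAR_BIT == 8)."
#endif

/* Number of bits occupied by nbytes bytes on this implementation. */
unsigned long bits_in_bytes(unsigned long nbytes)
{
    return nbytes * CHAR_BIT;
}
```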
 
James Kuyper

On 01/20/2013 01:26 PM, mathog wrote:
....
... As for
bytes that are not 8 bits, does any modern hardware exist where that is
not true?

I gather that there is VERY modern hardware in the embedded world where
CHAR_BIT == 16; in fact, it's a fairly common feature in that world.
It's not my field of expertise, so I can't give you specific examples,
but other people have brought them up on previous occasions when this
question was raised.
There have been people posting to this news group who expressed the
belief that all modern C programming is for embedded systems; I remember
having a very hard time convincing one of them that he was wrong.
 
88888 Dihedral

On Monday, 21 January 2013 at 4:53:10 AM UTC+8, James Kuyper wrote:
On 01/20/2013 01:26 PM, mathog wrote:
...

I gather that there is VERY modern hardware in the embedded world where
CHAR_BIT == 16; in fact, it's a fairly common feature in that world.
It's not my field of expertise, so I can't give you specific examples,
but other people have brought them up on previous occasions when this
question was raised.
There have been people posting to this news group who expressed the
belief that all modern C programming is for embedded systems; I remember
having a very hard time convincing one of them that he was wrong.

#define LASTBYTE(x) ((x) & 0xff)
#define SECONDBYTE(x) (((x) & 0xff00) >> 8)

LASTWORD and SECONDWORD are similar.
 
glen herrmannsfeldt

(snip, someone wrote)
Correct. Because of these things that are not guaranteed, C structs
should be considered as suitable only for use for in-memory objects, not
for reading or writing from files.

Well, you can write them to a file and read them back in the same
program. Also, another program compiled by the same compiler and
run on a similar system should also work.

Padding and alignment can change between versions of a compiler
on the same system, though they usually don't.
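
A minimal sketch of the same-program case, using a hypothetical `Record` struct and a temporary file from `tmpfile()`. The bytes written include whatever padding this particular compiler chose, which is why the file is only meaningful to code built the same way.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical record type, for illustration only. */
typedef struct {
    uint16_t one;
    uint32_t two;
    int16_t  three;
    uint8_t  four;
} Record;

/* Write *in to a temporary file and read it back into *out.
 * Returns 1 on success, 0 on any I/O failure. */
int round_trip(const Record *in, Record *out)
{
    FILE *fp = tmpfile();
    int ok;

    if (fp == NULL)
        return 0;
    ok = fwrite(in, sizeof *in, 1, fp) == 1;
    rewind(fp);
    ok = ok && fread(out, sizeof *out, 1, fp) == 1;
    fclose(fp);
    return ok;
}
```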

-- glen
 
glen herrmannsfeldt

(snip)
which tells the compiler that it needs to know how to copy data in
either direction between a specific memstruct and a specific struct.
(And that would be illegal if the fields didn't match exactly.)
I am not worried about endian issues or other floating point
representations. Both of these can be dealt with easily enough once the
data has been mapped from nonaligned to aligned data structures. As for
bytes that are not 8 bits, does any modern hardware exist where that is
not true? Must we keep the requirement to run on archaic hardware
forever going forward with newer versions of the language?

Well, in C a byte is CHAR_BIT bits long, and there are systems where
that isn't 8, but still usually a multiple of 8.

The 36 and 60 bit machines are pretty much gone.

-- glen
 
BartC

Correct. Because of these things that are not guaranteed, C structs
should be considered as suitable only for use for in-memory objects, not
for reading or writing from files.

I've spent quite a few years having to interface with innumerable C
structures, from another language.

That usually involved the other language being able to exactly match the
types, sizes, alignments and padding used by a specific C implementation,
when defining its own structs.

But C itself doesn't seem to have a need to do the same! Yet there are
plenty of situations where this would be useful (to specify exactly, at
compile time, how a struct is laid out).
 
Les Cargill

BartC said:
I've spent quite a few years having to interface with innumerable C
structures, from another language.

That usually involved the other language being able to exactly match the
types, sizes, alignments and padding used by a specific C
implementation, when defining its own structs.

But C itself doesn't seem to have a need to do the same! Yet there are
plenty of situations where this would be useful (to specify exactly, at
compile time, how a struct is laid out).


Between #pragma pack() ( where available) and use of filler, you can
get quite close. 'Course there are still endian issues, etc.
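
A minimal sketch of that combination, assuming a compiler that accepts the `#pragma pack(push, 1)` / `#pragma pack(pop)` form (gcc, clang and MSVC do; it is an extension, not standard C). The struct name `PackedRecord` and the `filler` member are made up for illustration, and the `_Static_assert` lines (C11) turn the expected layout into a compile-time check.

```c
#include <stddef.h>
#include <stdint.h>

#pragma pack(push, 1)           /* compiler extension: no padding */
typedef struct {
    uint16_t one;
    uint32_t two;
    int16_t  three;
    uint8_t  four;
    uint8_t  filler;            /* explicit filler to reach 10 bytes */
} PackedRecord;
#pragma pack(pop)

/* Fail the build if the compiler did not produce the expected layout. */
_Static_assert(sizeof(PackedRecord) == 10, "unexpected size");
_Static_assert(offsetof(PackedRecord, two) == 2, "unexpected offset of two");
_Static_assert(offsetof(PackedRecord, three) == 6, "unexpected offset of three");

/* Access through the packed type is fine: the compiler knows that
 * `two` may be misaligned and generates code accordingly. */
uint32_t packed_two(const PackedRecord *r)
{
    return r->two;
}
```

Taking `&r->two` and handing it to code that expects an ordinary `uint32_t *` is where this stops being safe, and the byte order of each field is still the host's.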
 
Ian Collins

BartC said:
I've spent quite a few years having to interface with innumerable C
structures, from another language.

That usually involved the other language being able to exactly match the
types, sizes, alignments and padding used by a specific C implementation,
when defining its own structs.

But C itself doesn't seem to have a need to do the same! Yet there are
plenty of situations where this would be useful (to specify exactly, at
compile time, how a struct is laid out).

More often than not, where that requirement exists, the means for
meeting it are provided. I'm thinking primarily of embedded compilers
here. I've written many a driver where structs overlay sets of registers.
 
mathog

James said:
On 01/20/2013 01:26 PM, mathog wrote:
...

I gather that there is VERY modern hardware in the embedded world where
CHAR_BIT == 16; in fact, it's a fairly common feature in that world.

On this hardware is memory access also 16 bits at a time? I wonder what
it does if it reads a network packet with an odd number of 8 bit bytes,
or worse yet, needs to send one?

A 16 bit char would make life fairly miserable because there would then
be no way to avoid the endian issue. The machine with that smallest
data type might not consider its 16 bit char as either type of "endian",
but incoming network data (for instance) must put its 8 bit bytes in one
order or the other within it, so the description of a data structure in
memory would have to be done conditionally for the 16 bit char system.

Admittedly pragma pack gets most of what I need from a memstruct in a
struct. Has that pragma made it into some version of the C standard, or
is it still a compiler specific extension (albeit a common one)?

Thanks,

David Mathog
 
Shao Miller

Admittedly pragma pack gets most of what I need from a memstruct in a
struct. Has that pragma made it into some version of the C standard, or
is it still a compiler specific extension (albeit a common one)?

It's not in any Standard that I'm aware of, and it makes demands of
implementers that could potentially make things difficult for their
implementations. Also, given that there are libraries to help with this
sort of thing, I don't have any idea why it'd be a priority to standardize.

If network hardware really yields a 16-bit value, I'd expect that if
8-bit values are desired for working with, there will be a utility API
available on the platform for working with these values, 8 bits at a
time. As an exercise, perhaps it'd be interesting to write a function
which reads a file 16 bits at a time (maybe by reading two 8-bit bytes),
but yields 8 bits at a time to the caller.

The high bits of a 16-bit byte can be empty, leaving the same range of
values as an 8-bit byte. The shift operators can be used to split a
16-bit byte into 2 8-bit bytes.
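
A minimal sketch of the splitting step, with a hypothetical function name. It stores each octet in its own `unsigned char`, so it also makes sense where `CHAR_BIT` is 16 and the high bits of each element simply stay zero. Putting the high octet first is an arbitrary choice made here.

```c
#include <stddef.h>
#include <stdint.h>

/* Split each 16-bit unit into two octets, high octet first.
 * octets must have room for 2 * n elements.
 * Returns the number of octets written. */
size_t split_units(const uint_least16_t *units, size_t n, unsigned char *octets)
{
    size_t i;

    for (i = 0; i < n; i++) {
        octets[2 * i]     = (unsigned char)((units[i] >> 8) & 0xFFu);
        octets[2 * i + 1] = (unsigned char)(units[i] & 0xFFu);
    }
    return 2 * n;
}
```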
 
Keith Thompson

mathog said:
On this hardware is memory access also 16 bits at a time? I wonder what
it does if it reads a network packet with an odd number of 8 bit bytes,
or worse yet, needs to send one?

You're assuming it's able to read and write network packets using a
protocol that uses 8-bit bytes.

[...]
Admittedly pragma pack gets most of what I need from a memstruct in a
struct. Has that pragma made it into some version of the C standard, or
is it still a compiler specific extension (albeit a common one)?

No, neither pragma pack nor anything that works like it is in any version
of the C standard. There are some tricky corner cases that would have
to be resolved first.
 
Bart van Ingen Schenau

See this answer on stackoverflow for some examples:
http://stackoverflow.com/a/6971919/430719
On this hardware is memory access also 16 bits at a time?

Yes, in units of 16 bits or larger.
I wonder what
it does if it reads a network packet with an odd number of 8 bit bytes,
or worse yet, needs to send one?

Those devices typically don't have to deal with that situation.
Many processors that have `CHAR_BIT == 16` (or even `CHAR_BIT == 32`)
are DSPs (Digital Signal Processors) that don't communicate with
octet-oriented peripherals (such as networks).
And when they do, it turns out that all the relevant protocols were
designed to be friendly for such processors (all multi-octet fields
aligned to a multiple of their size, protocol headers a multiple of 4
octets, etc.)

What I have also seen with such a processor is that each octet received
from outside is put into a separate 16-bit char. It was up to the
programmer to combine the values from multiple chars into a single value
if the protocol actually contained a multi-octet value at that point.
No amount of memory mapping technique is going to help you there.
Admittedly pragma pack gets most of what I need from a memstruct in a
struct. Has that pragma made it into some version of the C standard, or
is it still a compiler specific extension (albeit a common one)?

The #pragma directive is specified in the C standard, but for the most
part it is defined to have implementation-defined effects.
Thanks,

David Mathog

Bart v Ingen Schenau
 
mathog

Dr said:
Forcing my ordering, but allowing for indeterminate padding leaves us
somewhere between two stools.

That's my view too. The standards seem to think that one set of
behavior for struct must fit all, and consequently it fits some common
applications not very well at all. The idea of the "memstruct" I
proposed was that rather than trying to make struct do everything,
create a related construct specifically for handling data in memory,
leaving the regular "struct" to be optimized in the other direction,
and the compiler knowing enough to copy data back and forth between a
memstruct and a struct, without the programmer having to know about the
alignment, platform hardware, and the other nitty gritty details that
are needed now. Presumably such an extension could also handle issues
with platforms having 16 or 32 bit chars. (The programmer can with
enough work, so the compiler should be able to do it too.)

On further consideration I think I would do memstruct slightly
differently than I originally suggested. Since it would hardly ever be
used except in conjunction with its corresponding struct rather than
duplicating all of the fields like this:

typedef memstruct {
/*fields*/
} mymemstruct;

typedef struct {
/* same fields as preceding */
} mystruct;

just define the regular struct and then define both the memstruct and
its relationship to the struct with a statement something like this:

typedef memstruct { mystruct; } mymemstruct;

(The syntax doesn't really matter, it just needs to tell the compiler to
associate memstruct "mymemstruct" with struct "mystruct".)

Regards,

David Mathog
 
Tim Rentsch

mathog said:
Dr said:
Forcing my ordering, but allowing for indeterminate padding
leaves us somewhere between two stools.

That's my view too. The standards seem to think that one set
of behavior for struct must fit all, and consequently it fits
some common applications not very well at all. The idea of the
"memstruct" I proposed [...snip]

For the most part discussions in comp.lang.c discuss C as it is.
If you want to propose a new language feature, you may find that
comp.std.c is a better venue for that.
 
glen herrmannsfeldt

(snip)
Structures are one of the areas where I think C has ended up in a rather
strange place. They don't (as discussed here) work for mapping memory
to abstract data types and they don't work quite as well as they could
do for structuring program data.
I'd love to see something where I could tell the compiler "I've got
zillions of these things in my program, please order them in the most
memory efficient way" or "this is master configuration data that will be
used all over the place but there's only one copy, so optimise for rapid
access".

Fortran theoretically can do this by default for structures.
There is the SEQUENCE attribute when you don't want it to reorder
structure members, though I don't know of any compilers that actually
do reorder them. Then there is the BIND(C) attribute when you want
it done exactly like C.
But in both cases I could lay out the structure when defined
in a way that made the code as self-documenting as possible.
Forcing my ordering, but allowing for indeterminate padding leaves us
somewhere between two stools.

Isn't that usual for any compromise?

-- glen
 
Keith Thompson

mathog said:
That's my view too. The standards seem to think that one set of
behavior for struct must fit all, and consequently it fits some common
applications not very well at all. The idea of the "memstruct" I
proposed was that rather than trying to make struct do everything,
create a related construct specifically for handling data in memory,
leaving the regular "struct" to be optimized in the other direction,
and the compiler knowing enough to copy data back and forth between a
memstruct and a struct, without the programmer having to know about the
alignment, platform hardware, and the other nitty gritty details that
are needed now.
[snip]

How would your proposal handle this on hardware with strict alignment
requirements?

#include <stdio.h>

void func(int *ptr) {
printf("*ptr = %d\n", *ptr);
}

int main(void) {
memstruct foo {
char c;
int i;
};
memstruct foo obj = {'x', 42};
func(&obj.i);
return 0;
}

The problem: the `i` member of `memstruct foo` may be misaligned, but
`func()` has no way of knowing that.
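
In today's C, the closest safe equivalent is to copy the member into a correctly aligned temporary and pass a pointer to that instead. A minimal sketch follows; the raw layout (one `char` immediately followed by an `int`), the constant `OFFSET_I` and the helper `call_func_on_unaligned` are assumptions made up for illustration.

```c
#include <stdio.h>
#include <string.h>

enum { OFFSET_I = 1 }; /* the int starts right after one char: likely misaligned */

void func(int *ptr)
{
    printf("*ptr = %d\n", *ptr);
}

/* Call func() on the int stored at raw + OFFSET_I without ever forming
 * a misaligned int pointer. Returns the value after the call. */
int call_func_on_unaligned(unsigned char *raw)
{
    int tmp;

    memcpy(&tmp, raw + OFFSET_I, sizeof tmp);  /* unaligned-safe read */
    func(&tmp);                                /* func sees an aligned int */
    memcpy(raw + OFFSET_I, &tmp, sizeof tmp);  /* write back any change */
    return tmp;
}
```

The cost is the one pointed out earlier in the thread: `func()` receives the address of the temporary, not of the member, so pointer comparisons and any pointer `func()` keeps around no longer refer to the original data.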
 
