Data alignment question, structures

Ian Collins · Jan 21, 2013

mathog said:
That's my view too. The standards seem to think that one set of
behavior for struct must fit all, and consequently it fits some common
applications not very well at all.

What else do you expect from a generic standard? Compilers are free to
add whatever extensions they like to support platform specific alignment
requirements. Embedded compilers invariably do.
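GCC and Clang, for example, accept `__attribute__((packed))` on a struct type, which removes the padding between members. This is a minimal sketch that assumes a GCC-compatible compiler. The names `natural_foo`, `wire_foo` and `wire_foo_get_i` are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

/* Ordinary layout: the compiler may pad after c so that i is aligned. */
struct natural_foo {
    char c;
    int32_t i;
};

/* GCC/Clang extension (not standard C): no padding, so i starts at offset 1. */
struct __attribute__((packed)) wire_foo {
    char c;
    int32_t i;
};

/* Accessing a packed member by name is safe: the compiler knows the member
   may be misaligned and emits an access the target can perform. */
int32_t wire_foo_get_i(const struct wire_foo *p) {
    return p->i;
}
```

On most current targets `struct natural_foo` is 8 bytes with `i` at offset 4. The standard guarantees only that `i` follows `c`.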

BartC · Jan 21, 2013

Keith Thompson said:
How would your proposal handle this on hardware with strict alignment
requirements?

#include <stdio.h>

void func(int *ptr) {
    printf("*ptr = %d\n", *ptr);
}

int main(void) {
    memstruct foo {
        char c;
        int i;
    };
    memstruct foo obj = {'x', 42};
    func(&obj.i);
    return 0;
}

The problem: the `i` member of `memstruct foo` may be misaligned, but
`func()` has no way of knowing that.
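With today's compilers you can let GCC's packed attribute stand in for the proposed `memstruct`. A common way to make Keith's example safe is then to read the member by name into an ordinary, aligned `int` and pass the address of that copy. This is a sketch. `packed_foo` and `call_with_copy` are hypothetical names:

```c
#include <stdio.h>

/* GCC/Clang packed struct: i sits at offset 1 and may be misaligned. */
struct __attribute__((packed)) packed_foo {
    char c;
    int i;
};

/* Keith's func(): dereferences ptr as an ordinary, aligned int *. */
void func(int *ptr) {
    printf("*ptr = %d\n", *ptr);
}

/* Copy the member into an aligned local and pass that instead of
   &obj->i (recent GCC and Clang warn about taking the address of a
   packed member). Returns the value passed, for checking. */
int call_with_copy(const struct packed_foo *obj) {
    int tmp = obj->i;   /* compiler emits a safe (possibly unaligned) load */
    func(&tmp);
    return tmp;
}
```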

It need not get that far. Pointers could have a compile-time attribute that
indicates if they might be misaligned. Then passing such a pointer to func()
would be an error (when it would cause the hardware to fail).

Dereferencing such pointers wouldn't be impossible either, but full
interaction with ordinary pointers would complicate the language. (I've seen
gcc do byte-at-a-time accesses via pointers it thought might be misaligned.)
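The byte-at-a-time access BartC mentions can also be written by hand in portable C. This sketch assumes the stored value is a 32-bit little-endian unsigned integer. That byte order is an assumption for illustration. `load_le32` is a hypothetical name:

```c
#include <stdint.h>

/* Assemble a 32-bit little-endian value one byte at a time. No pointer to
   a wider type is ever dereferenced, so p need not be aligned. */
uint32_t load_le32(const unsigned char *p) {
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

The value is assembled with shifts, so the result is the same on big-endian and little-endian hosts.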

Philip Lantz · Jan 22, 2013

This doesn't seem like a hard problem, given a definition of the
interface to the network hardware and the implementation-defined
capabilities of the C implementation. Without such definitions, we could
only guess.

You're assuming it's able to read and write network packets using a
protocol that uses 8-bit bytes.

I would be *very* surprised if said 16-bit processors aren't used in
platforms with Ethernet hardware that interoperate with platforms that
send network packets with an odd number of octets. (In other words, I
think it's a valid assumption.)

James Kuyper · Jan 22, 2013

Keith Thompson wrote: ....

I would be *very* surprised if said 16-bit processors aren't used in
platforms with Ethernet hardware that interoperate with platforms that
send network packets with an odd number of octets. (In other words, I
think it's a valid assumption.)

CHAR_BIT==16 would be extremely inconvenient in such environments. As I
understand it, that's why environments that violate such expectations
are the main places where you'll find implementations of C with
CHAR_BIT==16.
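Code that has to cope with CHAR_BIT > 8 often adopts a convention: each `unsigned char` holds exactly one octet, and any higher bits are masked off. This is a sketch under that assumed convention. `get_be16_octets` is a hypothetical name:

```c
/* Read a big-endian 16-bit field from a buffer in which each unsigned char
   holds one network octet (an assumed convention). The 0xFF masks only
   matter when CHAR_BIT > 8. */
unsigned long get_be16_octets(const unsigned char *p) {
    return ((unsigned long)(p[0] & 0xFFu) << 8)
         | (unsigned long)(p[1] & 0xFFu);
}
```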

mathog · Jan 22, 2013

Keith said:
How would your proposal handle this on hardware with strict alignment
requirements?

#include <stdio.h>

void func(int *ptr) {
    printf("*ptr = %d\n", *ptr);
}

int main(void) {
    memstruct foo {
        char c;
        int i;
    };
    memstruct foo obj = {'x', 42};
    func(&obj.i);
    return 0;
}

The problem: the `i` member of `memstruct foo` may be misaligned, but
`func()` has no way of knowing that.

memstruct != struct

This is evolving as I go along here. Originally I thought it best
to be able to dereference a memstruct member as "obj.i" in a read
context to mean "copy (probably misaligned) bytes for i into a properly
aligned location which can then be read normally". Since this
destination might be, and in this case probably would be, a register,
the "&obj.i" would throw a compiler warning. However, I'm now thinking
it might be better simply to never allow a direct dereferencing of a
memstruct or one of its members at all, except to the extent it defines
a location in memory that is the location of another memstruct pointer
(or, of course, when it maps data 1:1 to the corresponding normal
struct). In that case the example above would still throw a
compiler warning, since it would be trying to pass a (pointer to
memstruct of an int), not a (pointer to an int), and the two are not
compatible. The second method is a bit peculiar in that "obj.i" of a
memstruct is something that has an address but does not have a value per
se. That is exactly what is going on now on some platforms, though,
where a misaligned uint32_t does have an address, but it is not one that
may be dereferenced. The advantage of putting this in the compiler is
that the situation would be obvious on all platforms, whereas on an x86
platform today this condition would not usually even cause a hiccup.
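In current C, the "copy (probably misaligned) bytes into a properly aligned location" step mathog describes is normally written with `memcpy`. This sketch assumes the bytes are already in the host's own representation of `uint32_t`, so no byte-order conversion is done. `read_u32_at` is a hypothetical name:

```c
#include <stdint.h>
#include <string.h>

/* Copy sizeof(uint32_t) bytes from a possibly misaligned address into an
   aligned local. memcpy works on bytes, so alignment does not matter;
   optimizing compilers often turn this into a single load where the
   hardware allows it. */
uint32_t read_u32_at(const unsigned char *buf, size_t offset) {
    uint32_t v;
    memcpy(&v, buf + offset, sizeof v);
    return v;
}
```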

That is:

#include <inttypes.h>
#include <stdio.h>

struct accumulator {
    uint32_t N;
    int32_t sumN;
};
struct foo {
    uint8_t c;
    struct accumulator acc;
    uint32_t i;
};

memstruct foo memfoo; /* defines memfoo memstruct type */
memstruct accumulator memaccumulator;

void func3(char *string, void *ptr);

void func(uint32_t *ptr) {
    printf("*ptr = %" PRIu32 "\n", *ptr);
}

void func2(memstruct memaccumulator *acc) {
    struct accumulator anacc;
    /* copy data to the corresponding struct. This is
       one allowed form of dereferencing */
    anacc = *acc;
    printf("N:%" PRIu32 " sumN:%" PRId32 "\n", anacc.N, anacc.sumN);
    /* this is OK, other allowed form of "dereferencing",
       actually just used to specify an address. No type
       problem with memstruct * to void * */
    func3("Location of acc->N = ", &acc->N);
    /* this is NOT OK, memstruct * to uint32_t * */
    func(&acc->N);
}
void func3(char *string, void *ptr) {
    printf("%s%p\n", string, ptr);
}

int main(void) {
    struct foo obj;
    memstruct memfoo memobj; /* this form _uses_ memfoo type */
    /* fill memobj, equivalent to: memobj = {'x', {3,123}, 42}; */
    obj = memobj; /* copy data memstruct -> struct */
    func(&obj.i); /* no issue, this is a normal struct */
    /* pass address of acc within this memstruct as
       pointer to memstruct memaccumulator. Types match,
       so no issue. */
    func2(&memobj.acc);
    /* Next is not allowed, no dereferencing a memstruct
       field directly */
    printf("memobj i field:%" PRIu32 "\n", memobj.i);
    return 0;
}

I see now what some of you were getting at about byte order and floats.
Originally I expected the programmer to have routines to further clean
up obj after it is loaded from memobj, since that is what the current
code which triggered this thread does. But it would make sense for the
compiler to handle all of that sort of thing at the

obj = memobj;

line. That is, not just copy position and size in memory, but some
types of value-to-value conversion as well. That would need a couple of
pragmas, one for the byte order of the data in memory
(Big/Little/Other/Native), and one for the float format (IEEE/not IEEE/Native),
with the defaults being whatever is "Native" for each. In some cases end-user
code would still be needed, but only if "Native" was an odd machine.
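Done by hand, the conversion mathog would have the compiler perform at `obj = memobj;` might look like this. The sketch assumes the in-memory image of `struct foo` is packed and big-endian. That puts `c` at offset 0, `acc.N` at 1, `acc.sumN` at 5 and `i` at 9. `foo_from_image`, `get_be32` and `FOO_IMAGE_SIZE` are hypothetical:

```c
#include <stdint.h>

struct accumulator {
    uint32_t N;
    int32_t sumN;
};

struct foo {
    uint8_t c;
    struct accumulator acc;
    uint32_t i;
};

#define FOO_IMAGE_SIZE 13   /* 1 + 4 + 4 + 4 bytes, no padding (assumed) */

/* Read a big-endian 32-bit value byte by byte; no alignment needed. */
static uint32_t get_be32(const unsigned char *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Field-by-field copy from the packed big-endian image into a native
   struct, which keeps whatever padding and byte order the host uses. */
struct foo foo_from_image(const unsigned char img[FOO_IMAGE_SIZE]) {
    struct foo obj;
    uint32_t raw;

    obj.c = img[0];
    obj.acc.N = get_be32(img + 1);
    raw = get_be32(img + 5);
    /* Two's-complement decode without an implementation-defined
       unsigned-to-signed conversion. */
    obj.acc.sumN = (raw <= INT32_MAX) ? (int32_t)raw
                                      : (int32_t)(raw - 0x80000000u) + INT32_MIN;
    obj.i = get_be32(img + 9);
    return obj;
}
```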

Regards,

David Mathog

glen herrmannsfeldt · Jan 22, 2013

(snip)

memstruct != struct

This is evolving as I go along here. Originally I thought it best
to be able to dereference a memstruct member as "obj.i" in a read
context to mean "copy (probably misaligned) bytes for i into a properly
aligned location which can then be read normally". Since this
destination might be, and in this case probably would be, a register,
the "&obj.i" would throw a compiler warning. However, I'm now thinking
it might be better simply to never allow a direct dereferencing of a
memstruct or one of its members at all, except to the extent it defines
a location in memory that is the location of another memstruct pointer
(or, of course, when it maps data 1:1 to the corresponding normal
struct).

How about instead one could write a program that would read the
definition of a memstruct and write out C programs to convert
both directions between a memstruct (stored in an array
of unsigned char) and a struct. It could at the same time do
endian conversion, floating point format conversion, word size
conversion (for machines with different or unusual word size).

Given that you have N machines with possibly different storage
methods (padding, word size, endianness, etc.) there are two
possibilities. One is to write the N**2 routines that will convert
between all pairs of machines. The other is to write 2N routines
that will convert between one intermediate form, from and to each
of the N formats.
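Under the 2N scheme, each machine needs just one encoder and one decoder for a shared intermediate form. This sketch covers mathog's `struct accumulator`. It assumes the intermediate form is two 4-byte, big-endian, two's-complement fields, an illustrative choice rather than any standard. All function names are hypothetical:

```c
#include <stdint.h>

struct accumulator {
    uint32_t N;
    int32_t sumN;
};

#define ACC_WIRE_SIZE 8   /* two 4-byte big-endian fields (assumed format) */

static void put_be32(unsigned char *p, uint32_t v) {
    p[0] = (unsigned char)((v >> 24) & 0xFFu);
    p[1] = (unsigned char)((v >> 16) & 0xFFu);
    p[2] = (unsigned char)((v >> 8) & 0xFFu);
    p[3] = (unsigned char)(v & 0xFFu);
}

static uint32_t get_be32(const unsigned char *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* This machine's native struct -> intermediate form. */
void acc_encode(const struct accumulator *a, unsigned char out[ACC_WIRE_SIZE]) {
    put_be32(out, a->N);
    put_be32(out + 4, (uint32_t)a->sumN);  /* well defined: reduces modulo 2^32 */
}

/* Intermediate form -> this machine's native struct. */
void acc_decode(const unsigned char in[ACC_WIRE_SIZE], struct accumulator *a) {
    uint32_t raw = get_be32(in + 4);
    a->N = get_be32(in);
    a->sumN = (raw <= INT32_MAX) ? (int32_t)raw
                                 : (int32_t)(raw - 0x80000000u) + INT32_MIN;
}
```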

If you do that, there is probably an easier to read and write
form than the memstruct. All you have to do is describe them,
generate the C routines, compile and link them, and you are
done!

One implementation of the latter is Sun's XDR, which was designed along
with RPC but can be used separately for other kinds of
data transfer problems.
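XDR (specified in RFC 4506) sends every item big-endian in a multiple of four bytes. It has no 8- or 16-bit integer type, so even a one-byte field takes a full four-byte unit. This is a hand-rolled sketch of that layout for mathog's `struct foo`, written without Sun's `xdr_*` library. `put_xdr_word` and `foo_to_xdr` are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

struct accumulator { uint32_t N; int32_t sumN; };
struct foo { uint8_t c; struct accumulator acc; uint32_t i; };

#define FOO_XDR_SIZE 16   /* four 4-byte units */

/* Write one 32-bit unit, most significant byte first; return next position. */
static unsigned char *put_xdr_word(unsigned char *p, uint32_t v) {
    p[0] = (unsigned char)((v >> 24) & 0xFFu);
    p[1] = (unsigned char)((v >> 16) & 0xFFu);
    p[2] = (unsigned char)((v >> 8) & 0xFFu);
    p[3] = (unsigned char)(v & 0xFFu);
    return p + 4;
}

/* Encode struct foo field by field; c is widened to a full unit.
   Returns the number of bytes written. */
size_t foo_to_xdr(const struct foo *f, unsigned char out[FOO_XDR_SIZE]) {
    unsigned char *p = out;
    p = put_xdr_word(p, f->c);
    p = put_xdr_word(p, f->acc.N);
    p = put_xdr_word(p, (uint32_t)f->acc.sumN);  /* two's-complement bits */
    p = put_xdr_word(p, f->i);
    return (size_t)(p - out);
}
```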

-- glen

Keith Thompson · Jan 22, 2013

mathog said:
memstruct != struct

Perhaps the simplest way to describe it would be that members of a
memstruct act like bit fields. You can access them directly, but you
can't take their addresses.
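Keith's analogy can be seen with ordinary bit fields. A member can be read and assigned by name, but the C standard forbids applying unary `&` to it, so a conforming compiler must diagnose the attempt (GCC treats it as an error). This is a small illustration. `struct flags` and `flags_demo` are hypothetical:

```c
struct flags {
    unsigned int ready : 1;
    unsigned int count : 7;
};

/* Bit-field members are read and written by name; the compiler generates
   whatever shifting and masking the layout requires. */
unsigned int flags_demo(void) {
    struct flags f = {0, 0};
    f.ready = 1;
    f.count = 100;
    /* unsigned int *p = &f.count;   rejected: a bit field has no address */
    return f.ready + f.count;
}
```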

[snip]
