QUERY: field offset rules within structures

  • Thread starter Bradford Chamberlain

Bradford Chamberlain

I work a lot with multidimensional arrays of dynamic size in C,
implementing them using a single-dimensional C array and the
appropriate multipliers and offsets to index into it appropriately. I
tend to iterate over subsets of these arrays by walking a pointer and
an offset per dimension across their dimensions. I've found that this
tends to give better performance, portably, than doing explicit
indexing calculations and hoping that the compiler hoists invariant
expressions out of the loops.

For example, rather than doing something like:

{
for (i=ilo; i<ihi; i+=istr) {
for (j=jlo; j<jhi; j+=jstr) {
... a[(i-ailo)*imult + (j-ajlo)*jmult] ...
}
}
}

I would tend to do:

{
double* aptr = ...;
const int abumpj = ...;
const int abumpi = ...;

for (i=ilo; i<ihi; i+=istr) {
for (j=jlo; j<jhi; j+=jstr) {
... *aptr ...

aptr += abumpj;
}
aptr += abumpi;
}
}

My question is motivated by the following desire: If I have an array
of structures, I'd like to be able to walk around *a single field*
within the array in a similar manner. For example, given the
structure:

struct _mystruct {
short x;
double y;
char z;
};

I would like the ability to walk through the "y" fields of the array
using a pointer and offsets.

I've been able to achieve this using a char* pointer as follows:

{
char* xptr = ...; /* point to y field of the initial element */
const int xbumpj = ...; /* express in bytes rather than */
const int xbumpi = ...; /* sizeof(doubles) */

for (i=ilo; i<ihi; i+=istr) {
for (j=jlo; j<jhi; j+=jstr) {
... *((double*)xptr) ...

xptr += xbumpj;
}
xptr += xbumpi;
}
}

I'd really prefer to eliminate the reliance on char* pointers and
pointer casting, however. In particular, I'd like to make xptr a
double*.

My question is therefore the following: does C have any requirements
on structure layout that would ensure that walking from field to field
within an array of structures will be evenly divisible by sizeof()
that field's type?

If not, it seems like my hope is in vain because the additive offsets
will be in terms of sizeof(<field type>) rather than bytes (or I will
have to cast my pointer back to char* before adding a generic offset
to it).

If so, I'm off and running. However, it's hard for me to believe that
C would ensure such rules on padding, which is why I ask.

The only other approach I can think of is to walk a structure pointer
over my array's elements and then dereference the pointer to access
the field. I guess I've avoided this approach simply due to the fact
that it seems more expensive (due to the field offset calculation
involved).

Does anyone have any other suggestions for how I might write such code
without relying on char* pointers?

Thanks,
-Brad
 

Tak-Shing Chan

The only other approach I can think of is to walk a structure pointer
over my array's elements and then dereference the pointer to access
the field. I guess I've avoided this approach simply due to the fact
that it seems more expensive (due to the field offset calculation
involved).

This is the right approach. The char * approach produces
undefined behavior.

Tak-Shing
 

Eric Sosman

Bradford said:
[...]
My question is motivated by the following desire: If I have an array
of structures, I'd like to be able to walk around *a single field*
within the array in a similar manner. For example, given the
structure:

struct _mystruct {
short x;
double y;
char z;
};

I would like the ability to walk through the "y" fields of the array
using a pointer and offsets.

[...]

My question is therefore the following: does C have any requirements
on structure layout that would ensure that walking from field to field
within an array of structures will be evenly divisible by sizeof()
that field's type?

No. This would be equivalent to requiring that the
sizeof a struct be a multiple of the sizeof all the elements,
and there is no such requirement. An example will indicate
why such a requirement would be a poor idea:

struct {
char a[2], b[3], c[5], d[7], e[11];
char f[13], g[17], h[19], i[23], j[29];
} x;

If `sizeof x' had to be a multiple of all of `sizeof x.a'
through `sizeof x.j', it's easy to see that it would have
to be a multiple of 2*3*...*29 = 6469693230. Don't you
think it would be a wee bit wasteful to add more than six
gigabytes of padding to a payload of 2+3+...+29 = 129
bytes of actual data?

What you *can* rely on is that the sizeof a struct is
a multiple of the strictest alignment required of any of its
elements. For example, some C implementations use an eight-
byte `double' but only require it to be aligned to a four-
byte boundary. On such an implementation, the struct

struct { char c; double d; } y[2];

.... might well occupy twelve bytes. This suffices to align
the `d' elements in an array of such structs, but it's obvious
that the twelve-byte "step" from y[0].d to y[1].d is not a
multiple of eight.
The only other approach I can think of is to walk a structure pointer
over my array's elements and then dereference the pointer to access
the field. I guess I've avoided this approach simply due to the fact
that it seems more expensive (due to the field offset calculation
involved).

That's the clean way to do it. You'll have to measure the
performance consequences and decide whether they're important
or not in your situation. If the performance is unacceptable,
one alternative would be to "decompose" your structs into
parallel arrays. That is, instead of your example of

struct _mystruct {
short x;
double y;
char z;
};

.... presumably augmented with `struct _mystruct *p' and the
like, you'd instead have

short *my_x;
double *my_y;
char *my_z;

.... and access the three arrays independently. Depending on
what you're doing, this could either improve or disimprove
the running time.
 

Arthur J. O'Dwyer

My question is motivated by the following desire: If I have an array
of structures, I'd like to be able to walk around *a single field*
within the array in a similar manner. For example, given the
structure:

struct _mystruct {
short x;
double y;
char z;
};

I would like the ability to walk through the "y" fields of the array
using a pointer and offsets.
I'd really prefer to eliminate the reliance on char* pointers and
pointer casting, however. In particular, I'd like to make xptr a
double*.
Does anyone have any other suggestions for how I might write such code
without relying on char* pointers?

Obviously the other answers you've received are far better than this
"suggestion," so please don't use it in production code; but I see no
reason that this [untested code] wouldn't work.


extern void process(double);
extern struct _mystruct my_array[42];
extern int num_structs;

int i;
double *p = &(my_array->y);
for (i=0; i < num_structs; ++i)
{
process(*p);
p = (double *) ((unsigned char *)p + sizeof(struct _mystruct));
}


Remember: Don't Do This! :)
-Arthur
 

Bradford Chamberlain

Thanks to everyone for their answers. It was gratifying to get so
many responses so quickly. Having spent another night mulling this
over, I have some more thoughts/questions, and will phrase them in
terms of Arthur's code:


Arthur J. O'Dwyer said:
Obviously the other answers you've received are far better than this
"suggestion," so please don't use it in production code; but I see no
reason that this [untested code] wouldn't work.


extern void process(double);
extern struct _mystruct my_array[42];
extern int num_structs;

int i;
double *p = &(my_array->y);
for (i=0; i < num_structs; ++i)
{
process(*p);
p = (double *) ((unsigned char *)p + sizeof(struct _mystruct));
}


This is very similar to my original approach, except that you're
storing the pointer as a double* and then casting it as a char* to
move it (whereas I always stored as a char* and cast as a double* to
access it...). Unfortunately the cast to char* doesn't satisfy my
query since I was hoping to get rid of all references to char*...

That said, I'm in the same boat as you: I don't see why this type of
approach would not work. Tak-Shing Chan's response seems to assert
that such approaches aren't guaranteed to work (though in fact, in
practice it has on the dozens of platforms I've used, from desktop
machines to supercomputers). Could someone give a less elusive
explanation of why it isn't guaranteed to work? In particular, I'd be
curious to know why such an approach might fail due to the language
standard, as well as what would cause it to fail in practice with
actual compilers (since it never has that I've observed).


For those interested in continuing to pursue this conversation, I've
also remembered a second reason why I was uncomfortable with the
solution that walked a "pointer-to-struct" through memory,
dereferencing it to access a field at a time. The situation requires
that you understand that I have a "multidimensional array descriptor"
of sorts, a simplified version of which appears below:

typedef struct _array {
void* data; /* pointer to data buffer */
int numdims; /* number of conceptual dimensions */
int block[MAXDIM]; /* per-dim multipliers in bytes */
} array;

(the real descriptor has additional fields supporting non-0 origins,
strided arrays, etc.)

For example, to initialize an n x n array of doubles, I would do
something like the following:

array mydblarray;

mydblarray.data = malloc(n*n*sizeof(double));
mydblarray.numdims = 2;
mydblarray.block[0] = sizeof(double);
mydblarray.block[1] = n * mydblarray.block[0];

Then, I can create functions that operate on these arrays. For
example, the following code zeros out the first k x k elements in a 2D
array of doubles (assuming the array size is >= k in each dim).

void zero_kxk_dbl_arr(array* dblarr, int k) {
int i, j;
char* wlk = dblarr->data;
const int bump_j = dblarr->block[0];
const int bump_i = dblarr->block[1] - (k*(dblarr->block[0]));

for (i=0; i<k; i++) {
for (j=0; j<k; j++) {
*(double*)wlk = 0.0;

wlk += bump_j;
}
wlk += bump_i;
}
}

Now, the nice thing about this approach is that I can reuse this code
for a double field in an array of structs by fiddling with the array
descriptor slightly. For example:

/* a struct with a double field */
typedef struct _mystruct {
short x;
double y;
} mystruct;

/* initialize an array of structs */
array mystructarray;
mystructarray.data = malloc(n*n*sizeof(mystruct));
mystructarray.numdims = 2;
mystructarray.block[0] = sizeof(mystruct);
mystructarray.block[1] = n * mystructarray.block[0];

/* spoof array using y field from mystructarray */
array tmparray = mystructarray;
tmparray.data += CALC_OFFSET(y);

/* pass spoofed descriptor in as loosely strided double array */
zero_kxk_dbl_arr(&tmparray, 5);

This has been a nice property for code reuse, in my experience
(especially given the fact that I'm not writing all of this code by
hand, but am generating it automatically from a higher-level array
language). Yet it relies on the char* approach since (as we learned
yesterday) we can't make assumptions about the distance between fields
in two adjacent structs. Furthermore, in this case, I can't use a
struct walker since the context in which the struct's field is being
utilized is unaware of the struct's existence.

So I guess my questions here are two: (1) Does anyone have other
suggestions for how to approach this without char*s, yet without
giving away the code reuse flexibility that they result in? and (2)
I'm sure someone will tell me that what I'm doing is illegal in C, and
I'm willing to believe that but I don't understand it. Why is it?
(probably same reason as Arthur's code is). Again, I'd be curious to
hear both why it's illegal according to the standard even though it
generally seems to work in practice, as well as how it would fail in
practice on real compilers (since I haven't observed that anywhere).

Thanks again for all the answers, I really do appreciate the information
I've been getting,
-Brad
 

Arthur J. O'Dwyer

Arthur J. O'Dwyer said:
Obviously the other answers you've received are far better than this
"suggestion," so please don't use it in production code; but I see no
reason that this [untested code] wouldn't work.

Clarification: The below code is, to the best of my knowledge,
*completely* legal C and C++. There is nothing wrong with it
from the point of view of the compiler. I just meant that (1)
it's fugly; and (2) I hadn't actually compiled it myself yet.
Don't do this because it's a PITA to maintain, not because it's
not portable or anything (IOW, it *is* portable).
This is very similar to my original approach, except that you're
storing the pointer as a double* and then casting it as a char* to

Nitpick: an **unsigned** char *, which is guaranteed to be able to
represent the value. I don't know about plain char, and I don't ever
expect to need to know, because I use 'unsigned char' for hacks like
this exclusively.
move it (whereas I always stored as a char* and cast as a double* to
access it...). Unfortunately the cast to char* doesn't satisfy my
query since I was hoping to get rid of all references to char*...
Why?

That said, I'm in the same boat as you: I don't see why this type of
approach would not work. Tak-Shing Chan's response seems to assert
that such approaches aren't guaranteed to work

I don't know what Tak-Shing Chan was referring to with his very vague
comment about leading to UB. Maybe he was referring to the fact that
you can't reliably cast a 'struct _mystruct *' to a 'double *', or vice
versa, whether you pass them through 'char *' first or not.
(though in fact, in
practice it has on the dozens of platforms I've used, from desktop
machines to supercomputers). Could someone give a less elusive
explanation of why it isn't guaranteed to work? In particular, I'd be
curious to know why such an approach might fail due to the language
standard, as well as what would cause it to fail in practice with
actual compilers (since it never has that I've observed).

What *will* fail is what you originally asked about: whether the internals
of structures were always aligned so that the 'double' fields of adjacent
structures were always a multiple of 'sizeof(double)' bytes apart. This
is not true. Consider the made-up example:

struct foo
{
double d; /* suppose 10-byte doubles on this machine */
/* and 2 bytes of structure padding */
int i; /* and 4-byte ints */
} a[2];

Now a[0].d and a[1].d are both doubles, and (a+0)->d and (a+1)->d are
both pointers to double, but the expression

&(a+1)->d - &(a+0)->d

yields undefined behavior. As should be obvious, since the two objects
are 16 bytes apart, which isn't a multiple of [sizeof(double)=] 10.

So trying to simplistically "step through" the array is doomed to
failure. You need to account for the fact that each 'double' is
part of a larger 'struct _mystruct'.
For those interested in continuing to pursue this conversation, I've
also remembered a second reason why I was uncomfortable with the
solution that walked a "pointer-to-struct" through memory,
dereferencing it to access a field at a time. The situation requires
that you understand that I have a "multidimensional array descriptor"
of sorts, a simplified version of which appears below:

This code is incredibly yucky. Or so it looks to me right now.
I'm not going to try to comprehend it, except to note that if you
want bounds-checked array types (which is the only use I see for this
sort of thing), C++ already has them. Besides, if you're going for
an ADT sort of thing, then you should really supply some accessor
functions to do your dereferencing and indexing, and then write your
main code using those accessors. YMMV.
For example, to initialize an n x n array of doubles, I would do
something like the following:

array mydblarray;

mydblarray.data = malloc(n*n*sizeof(double));
mydblarray.numdims = 2;
mydblarray.block[0] = sizeof(double);
mydblarray.block[1] = n * mydblarray.block[0];

See how yucky that is? To initialize an instance of *my* imaginary
bounds-checked array type, I'd write

array *mydblarray;
mydblarray = new_adtarray(sizeof(double), 2,n,n);
if (!mydblarray) exit(EXIT_FAILURE);

and use callbacks like those passed to qsort() in order to do
complicated things with whole matrices. N.B. that you omitted
to check the return value of malloc() in the snippet above.
What with duplicating that yucky code every time you need to
create a matrix, you're going to run into silly mistakes like
that a *lot*.
This has been a nice property for code reuse, in my experience
(especially given the fact that I'm not writing all of this code by
hand, but am generating it automatically from a higher-level array
language). Yet it relies on the char* approach since (as we learned
yesterday) we can't make assumptions about the distance between fields
in two adjacent structs. Furthermore, in this case, I can't use a
struct walker since the context in which the struct's field is being
utilized is unaware of the struct's existence.

Pass another argument to the struct-walking function giving the sizeof
the struct in bytes. Then use the (unsigned char *) method I outlined
upthread. Or, probably better, re-write the core array library so it is
more user-friendly, even at the expense of a few milliseconds per run.
So I guess my questions here are two: (1) Does anyone have other
suggestions for how to approach this without char*s, yet without
giving away the code reuse flexibility that they result in?

Template programming. But that's a different language. C uses
pointers to void and pointers to unsigned char for genericity; it's
just the way it works (TM).
and (2)
I'm sure someone will tell me that what I'm doing is illegal in C, and
I'm willing to believe that but I don't understand it. Why is it?
(probably same reason as Arthur's code is).

Mine isn't. I didn't look closely at yours, but it probably isn't
either.
Thanks again for all the answers, I really do appreciate the information
I've been getting,
-Brad

You're welcome.
-Arthur
 

Eric Sosman

Bradford said:
[ lots more than has survived my brutal snippage ]

The situation requires
that you understand that I have a "multidimensional array descriptor"
of sorts, a simplified version of which appears below:

typedef struct _array {
void* data; /* pointer to data buffer */
int numdims; /* number of conceptual dimensions */
int block[MAXDIM]; /* per-dim multipliers in bytes */
} array;

[...]

Now, the nice thing about this approach is that I can reuse this code
for a double field in an array of structs by fiddling with the array
descriptor slightly. For example:

/* a struct with a double field */
typedef struct _mystruct {
short x;
double y;
} mystruct;

/* initialize an array of structs */
array mystructarray;
mystructarray.data = malloc(n*n*sizeof(mystruct));
mystructarray.numdims = 2;
mystructarray.block[0] = sizeof(mystruct);
mystructarray.block[1] = n * mystructarray.block[0];

/* spoof array using y field from mystructarray */
array tmparray = mystructarray;
tmparray.data += CALC_OFFSET(y);

Bzzt! The `data' element is a `void*', and pointer
arithmetic isn't defined on `void*'. The gcc compiler (and
others, perhaps) treats `void*' as synonymous with `char*'
for this purpose, but that's a non-conforming extension to
the language. Some conforming alternatives:

- tmparray.data = (char*)tmparray.data + CALC_OFFSET(y);
- Declare `data' as `char*' instead of `void*'
- Declare `data' as `double*' instead of `void*', and
make adjustments to the pointer arithmetic as in
Arthur O'Dwyer's post.

[...]
So I guess my questions here are two: (1) Does anyone have other
suggestions for how to approach this without char*s, yet without
giving away the code reuse flexibility that they result in? and (2)
I'm sure someone will tell me that what I'm doing is illegal in C, and
I'm willing to believe that but I don't understand it. Why is it?
(probably same reason as Arthur's code is). Again, I'd be curious to
hear both why it's illegal according to the standard even though it
generally seems to work in practice, as well as how it would fail in
practice on real compilers (since I haven't observed that anywhere).

For (1), I don't understand why you're so anxious about using
`char*'. It's quite a natural thing to do when you find yourself
laying things out in storage -- a C object's representation is
defined as an array of characters, so why not use a character
pointer? My usual advice is to forget about representations and
concentrate on values, but this is one of those places where the
representation comes to the fore -- and `char*' is a perfectly
appropriate handle for a representation.

As for (2), I see nothing illegal with what you're doing.
The eventual conversion to `double*' (snipped; see up-thread)
will succeed, because you're correctly computing the "stride"
between successive elements and they're all properly aligned
as `double' values should be. You could bullet-proof things
a bit by making `block' an array of `size_t' instead of an
array of `int', but even that won't be a problem if your arrays
are of moderate size. Perhaps Tak-Shing spotted some other
problem that's eluded me (it wouldn't be the first time ...).
 

lawrence.jones

Bradford Chamberlain said:
That said, I'm in the same boat as you: I don't see why this type of
approach would not work. Tak-Shing Chan's response seems to assert
that such approaches aren't guaranteed to work (though in fact, in
practice it has on the dozens of platforms I've used, from desktop
machines to supercomputers). Could someone give a less elusive
explanation of why it isn't guaranteed to work? In particular, I'd be
curious to know why such an approach might fail due to the language
standard, as well as what would cause it to fail in practice with
actual compilers (since it never has that I've observed).

Consider an implementation where doubles are 8 bytes long, but only need
to be aligned on 4 byte boundaries. Now consider an array of structs
like:

struct {
int n;
double d;
} a[10];

Now the size of a[0] is 12, as is (char *)&a[1].d - (char *)&a[0].d,
which is 1.5 doubles, making it impossible to move a double pointer from
one to the next without an intermediate cast to some other type. Such
machines undoubtedly exist, although I can't name one off the top of my
head. Machines where doubles can be on any byte boundary are legion,
including the x86 and the Vax.

-Larry Jones

Oh, what the heck. I'll do it. -- Calvin
 

Bradford Chamberlain

Eric Sosman said:
Bzzt! The `data' element is a `void*', and pointer
arithmetic isn't defined on `void*'. The gcc compiler (and
others, perhaps) treats `void*' as synonymous with `char*'
for this purpose, but that's a non-conforming extension to
the language.

Good point -- this was me typing code off the top of my head and not
compiling it to check it. But you understood what I meant, so that's
the most important thing.

For (1), I don't understand why you're so anxious about using
`char*'.

I was hoping to avoid complicating my already-too-long post with any
justification. :) But since you asked...

The reason is simply that I'm using a parallelizing compiler that
doesn't do so well in the presence of char*'s or char* casts. Its
alignment rules are designed such that the answer to the original
question (will the fields of successive records in an array be spaced
by a multiple of the field type?) happens to be "yes." I was worried
that relying on this rule would result in a lack of portability to
more conventional machines/compilers, which resulted in my original
post (and learned that my worries were justified).

My sense at this point is that I'm going to have to generate two
versions of the code: one using the char* technique which a growing
number of people agree is legal (pending explanation from Tak-Shing),
and a second which takes advantage of this particular compiler's
alignment rules to get around its char* allergy.

Thanks for the opinions and backup, Arthur and Eric.

As far as the "wow, your data structures are gross" and "boy, this is
tricky stuff to get right" issues go, the saving graces and motivation
are: (1) All this code I'm referring to is compiler-generated, so it's
not designed to be utilized by a user as an ADT; (2) What I put in my
postings was just a stripped-down version to get to my questions as
quickly as possible -- the real deal is much more complicated, for
better or worse; (3) performance is king because this code generation
is for a scientific computing language, so I really am trying to
squeeze portable performance out at the risk of making the code hard
to read or write for a human.

Thanks again for all the responses,
-Brad
 

lawrence.jones

Bradford Chamberlain said:
The reason is simply that I'm using a parallelizing compiler that
doesn't do so well in the presence of char*'s or char* casts. Its
alignment rules are designed such that the answer to the original
question (will the fields of successive records in an array be spaced
by a multiple of the field type?) happens to be "yes."

I sincerely doubt that that's true. It may well be true for the
particular types you're interested in, but not for all types. Consider:

struct {
int a;
struct { char b[200]; } c;
} d[10];

I'm pretty sure that c won't be aligned on a 200 byte boundary (which
would require each element of d to be 400 bytes long).

-Larry Jones

I sure wish I could get my hands on some REAL dynamite. -- Calvin
 

Bradford Chamberlain

alignment rules are designed such that the answer to the original
question (will the fields of successive records in an array be spaced
by a multiple of the field type?) happens to be "yes."

I sincerely doubt that that's true. It may well be true for the
particular types you're interested in, but not for all types. Consider:

struct {
int a;
struct { char b[200]; } c;
} d[10];

I'm pretty sure that c won't be aligned on a 200 byte boundary (which
would require each element of d to be 400 bytes long).

You're right, I had oversimplified the rule in my mind to deal with
the scalar field types I was most interested in (in particular, I
don't tend to use char fields much). Your point is a good one: if I
typedef char ch200[200] and replace the c declaration above with
ch200, then I can't walk a ch200 pointer from one c field to the next
without casting it to a pointer to a smaller type. Thanks for
pointing this out, it's a subtle point that I probably would've
glossed over otherwise.

-Brad
 

Barry Schwarz

Bradford Chamberlain said:
[...]
My question is therefore the following: does C have any requirements
on structure layout that would ensure that walking from field to field
within an array of structures will be evenly divisible by sizeof()
that field's type?
[...]
Does anyone have any other suggestions for how I might write such code
without relying on char* pointers?
If you have a struct STRUCT that contains a member (either scalar or
aggregate) of type TYPE and if the alignment for TYPE is the same as
its size (e.g., a double is of size 8 and must be aligned on an 8-byte
boundary), then sizeof(struct STRUCT) is guaranteed to be a multiple
of sizeof(TYPE). Consider

struct STRUCT{
...
TYPE member; /* TYPE could be double to match your example */
...
};
struct STRUCT mystruct[2];
intptr_t is0, is1, im0, im1;
is0 = (intptr_t)&mystruct[0];
is1 = (intptr_t)&mystruct[1];
im0 = (intptr_t)&mystruct[0].member;
im1 = (intptr_t)&mystruct[1].member;

is1-is0 is the number of characters between the start of
successive elements of the array which is, by definition,
sizeof(struct STRUCT).
im0-is0 is the number of characters from the start of the struct
to the start of member which is, by definition, offsetof(...).
im1-is1 is the same.
im0 == is0 + offsetof(...).
im1 == is1 + offsetof(...).
im1-im0 == is1 + offsetof(...) - is0 - offsetof(...)
== is1-is0
== sizeof(struct STRUCT)
im0 is a multiple of sizeof(TYPE) and can be represented as
k0*sizeof(TYPE) for some integer k0.
Similarly, im1 can be represented as k1*sizeof(TYPE).
im1-im0 == (k1-k0)*sizeof(TYPE) == sizeof(struct STRUCT).
(k1-k0) == sizeof(struct STRUCT)/sizeof(TYPE) which is the ratio
of two constants and therefore constant.

Consequently, if you want to step through a particular member for each
element of an array of struct, you can compute the quotient once as
part of program initialization and use it as the increment for your
pointer arithmetic, something like:

TYPE *ptr;
int dim, ratio;
...
ratio = sizeof(struct STRUCT) / sizeof(TYPE);
...
for (dim=0, ptr=&mystruct[0].member;
dim < max_dimension_of_array;
dim++, ptr+=ratio) {
...
}

This will not work if TYPE's alignment is different from its size.
For example, on an implementation where a double has size 8 but
alignment 4, struct x {int i1; double d1;} could easily have size 12,
which is not a multiple of 8.

Unfortunately, I know of no portable way to test a type's alignment
requirement.


 
