size_t in a struct


bwaichu

To avoid padding in structures, where is the best place to put size_t
variables?

According to FAQ question 2.12 (http://c-faq.com/struct/padding.html):

"If you're worried about wasted space, you can minimize the effects of
padding by ordering the members of a structure based on their base
types, from largest to smallest."

So if I have the following:

typedef struct _Buffer_t {
    char *buffer;
    size_t size;
} Buffer_t;

I should have minimized padding since size_t is an unsigned long.
However, will size_t ever become an unsigned long long? If it does,
then size_t still wouldn't be larger than a pointer on the system,
right? But where should I put size_t in relation to other integer
declarations? Should I always assume size_t is an unsigned long when
building structures?

Now, if I made another structure:

typedef struct _Buffer2_t {
    Buffer_t name;
    char *buffer;
    size_t size;
} Buffer2_t;

Is the above still the correct sequence to minimize padding?

Thanks.
 

Simon Biber

> To avoid padding in structures, where is the best place to put size_t
> variables?
>
> According to FAQ question 2.12 (http://c-faq.com/struct/padding.html):
>
> "If you're worried about wasted space, you can minimize the effects of
> padding by ordering the members of a structure based on their base
> types, from largest to smallest."

In general this is good advice. But different types have different sizes
on different compilers. If minimising the space is important, you would
have to customise the layout for each compiler you port the code to.
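
To make the FAQ's advice concrete, here is a minimal sketch that declares the same three members in two different orders and measures the padding in each. The struct and function names are hypothetical. Nothing here is guaranteed by the standard: the code only reports what a particular compiler does, which is exactly why the layout may need revisiting for each compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: the same three members in two orders. */
struct mixed_order {
    char   tag;    /* 1 byte; padding usually follows so 'value' is aligned */
    double value;
    char   flag;   /* 1 byte; trailing padding usually rounds the size up */
};

struct largest_first {
    double value;  /* largest member first, as the FAQ suggests */
    char   tag;
    char   flag;   /* the small members share one padded tail */
};

/* Padding bytes: total size minus the sum of the member sizes. */
size_t mixed_order_padding(void)
{
    size_t used = sizeof(double) + 2 * sizeof(char);
    assert(sizeof(struct mixed_order) >= used);
    return sizeof(struct mixed_order) - used;
}

size_t largest_first_padding(void)
{
    size_t used = sizeof(double) + 2 * sizeof(char);
    assert(sizeof(struct largest_first) >= used);
    return sizeof(struct largest_first) - used;
}
```

With an 8-byte double aligned on 8 bytes, as on typical 64-bit systems, struct mixed_order is usually 24 bytes and struct largest_first is 16. The reordering therefore saves 8 bytes per object.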

> So if I have the following:
>
> typedef struct _Buffer_t {
>     char *buffer;
>     size_t size;
> } Buffer_t;
>
> I should have minimized padding since size_t is an unsigned long.

Not necessarily! size_t may be equivalent to unsigned short, unsigned
int, unsigned long, unsigned long long or some other
implementation-defined integer type. It may even be longer than unsigned
long long, but no longer than uintmax_t.

In addition, the pointer may be of various different sizes. Testing
three different compilers I happen to have here, I can have 16-bit,
32-bit and 64-bit pointers to char.

In most cases you'll find that char* and size_t have the same size on
any given system. One exception that comes to mind is Turbo C's 'huge'
memory model, where char* is 4 bytes but size_t is 2.

> However, will size_t ever become an unsigned long long? If it does,
> then size_t still wouldn't be larger than a pointer on the system,
> right? But where should I put size_t in relation to other integer
> declarations? Should I always assume size_t is an unsigned long when
> building structures?

If it matters that much to you, you could test the size on the
particular system and generate the required code.
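
Here is a minimal sketch of that approach. It assumes that a small helper program is compiled and run on the target as part of the build, and that its output is saved as a header. The hypothetical function emit_buffer_struct writes a definition of struct buffer with whichever of char * and size_t is larger placed first. It needs C99 for snprintf.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical build-time helper: writes a definition of 'struct buffer'
   into 'out', putting whichever of char * and size_t is larger first.
   On a tie the pointer stays first. Returns snprintf's result, i.e. the
   length the full text needs. */
int emit_buffer_struct(char *out, size_t n)
{
    const char *ptr_line  = "    char *buffer;\n";
    const char *size_line = "    size_t size;\n";
    int size_first = sizeof(size_t) > sizeof(char *);

    assert(out != NULL || n == 0);
    return snprintf(out, n, "struct buffer {\n%s%s};\n",
                    size_first ? size_line : ptr_line,
                    size_first ? ptr_line : size_line);
}
```

A generator could call this with a local array and write the text to stdout (with fputs, for example), and the build would then redirect that output into the header. With only two members the ordering hardly matters; the technique pays off when several members of uncertain size are involved.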

> Now, if I made another structure:
>
> typedef struct _Buffer2_t {
>     Buffer_t name;
>     char *buffer;
>     size_t size;
> } Buffer2_t;
>
> Is the above still the correct sequence to minimize padding?

A single Buffer2_t struct will probably have the same amount of padding
as two Buffer_t structs, since it is basically just two of the same
thing, one after the other. Putting the 'name' member at the end
probably wouldn't make any difference to the overall size.
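
Simon's estimate is easy to check on any given compiler. This minimal sketch mirrors the two structures from the question, renamed to struct buffer and struct buffer2. These are hypothetical names chosen to avoid the reserved leading-underscore-plus-capital form discussed later in the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors the poster's Buffer_t, renamed to avoid a reserved identifier. */
struct buffer {
    char  *buffer;
    size_t size;
};

/* Mirrors Buffer2_t: an embedded struct buffer plus the same two members. */
struct buffer2 {
    struct buffer name;
    char         *buffer;
    size_t        size;
};

/* Nonzero if struct buffer2 is exactly twice the size of struct buffer,
   i.e. nesting added no padding beyond what each half already had. */
int buffer2_is_two_buffers(void)
{
    return sizeof(struct buffer2) == 2 * sizeof(struct buffer);
}
```

On common 32- and 64-bit ABIs, where char * and size_t have the same size and alignment, the function returns nonzero: the nested struct adds no padding of its own.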
 

Gordon Burditt

> According to FAQ question 2.12 (http://c-faq.com/struct/padding.html):
>
> "If you're worried about wasted space, you can minimize the effects of
> padding by ordering the members of a structure based on their base
> types, from largest to smallest."

I don't agree with this, and I don't think it will work when
pointers are involved. For example, a short is likely to be
smaller than a pointer to anything, even char. "Pointer" isn't a
base type, but things work better here if you assume it is.

> So if I have the following:
>
> typedef struct _Buffer_t {
>     char *buffer;
>     size_t size;
> } Buffer_t;
>
> I should have minimized padding since size_t is an unsigned long.
> However, will size_t ever become an unsigned long long? If it does,

Probably. It wouldn't surprise me on an architecture with 64-bit
pointers where, in an attempt to keep types like long the same as on
the older legacy system, long is kept at 32 bits.

> then size_t still wouldn't be larger than a pointer on the system,
> right?

No guarantees, but it's a pretty safe bet. And the penalties for
losing the bet are minimal: structure padding wastes a bit of
memory but causes no undefined behavior.

> But where should I put size_t in relation to other integer
> declarations? Should I always assume size_t is an unsigned long when
> building structures?

You may assume that size_t is slightly larger or slightly smaller
than an unsigned long for the purpose of the ordering. Alternating
size_t and unsigned long would tend to maximize padding if they
aren't the same size.

Assume size_t is slightly larger if you think your code will more
likely run on newer systems where size_t and unsigned long long
might be 64 bit. Assume size_t is slightly smaller than unsigned
long if your code will more likely run on older systems where size_t
might be 16 bits.

There are other places where you have similar problems, such as the
relative size of double and pointer or double and unsigned long
long.
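
Because the right guess depends on the platform, it helps to see the actual sizes involved. This minimal diagnostic prints the sizes of the types Gordon mentions. print_type_sizes is a hypothetical name, and the code requires C99 for unsigned long long. The sizes are cast to unsigned long so that the format works without C99's %zu.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical diagnostic: prints the sizes relevant to member ordering
   and returns how many lines it wrote. */
int print_type_sizes(FILE *out)
{
    struct { const char *name; size_t size; } rows[] = {
        { "short",              sizeof(short) },
        { "int",                sizeof(int) },
        { "unsigned long",      sizeof(unsigned long) },
        { "unsigned long long", sizeof(unsigned long long) },
        { "size_t",             sizeof(size_t) },
        { "char *",             sizeof(char *) },
        { "void *",             sizeof(void *) },
        { "double",             sizeof(double) },
    };
    size_t i, count = sizeof rows / sizeof rows[0];

    assert(out != NULL);
    for (i = 0; i < count; i++)
        fprintf(out, "%-20s %lu\n", rows[i].name, (unsigned long)rows[i].size);
    return (int)count;
}
```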

Either way, it is virtually certain you WILL be wrong some of the
time. You can still try for being right a pretty good percentage
of the time.

> Now, if I made another structure:
>
> typedef struct _Buffer2_t {
>     Buffer_t name;
>     char *buffer;
>     size_t size;
> } Buffer2_t;
>
> Is the above still the correct sequence to minimize padding?

There is no single correct sequence. That one is a reasonable
guess, though.
 

Chris Torek

> To avoid padding in structures, where is the best place to put size_t
> variables?

This is not really predictable, because sizeof(size_t) is not
predictable. In general, though, size_t will be one of unsigned
int, unsigned long, or unsigned long long (it must always be
some unsigned type).
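
On a particular implementation you can at least find which of those types has the same range as size_t. Compare SIZE_MAX (C99, <stdint.h>) with the limits in <limits.h>. This minimal sketch only compares ranges. On many 64-bit Unix-like systems, unsigned long and unsigned long long have the same range but are still distinct types, so the sketch reports the first match. The function name is hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper: names the first of the types Chris lists whose
   range equals size_t's. Equal range does not mean identical type. */
const char *size_t_range_matches(void)
{
#if SIZE_MAX == UINT_MAX
    return "unsigned int";
#elif SIZE_MAX == ULONG_MAX
    return "unsigned long";
#elif SIZE_MAX == ULLONG_MAX
    return "unsigned long long";
#else
    return "some other unsigned type";
#endif
}
```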

> ... if I have the following:
>
> typedef struct _Buffer_t {
>     char *buffer;
>     size_t size;
> } Buffer_t;

In general, you (as a programmer) should avoid names starting
with underscore. (If you are an implementor -- the guy writing
the compiler -- you should use *only* names starting with
underscore, except when you know it is safe or required to
do otherwise.)

In particular, names that start with underscore followed by an
uppercase letter are always reserved to the implementor. *He* gets
to use names like _Buffer_t, so you must avoid them; he does *not*
get to use names like "Buffer", so you can use those. (Names
starting with one underscore and then a lowercase letter are only
"sometimes" reserved to the implementor. It is a lot easier,
though, to just avoid them all.)

Note that you can use the same tag for the structure and the
typedef (if you insist on using typedefs at all). See also
<http://web.torek.net/torek/c/types2.html>.
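
Applied to the original example, Chris's advice might look like the following minimal sketch: no leading underscore, and one identifier used for both the tag and the typedef. The name Buffer and the helper buffer_clear are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* No leading underscore, and the tag and typedef share one name:
   tags and ordinary identifiers live in different name spaces. */
typedef struct Buffer {
    char  *buffer;
    size_t size;
} Buffer;

/* Hypothetical helper: 'struct Buffer *' and 'Buffer *' are the same type. */
void buffer_clear(struct Buffer *b)
{
    assert(b != NULL);
    b->buffer = NULL;
    b->size = 0;
}
```

Both struct Buffer and Buffer now name the same type, so either spelling can be used in declarations and prototypes.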

> I should have minimized padding since size_t is an unsigned long.
> However, will size_t ever become an unsigned long long?

It might.

> If it does, then size_t still wouldn't be larger than a pointer
> on the system, right?

It might, although this makes less sense in general. Note also
that some pointer types might be larger or smaller than other
pointer types: on old PR1ME machines, for instance, sizeof(char *)
was 6 but sizeof(int *) was 4.
 

bwaichu

Simon said:
> Not necessarily! size_t may be equivalent to unsigned short, unsigned
> int, unsigned long, unsigned long long or some other
> implementation-defined integer type. It may even be longer than unsigned
> long long, but no longer than uintmax_t.

So standard-wise, how do I handle size_t in structures to minimize
padding? Should size_t always follow pointers in structures?
And if I have integers in the structure, where should I put size_t?

<OT>
According to my style(9) man page, the suggestion is to:

"When declaring variables in structures, declare them sorted by use,
then by size (largest to smallest), then by alphabetical order."

This would lead to padding, right? But I would have to assume a size
for size_t to follow that style.
</OT>

Thanks.
 

bwaichu

Chris said:
> In general, you (as a programmer) should avoid names starting
> with underscore. (If you are an implementor -- the guy writing
> the compiler -- you should use *only* names starting with
> underscore, except when you know it is safe or required to
> do otherwise.)
>
> In particular, names that start with underscore followed by an
> uppercase letter are always reserved to the implementor. *He* gets
> to use names like _Buffer_t, so you must avoid them; he does *not*
> get to use names like "Buffer", so you can use those. (Names
> starting with one underscore and then a lowercase letter are only
> "sometimes" reserved to the implementor. It is a lot easier,
> though, to just avoid them all.)
>
> Note that you can use the same tag for the structure and the
> typedef (if you insist on using typedefs at all). See also
> <http://web.torek.net/torek/c/types2.html>.

So if I am writing a set of buffer functions to be used for an
application, and I want to create my own type to be used throughout
the code, I should avoid using underscores.

Whenever I use typedefs, which is rare, I like to name the structure
the same name as the typedef. So would this be better form:

typedef struct buf {
    char *buffer;
    size_t size;
} Buf;

Thanks!
 

Keith Thompson

> So standard-wise, how do I handle size_t in structures to minimize
> padding? Should size_t always follow pointers in structures?
> And if I have integers in the structure, where should I put size_t?
[...]

The standard doesn't guarantee anything in this area. It doesn't
define the relative sizes of char* and size_t, and it doesn't say much
about the relationship between size and alignment.

My advice: Declare the members in an order that makes sense, and let
the compiler worry about alignment issues.
 

Snis Pilbor

> <OT>
> According to my style(9) man page, the suggestion is to:
>
> "When declaring variables in structures, declare them sorted by use,
> then by size (largest to smallest), then by alphabetical order."
>
> This would lead to padding, right? But I would have to assume a size
> for size_t to follow that style.
> </OT>
>
> Thanks.

Hmm, this is all very surprising to me: why on earth should the
structure layout depend on the order of the elements in the source? Is
there any reason to do this? Why don't compilers automatically
minimize the size of the structure? That seems like a no-brainer to me.
I can maybe see how it *might* be useful for the compiler to have the
freedom to sacrifice some size to put the most-used element at the
beginning of the actual structure layout, just because then whenever
that element is accessed there's one less addition to perform. I don't
understand how it could be beneficial to permute elements any further
beyond that, though.
 

Keith Thompson

Snis Pilbor said:
> Hmm, this is all very surprising to me: why on earth should the
> structure layout depend on the order of the elements in the source? Is
> there any reason to do this? Why don't compilers automatically
> minimize the size of the structure? That seems like a no-brainer to me.
> I can maybe see how it *might* be useful for the compiler to have the
> freedom to sacrifice some size to put the most-used element at the
> beginning of the actual structure layout, just because then whenever
> that element is accessed there's one less addition to perform. I don't
> understand how it could be beneficial to permute elements any further
> beyond that, though.

The C standard requires the first member of a structure to be at
offset 0, and each following member to be at a higher offset than the
previous one.
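
Both guarantees can be observed with offsetof from <stddef.h>. In this minimal sketch, struct record and the helper names are hypothetical. The second helper relies on a related rule: a pointer to a structure, suitably converted, points to its first member, because there is never padding before that member.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical struct with members of mixed size. */
struct record {
    char   tag;
    double value;
    short  count;
};

/* Nonzero if the first member sits at offset 0 and each later member
   has a higher offset than the one declared before it. */
int record_layout_in_order(void)
{
    return offsetof(struct record, tag) == 0
        && offsetof(struct record, value) > offsetof(struct record, tag)
        && offsetof(struct record, count) > offsetof(struct record, value);
}

/* A pointer to the struct, converted, points to its first member. */
char *record_tag(struct record *r)
{
    assert(r != NULL);
    return (char *)r;
}
```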

(I personally wouldn't mind if the compiler were allowed to rearrange
structure members arbitrarily, with an optional directive to require
them to be laid out in declared order, but that's not what the
standard says.)
 

Keith Thompson

> So if I am writing a set of buffer functions to be used for an
> application, and I want to create my own type to be used throughout
> the code, I should avoid using underscores.

You should avoid defining identifiers starting with underscores *at
all* (unless you're writing a C implementation). There are limited
circumstances in which you can get away with it, but frankly it's
easier to avoid it altogether than to remember the rules.

> Whenever I use typedefs, which is rare, I like to name the structure
> the same name as the typedef. So would this be better form:
>
> typedef struct buf {
>     char *buffer;
>     size_t size;
> } Buf;

That's not the same name; "buf" and "Buf" are distinct identifiers.
 

bwaichu

Keith said:
> That's not the same name; "buf" and "Buf" are distinct identifiers.

I meant the same word. Both are distinct identifiers. But I find
reading code easier if the words are the same even though the case is
different in typedefs like quoted above.

I was just using underscores to differentiate the two identifiers,
which I learned today is not a good idea. I will capitalize as an
alternative.
 

Keith Thompson

> I meant the same word. Both are distinct identifiers. But I find
> reading code easier if the words are the same even though the case
> is different in typedefs like quoted above.

Ok, but there's no reason to use different identifiers at all. The
following is perfectly legal:

typedef struct buf {
    char *buffer;
    size_t size;
} buf;

Typedefs and structure tags are in different name spaces, so they
won't conflict even if you use exactly the same identifier for both.

But the typedef isn't even necessary. Personally, I'd prefer just to
declare it as:

struct buf {
    char *buffer;
    size_t size;
};

and refer to the type as "struct buf". The typedef merely creates a
second name for something that already has a perfectly good name,
saves a little typing (which really isn't much of an advantage), and
hides the fact that the type is a structure (which isn't useful if
you're going to refer to members of the type anyway).
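
With the plain struct tag Keith prefers, a set of buffer functions might start out like this minimal sketch. The names buf_init and buf_free are hypothetical, and error handling is reduced to a return code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct buf {
    char  *buffer;
    size_t size;
};

/* Hypothetical helper: allocates 'size' bytes. Returns 0 on success,
   or -1 if malloc fails (leaving the struct empty). */
int buf_init(struct buf *b, size_t size)
{
    assert(b != NULL);
    b->buffer = malloc(size);
    if (b->buffer == NULL && size != 0) {
        b->size = 0;
        return -1;
    }
    b->size = size;
    return 0;
}

/* Releases the storage and resets the struct to empty. */
void buf_free(struct buf *b)
{
    assert(b != NULL);
    free(b->buffer);
    b->buffer = NULL;
    b->size = 0;
}
```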
 

Frederick Gotham

Keith Thompson posted:
> My advice: Declare the members in an order that makes sense, and let
> the compiler worry about alignment issues.


Is the compiler not obliged to lay the members out in _exactly_ the order
you specify? For example:

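#include <assert.h>

struct Blah {
    char *a;
    int b;
    char c;
    float d;
};

int main(void)
{
    struct Blah obj;

    /* Compare member addresses as char * so members of different
       types can be compared; each later member must sit higher. */
    assert( (char*)&obj.d > (char*)&obj.c );
    assert( (char*)&obj.c > (char*)&obj.b );
    assert( (char*)&obj.b > (char*)&obj.a );

    return 0;
}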

Therefore, it seems that we _do_ have to worry about this ourselves, and
not rely on the compiler to make the right decision.
 

Keith Thompson

Frederick Gotham said:
> Keith Thompson posted:
>
> Is the compiler not obliged to lay the members out in _exactly_ the order
> you specify? For example:
[snip]

I started to write an answer to this. Then I remembered that, just
last month, you accused me of "fascism", an insult for which you have
yet to apologize.

If anyone *else* is interested, I'll be glad to discuss struct member
layout.
 

Frederick Gotham

Keith Thompson posted:
> I started to write an answer to this. Then I remembered that, just
> last month, you accused me of "fascism", an insult for which you have
> yet to apologize.
>
> If anyone *else* is interested, I'll be glad to discuss struct member
> layout.

Ah yes, that whole "char unsigned" business. My memory tends to get a bit
hazy as weeks tick by, but I believe I made such a statement in response to
you labelling my ways as backwards or perverse (I can't remember your exact
wording).

So now you leave me with an ultimatum, "Apologise or I won't talk to you".

I deliberately abstained from taking any further part in the thread in
question because it had descended into nonsense (in my opinion, of course!).

If you'd like to drag up old topics and flog a dead horse, then go ahead,
I'll just ignore the posts which don't interest me.

Meanwhile, I'm going to browse through the latest C-related posts on this
newsgroup.
 

Bill Pursell

> To avoid padding in structures, where is the best place to put size_t
> variables?
> But where should I put [a] size_t [member] in relation to other integer
> declarations? Should I always assume size_t is an unsigned long when
> building structures?

One thought is to put a compile time assertion near the definition
of the structure:

CAssert ( sizeof (size_t) > sizeof (int));

There was a recent thread on various ways to
define CAssert.
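
CAssert is not a standard macro; Bill refers to definitions from another thread. One common pre-C11 technique, shown in this minimal sketch, declares a typedef for a char array whose size is -1 when the condition is false, which forces a compile-time error. The helper macro names and struct buffer are hypothetical. Compilers that support C11 can use the built-in _Static_assert instead. The sketch asserts the assumption that size_t and char * have the same size. Bill's example condition, sizeof (size_t) > sizeof (int), would stop the build on typical 32-bit systems, where both types are 4 bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Paste two tokens together after expanding macros such as __LINE__. */
#define CASSERT_PASTE2(a, b) a##b
#define CASSERT_PASTE(a, b)  CASSERT_PASTE2(a, b)

/* Compile-time assertion: the array size is -1 (an error) if cond is false.
   Use at file scope, at most once per source line. */
#define CAssert(cond) \
    typedef char CASSERT_PASTE(cassert_line_, __LINE__)[(cond) ? 1 : -1]

/* Hypothetical struct whose member order assumes the two sizes match. */
struct buffer {
    char  *buffer;
    size_t size;
};

/* Stop the build if the assumption behind the ordering does not hold. */
CAssert(sizeof(size_t) == sizeof(char *));
```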
 

Nils O. Selåsdal

Frederick said:
> Keith Thompson posted:
>
> Is the compiler not obliged to lay the members out in _exactly_ the order
> you specify? For example:

It is. And the compiler worries about the alignment issues, inserting
padding between them if needed.
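
To see where that padding ends up in Frederick's struct Blah, offsetof can measure the gap between consecutive members. In this minimal sketch, gap_after and the blah_* functions are hypothetical helpers. Nothing is assumed about the sizes of the types.

```c
#include <assert.h>
#include <stddef.h>

/* Frederick's example struct, unchanged. */
struct Blah {
    char *a;
    int b;
    char c;
    float d;
};

/* Hypothetical helper: unused bytes between the end of a member
   (starting at member_off, member_size bytes long) and next_off. */
static size_t gap_after(size_t member_off, size_t member_size, size_t next_off)
{
    assert(next_off >= member_off + member_size);
    return next_off - (member_off + member_size);
}

/* Padding the compiler placed between c and d. */
size_t blah_padding_before_d(void)
{
    return gap_after(offsetof(struct Blah, c), sizeof(char),
                     offsetof(struct Blah, d));
}

/* Trailing padding after d, so arrays of struct Blah stay aligned. */
size_t blah_trailing_padding(void)
{
    return gap_after(offsetof(struct Blah, d), sizeof(float),
                     sizeof(struct Blah));
}

/* All padding in the struct, wherever it was inserted. */
size_t blah_total_padding(void)
{
    return sizeof(struct Blah)
         - (sizeof(char *) + sizeof(int) + sizeof(char) + sizeof(float));
}
```

On a typical x86-64 Linux build this reports 3 bytes of padding before d and 4 bytes of trailing padding. Other compilers may differ.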
 

Richard Heathfield

Keith Thompson said:
> Frederick Gotham said:
>> Keith Thompson posted:
>>
>> Is the compiler not obliged to lay the members out in _exactly_ the order
>> you specify? For example:
> [snip]
>
> I started to write an answer to this. Then I remembered that, just
> last month, you accused me of "fascism", an insult for which you have
> yet to apologize.

I missed that completely. I guess it was deep into a discussion I hadn't
found terribly interesting. Anyway, I have now reviewed the thread, and I
see your point. Gotham owes you an apology. Until then, it's into the bozo
bin with him, I guess.
 

Stephen Sprunk

> So standard-wise, how do I handle size_t in structures to minimize
> padding? Should size_t always follow pointers in structures?
> And if I have integers in the structure, where should I put size_t?

It's reasonable to code as if size_t is the same size as void*; it's not
guaranteed to be true, but it's true often enough to use as a rule of
thumb.

There's no practical way to know how to sort size_t or pointers vs other
integer types. It's fair to assume they're at least as large as long on
modern systems and no larger than long long, but they may be smaller
than int on some truly perverse implementations.

> <OT>
> According to my style(9) man page, the suggestion is to:
>
> "When declaring variables in structures, declare them sorted by use,
> then by size (largest to smallest), then by alphabetical order."
>
> This would lead to padding, right? But I would have to assume a size
> for size_t to follow that style.
> </OT>

This makes more sense than sorting strictly by size; sometimes there are
natural patterns of use that call for certain members to be closer to
the top of the structure. If they are likely to vary in size, try to
group ones of similar size together, but I'd start by grouping things
that tend to be used together or used most often.
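
Stephen's suggestion might look like this minimal sketch. Members used together on the hot path are grouped at the top, and within each group the larger members come first. struct connection and its members are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: grouped by use first, then by size. */
struct connection {
    /* Used on every read/write: keep together, near the top. */
    char          *rx_buffer;
    size_t         rx_length;
    unsigned short flags;

    /* Used only at setup and teardown. */
    char          *peer_name;
    unsigned long  connect_time;
    unsigned short port;
};

/* Bytes of the struct not occupied by members, i.e. padding. */
size_t connection_padding(void)
{
    size_t used = 2 * sizeof(char *) + sizeof(size_t)
                + sizeof(unsigned long) + 2 * sizeof(unsigned short);
    assert(sizeof(struct connection) >= used);
    return sizeof(struct connection) - used;
}
```

The cost of this ordering shows up in connection_padding. On a typical LP64 system (64-bit Linux, for example) the function returns 12, because each unsigned short sits in its own padded slot. A strictly size-sorted order of the same members would need only 4.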

S
 

Michael Wojcik

> There's no practical way to know how to sort size_t or pointers vs other
> integer types. It's fair to assume they're at least as large as long on
> modern systems and no larger than long long, but they may be smaller
> than int on some truly perverse implementations.

I sometimes write code for a "modern" (currently commercially
available, and still selling well, the last I checked) system where
pointers are larger than long long.

But perhaps I'm unfair.
 
