Zero length array declaration

RS

Hi,

Looking to see if the following construct is valid:
typedef struct {
int foo;
char bar[0];
} foobar;

Basically, the idea is to have the structure above point to a message
buffer that has a 4-byte integer followed by a stream of variable
number of bytes. The structure member "bar" is used to reference the
stream of bytes. The above code compiles fine with a GNU based PPC
cross-compiler running on Solaris and seems to do the function
intended.

My question is: Is it legal to declare an array with zero length? Or
should bar have been declared to be at least one element in length?

Comments appreciated.

RS
 
Eric Sosman

RS said:
Hi,

Looking to see if the following construct is valid:
typedef struct {
int foo;
char bar[0];
} foobar;

Basically, the idea is to have the structure above point to a message
buffer that has a 4-byte integer followed by a stream of variable
number of bytes. The structure member "bar" is used to reference the
stream of bytes. The above code compiles fine with a GNU based PPC
cross-compiler running on Solaris and seems to do the function
intended.

My question is: Is it legal to declare an array with zero length? Or
should bar have been declared to be at least one element in length?

The construct as written is not legal C under either
version of the Standard, although some compilers may allow
it as an extension to the language.

Declaring `bar' as a one-element array is and always has
been legal. However, allocating extra memory for the struct
and then using `bar' as if it had more than one element is
not. This particular abuse (known as "the struct hack") works
on the great majority of C implementations, but is not actually
legitimate.

The latest "C99" version of the Standard legitimizes the
struct hack, but introduces a new syntax: you declare `bar'
with no array size at all, as `char bar[]'. However, this is
still relatively new and not yet widely supported by available
compilers.
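
For illustration, here is a minimal sketch of the C99 form described above. It assumes a C99 compiler; the type name `foobar` is from the original post, while the helper `foobar_new` and the use of `foo` as a byte count are assumptions made only for this example.

```c
#include <stdlib.h>
#include <string.h>

/* C99 flexible array member: bar is declared with no size at all. */
typedef struct {
    int foo;     /* assumed here to hold the number of bytes in bar */
    char bar[];  /* must be the last member of the struct */
} foobar;

/* Hypothetical helper: allocate a foobar with room for n bytes in bar
 * and copy n bytes from src into it. Returns NULL on failure. */
foobar *foobar_new(const char *src, int n)
{
    foobar *p;

    if (src == NULL || n < 0)
        return NULL;
    p = malloc(sizeof *p + (size_t)n);  /* header plus n trailing bytes */
    if (p == NULL)
        return NULL;
    p->foo = n;
    memcpy(p->bar, src, (size_t)n);
    return p;
}
```

The whole object is released with a single `free(p)`, since the header and the bytes share one allocation.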
 
jjr2004a

Hi,
Looking to see if the following construct is valid:
typedef struct {
int foo;
char bar[0];
} foobar;
I'm not sure about the legality of this - others can answer that.

For the stated purpose this is wrong.

char bar[0] should be char *bar instead. That's what a
pointer is for - to refer or point to some storage located
somewhere else.
 
Keith Thompson

Looking to see if the following construct is valid:
typedef struct {
int foo;
char bar[0];
} foobar;
I'm not sure about the legality of this - others can answer that.

For the stated purpose this is wrong.

char bar[0] should be char *bar instead. That's what a
pointer is for - to refer or point to some storage located
somewhere else.

No, the point of declaring bar as an array is that the array (of some
dynamic size) is stored within the struct itself. The structure is
allocated by calling malloc with a size equal to the size of the
structure itself plus the number of characters to be stored in the
array. It's called the "struct hack"; it's commonly supported, and
commonly used, but not strictly legal.

Declaring "char *bar" is of course valid as well, but it's a different
thing, requiring a separate memory allocation (and later deallocation)
for the array.

As others have said elsethread, C99 adds support for the struct hack,
but with a new syntax.
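
To make the "different thing" concrete, here is a rough sketch of the pointer version, with its two allocations and two deallocations. The names `msg_ptr`, `msg_ptr_new` and `msg_ptr_free` are hypothetical, chosen only for this example.

```c
#include <stdlib.h>

/* Pointer version: the bytes live in a second, separate allocation. */
struct msg_ptr {
    int foo;
    char *bar;
};

/* Allocate the struct, then allocate n bytes for bar to point at. */
struct msg_ptr *msg_ptr_new(size_t n)
{
    struct msg_ptr *m = malloc(sizeof *m);

    if (m == NULL)
        return NULL;
    m->bar = malloc(n ? n : 1);  /* avoid malloc(0), which may return NULL */
    if (m->bar == NULL) {
        free(m);
        return NULL;
    }
    m->foo = (int)n;
    return m;
}

/* Both blocks must be released, the array first. */
void msg_ptr_free(struct msg_ptr *m)
{
    if (m != NULL) {
        free(m->bar);
        free(m);
    }
}
```

With the struct hack (or the C99 syntax) the second `malloc` and the second `free` disappear, because the array bytes sit directly after `foo` in the same block.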
 
Method Man

RS said:
Hi,

Looking to see if the following construct is valid:
typedef struct {
int foo;
char bar[0];
} foobar;

Basically, the idea is to have the structure above point to a message
buffer that has a 4-byte integer followed by a stream of variable
number of bytes. The structure member "bar" is used to reference the
stream of bytes. The above code compiles fine with a GNU based PPC
cross-compiler running on Solaris and seems to do the function
intended.

My question is: Is it legal to declare an array with zero length? Or
should bar have been declared to be at least one element in length?

Comments appreciated.

RS

Strictly speaking, accessing an array beyond its bounds results in undefined
behaviour according to the Standard.
 
Keith Thompson

Method Man said:
RS said:
Hi,

Looking to see if the following construct is valid:
typedef struct {
int foo;
char bar[0];
} foobar;

Basically, the idea is to have the structure above point to a message
buffer that has a 4-byte integer followed by a stream of variable
number of bytes. The structure member "bar" is used to reference the
stream of bytes. The above code compiles fine with a GNU based PPC
cross-compiler running on Solaris and seems to do the function
intended.

My question is: Is it legal to declare an array with zero length? Or
should bar have been declared to be at least one element in length?

Comments appreciated.

RS

Strictly speaking, accessing an array beyond its bounds results in undefined
behaviour according to the Standard.

Strictly speaking, declaring a 0-sized array is illegal. Some
compilers may let you get away with it. Compilers that disallow
0-sized arrays may let you implement the struct hack with
char bar[1];
The declaration is legal, but accessing elements beyond bar[0] invokes
undefined behavior (which, if you've allocated enough memory, is
likely to work aryway).
 
Mabden

Keith Thompson said:
Method Man said:
RS said:
Hi,

Looking to see if the following construct is valid:
typedef struct {
int foo;
char bar[0];
} foobar;

Basically, the idea is to have the structure above point to a message
buffer that has a 4-byte integer followed by a stream of variable
number of bytes. The structure member "bar" is used to reference the
stream of bytes. The above code compiles fine with a GNU based PPC
cross-compiler running on Solaris and seems to do the function
intended.

If you want "to have the structure above point to a message buffer", you
should use a pointer to a buffer containing a message. This buffer could
be an array of strings, for instance.
Strictly speaking, accessing an array beyond its bounds results in undefined
behaviour according to the Standard.

Strictly speaking, declaring a 0-sized array is illegal. Some
compilers may let you get away with it. Compilers that disallow
0-sized arrays may let you implement the struct hack with
char bar[1];
The declaration is legal, but accessing elements beyond bar[0] invokes
undefined behavior (which, if you've allocated enough memory, is
likely to work aryway).

Especially if "aryway" means "not very well". I don't understand why you
would even pursue such a line of reasoning when all the OP wants (read:
needs) is to use a pointer. His post indicated what he wanted, so why go
off on unsupported tangents that imply acceptance for arrays of zero
length or whatever. It's a silly thing to run a thread on.
 
Flash Gordon

Keith Thompson said:
Method Man said:
Hi,

Looking to see if the following construct is valid:
typedef struct {
int foo;
char bar[0];
} foobar;

Basically, the idea is to have the structure above point to a message
buffer that has a 4-byte integer followed by a stream of variable
number of bytes. The structure member "bar" is used to reference the
stream of bytes. The above code compiles fine with a GNU based PPC
cross-compiler running on Solaris and seems to do the function
intended.

If you want "to have the structure above point to a message buffer",
you should use a pointer to a buffer containing a message. This buffer
could be an array of strings, for instance.

He does *not* want the structure to point to a buffer containing the
message. The int foo is almost certainly part of the message coming
over a communications link, followed by a number of bytes. For instance,
the bytes received might be
int 5
byte 1
byte 2
byte 3
byte 4

To do what you suggest, the routine receiving this would first have to
allocate memory for the "header" integer and a pointer, allocate another
block for the rest of the message, set up the pointer, and also change
where it is sending the data. Also, the entire lot might be coming in
as one block of data, so your solution could also mean copying the data
around even more.
My question is: Is it legal to declare an array with zero length? Or
should bar have been declared to be at least one element in length?

Strictly speaking, accessing an array beyond its bounds results in undefined
behaviour according to the Standard.

Strictly speaking, declaring a 0-sized array is illegal. Some
compilers may let you get away with it. Compilers that disallow
0-sized arrays may let you implement the struct hack with
char bar[1];
The declaration is legal, but accessing elements beyond bar[0]
invokes undefined behavior (which, if you've allocated enough
memory, is likely to work aryway).

Especially if "aryway" means "not very well". I don't understand why
you would even pursue such a line of reasoning when all the OP wants
(read: needs) is to use a pointer. His post indicated what he wanted,
so why go off on unsupported tangents that imply acceptance for arrays
of zero length or whatever. It's a silly thing to run a thread on.

No, Keith knows EXACTLY what he is talking about and the OP was taking
exactly the right approach. It is just unfortunate that the C89 standard
did not bless a way of doing it and that the C99 standard did not
include support for [0] and/or the [1] struct hack which had the benefit
of existing practice.

Also, although it is not covered by the ISO standard, it is a very
commonly supported extension. I would even go as far as to reject a
compiler on the basis of it not supporting this and instead select one
where it was supported, were it not for the fact that I haven't come
across a compiler where it fails.
 
Spiro Trikaliotis

Hello,

Keith Thompson said:
Declaring "char *bar" is of course valid as well, but it's a different
thing, requiring a separate memory allocation (and later deallocation)
for the array.

Well, a separate memory allocation is not needed. You can alloc
sizeof(struct) + extra bytes, and then have bar point to "after the
struct".

Or is there anything else in the standard that forbids this?
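
A minimal sketch of that single-allocation idea, for reference. The names `msg` and `msg_new` are hypothetical, and the sketch assumes the extra data only needs `char` alignment.

```c
#include <stdlib.h>

struct msg {
    int foo;
    char *bar;  /* points just past the struct, inside the same block */
};

/* One malloc covers the struct plus `extra` trailing bytes. */
struct msg *msg_new(size_t extra)
{
    struct msg *m = malloc(sizeof *m + extra);

    if (m == NULL)
        return NULL;
    m->foo = (int)extra;
    m->bar = (char *)(m + 1);  /* first byte after the struct */
    return m;
}
```

A single `free(m)` releases everything. Note that `bar` holds an absolute address, so it becomes stale if the block is moved by `realloc` or copied elsewhere.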

BTW: A good reason for not using a pointer is, IMHO, shared memory
between processes, where the shared memory can be mapped at different
addresses in each process. A pointer is quite useless in this
scenario, but indices into the char bar[] array are not.

Regards,
Spiro.
 
Chris Croughton

Declaring `bar' as a one-element array is and always has
been legal. However, allocating extra memory for the struct
and then using `bar' as if it had more than one element is
not. This particular abuse (known as "the struct hack") works
on the great majority of C implementations, but is not actually
legitimate.

How about (untested code):

typedef struct foo_s FooT;
struct foo_s
{
size_t size;
Foo *next;
};

char *fooAlloc(size_t size)
{
FooT *s = malloc(sizeof(FooT) + size);
if (!s)
{
do_error_handling();
}
s->size = size;
s->next = NULL; /* put it in a list or whatever */
return (char*) &s[1];
}

void fooFree(char *p)
{
FooT *s = (FooT*)(p - sizeof(FooT));
/* do whatever to the structure */
free(s);
}

The standard says that malloc() returns an area aligned for any purpose,
so assigning it to a FooT* is valid. By the definition of array access,
s[1] points to a valid address in the allocated memory beyond the
initial structure. Similarly, converting back for the free() is valid
(assuming that you've been passed a valid pointer, but that's true of
any allocation system).
The latest "C99" version of the Standard legitimizes the
struct hack, but introduces a new syntax: you declare `bar'
with no array size at all, as `char bar[]'. However, this is
still relatively new and not yet widely supported by available
compilers.

I don't think it's necessary (and is currently non-portable), since the
pointer to the 'extra' area can always be accessed by taking the address
of the next structure (or converting to char* and adding sizeof the
structure).

Chris C
 
xarax

Chris Croughton said:
How about (untested code):

typedef struct foo_s FooT;
struct foo_s
{
size_t size;
Foo *next;

ITYM: FooT * next;
};

char *fooAlloc(size_t size)
{
FooT *s = malloc(sizeof(FooT) + size);
if (!s)
{
do_error_handling();
}
s->size = size;
s->next = NULL; /* put it in a list or whatever */
return (char*) &s[1];
}

void fooFree(char *p)
{
FooT *s = (FooT*)(p - sizeof(FooT));
/* do whatever to the structure */
free(s);
}

The standard says that malloc() returns an area aligned for any purpose,
so assigning it to a FooT* is valid. By the definition of array access,
s[1] points to a valid address in the allocated memory beyond the
initial structure.

This is an incorrect assumption. The &s[1] is not necessarily
aligned to the largest size. The alignment of &s[1] is only
correctly aligned to the alignment of "struct foo_s", which
may be less than the largest alignment.

Suppose the largest legal alignment is 16, the sizeof(size_t)
is 4, and sizeof(FooT*) is 4. The sizeof(struct foo_s) is 8
and its alignment is 4. The &s[1] yields an 8-byte aligned,
but NOT 16-byte aligned pointer. Thus, you get undefined behavior.

Similarly, converting back for the free() is valid
(assuming that you've been passed a valid pointer, but that's true of
any allocation system).
The latest "C99" version of the Standard legitimizes the
struct hack, but introduces a new syntax: you declare `bar'
with no array size at all, as `char bar[]'. However, this is
still relatively new and not yet widely supported by available
compilers.

I don't think it's necessary (and is currently non-portable), since the
pointer to the 'extra' area can always be accessed by taking the address
of the next structure (or converting to char* and adding sizeof the
structure).

Unless special care is taken to ensure that the total structure
alignment matches the largest alignment, you'll get undefined
behavior.
 
Eric Sosman

Chris said:
How about (untested code):

typedef struct foo_s FooT;
struct foo_s
{
size_t size;
Foo *next;

ITYM `FooT' here.
};

char *fooAlloc(size_t size)
{
FooT *s = malloc(sizeof(FooT) + size);

`sizeof *s + size' would be slightly better, but
does the same thing.
if (!s)
{
do_error_handling();
}
s->size = size;
s->next = NULL; /* put it in a list or whatever */
return (char*) &s[1];

When I use this trick, I usually write `(char*)(s + 1)',
which means exactly the same thing but (I think) makes my
intention just a tiny bit clearer to the reader. IMHO.
}

void fooFree(char *p)
{
FooT *s = (FooT*)(p - sizeof(FooT));
/* do whatever to the structure */
free(s);
}

The standard says that malloc() returns an area aligned for any purpose,
so assigning it to a FooT* is valid. By the definition of array access,
s[1] points to a valid address in the allocated memory beyond the
initial structure. Similarly, converting back for the free() is valid
(assuming that you've been passed a valid pointer, but that's true of
any allocation system).

Sure -- but there's no array at all in this code,
and it isn't "the struct hack." You're achieving the
same memory layout the hack aims for, but you've lost
the syntactic convenience of being able to refer to
the extra space as if it were a struct element. I use
this technique fairly commonly when there's a string
of some sort associated with the struct, but to reclaim
the syntactic sugar I go ahead and "waste" a `char*'
struct element pointing to the string that immediately
follows the struct itself.

Note that trafficking in the "midpoint pointer" instead
of in the pointer to the `FooT' itself is an orthogonal
matter. It's legal, it works, but it gives up *all* chance
of accessing the other struct members conveniently. I don't
see much point to it.
The latest "C99" version of the Standard legitimizes the
struct hack, but introduces a new syntax: you declare `bar'
with no array size at all, as `char bar[]'. However, this is
still relatively new and not yet widely supported by available
compilers.

I don't think it's necessary (and is currently non-portable), since the
pointer to the 'extra' area can always be accessed by taking the address
of the next structure (or converting to char* and adding sizeof the
structure).

Yes, and if your tolerance for ugliness is high enough you
can even extend the technique to handle "extra" data of a type
with stricter alignment requirements than `char'. C99's "struct
un-hack" lets you achieve the same effect without all the grunge
*and* retains the syntactic convenience of letting you refer to
the "extra" data as part of the struct. I think that's worth
while; code that's easier to read is code that's less likely to
be written wrongly.
 
Chris Croughton

ITYM: FooT * next;

Yup. I said it was untested (typing rather than copying).
The standard says that malloc() returns an area aligned for any purpose,
so assigning it to a FooT* is valid. By the definition of array access,
s[1] points to a valid address in the allocated memory beyond the
initial structure.

This is an incorrect assumption. The &s[1] is not necessarily
aligned to the largest size. The alignment of &s[1] is only
correctly aligned to the alignment of "struct foo_s", which
may be less than the largest alignment.

Read what I said. I did not say that it was aligned to the "largest
size". If you look at the code, it only needs to be aligned to a char,
and all other types are at least as big as a char.
Suppose the largest legal alignment is 16, the sizeof(size_t)
is 4, and sizeof(FooT*) is 4. The sizeof(struct foo_s) is 8
and its alignment is 4. The &s[1] yields an 8-byte aligned,
but NOT 16-byte aligned pointer. Thus, you get undefined behavior.

And sizeof(char) is 1 (by definition). It's not undefined at all,
since nothing can have a sizeof less than 1 (although they may all be
equal to 1).
Unless special care is taken to ensure that the total structure
alignment matches the largest alignment, you'll get undefined
behavior.

No, only to the largest alignment needed. In this case, the smallest
alignment will do. But it's easy enough to align to all of the built-in
types, just make a union with all of the largest types (intmax_t, a
pointer, and long double will do), and round up to the next multiple of
that bigger than the size of the structure.

Chris C
 
Chris Croughton

ITYM `FooT' here.

Yup. Partway through writing it I changed my mind about what to call
the type.
`sizeof *s + size' would be slightly better, but
does the same thing.

I don't like the version of sizeof without parentheses, but yes using
the pointer dereferenced is a little more maintainable.
return (char*) &s[1];

When I use this trick, I usually write `(char*)(s + 1)',
which means exactly the same thing but (I think) makes my
intention just a tiny bit clearer to the reader. IMHO.

I've heard people argue the opposite (and worked in places where each
was compulsory by their coding standards). In my own code I am probably
not consistent (I've also seen (char*)s + sizeof(FooT) proclaimed
better).
Sure -- but there's no array at all in this code,
and it isn't "the struct hack." You're achieving the
same memory layout the hack aims for, but you've lost
the syntactic convenience of being able to refer to
the extra space as if it were a struct element. I use
this technique fairly commonly when there's a string
of some sort associated with the struct, but to reclaim
the syntactic sugar I go ahead and "waste" a `char*'
struct element pointing to the string that immediately
follows the struct itself.

Depending on alignment (and wastage in malloc()) it might waste a lot
more than a char*, but that's a fairly 'clean' way of doing it.
Note that trafficking in the "midpoint pointer" instead
of in the pointer to the `FooT' itself is an orthogonal
matter. It's legal, it works, but it gives up *all* chance
of accessing the other struct members conveniently. I don't
see much point to it.

One point is when the function is being used as a malloc() replacement
(to provide error checking, for instance, or easy heap disposal).
Another is so that the details of implementation can be hidden from the
caller (as in a library which needs to be linked with object modules
where the source is not available). Both are "Real World(TM)" cases
(and the latter is outwith the scope of the standard). And indeed
malloc() and friends often do much the same.
Yes, and if your tolerance for ugliness is high enough you
can even extend the technique to handle "extra" data of a type
with stricter alignment requirements than `char'. C99's "struct
un-hack" lets you achieve the same effect without all the grunge
*and* retains the syntactic convenience of letting you refer to
the "extra" data as part of the struct. I think that's worth
while; code that's easier to read is code that's less likely to
be written wrongly.

The non-portability is still a big issue. I agree that it should have
been there from the start (and indeed many compilers even before the C89
standard allowed a zero length to achieve the same goal) but there are
still too many non-C99 compliant compilers around for me to be
comfortable using it. Putting #ifdefs all over the place to cope with
the variant syntax makes the code unreadable and even less maintainable.

Chris C
 
jacob navia

RS said:
Hi,

Looking to see if the following construct is valid:
typedef struct {
int foo;
char bar[0];
} foobar;

Basically, the idea is to have the structure above point to a message
buffer that has a 4-byte integer followed by a stream of variable
number of bytes. The structure member "bar" is used to reference the
stream of bytes. The above code compiles fine with a GNU based PPC
cross-compiler running on Solaris and seems to do the function
intended.

My question is: Is it legal to declare an array with zero length? Or
should bar have been declared to be at least one element in length?

Comments appreciated.

RS

The correct declaration in C99 is:

typedef struct {
int foo;
char bar[];
} foobar;

This will work as intended in any C99 compiler

sizeof(foobar) == sizeof(int);

allocate with
foobar *foo = malloc(sizeof(foobar)+25);
to allocate a "foobar" followed by 25 characters.

jacob
 
Chris Torek

... But it's easy enough to align to all of the built-in
types, just make a union with all of the largest types (intmax_t, a
pointer, and long double will do), and round up to the next multiple of
that bigger than the size of the structure.

This is not *guaranteed*, because implementations are allowed to
be arbitrarily perverse: one could, for instance, make "short *"
and "int *" use four-byte alignment, but make "short **" use
1024-byte alignment, for no good reason.

Realistically, intmax_t from C99 helps a lot, but just those three
-- intmax_t, "a" pointer, and long double -- is probably not
sufficient for existing implementations. You need at least two
pointers, one to data and one to code:

union align {
intmax_t i;
long double ld;
long double *ldp;
void (*fp)(void);
};

is pretty likely to work. (Not guaranteed, just "pretty likely".)
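
As a sketch of how such a union might be used for the rounding discussed earlier in the thread: the helper name `round_up_to_align` is hypothetical, and using `sizeof(union align)` as the rounding unit is an assumption of this example (it is a multiple of the union's alignment, so it errs on the large side). Overflow for very large `n` is not handled.

```c
#include <stddef.h>
#include <stdint.h>

/* The union from the post above: a "pretty likely" worst-case alignment. */
union align {
    intmax_t i;
    long double ld;
    long double *ldp;
    void (*fp)(void);
};

/* Round n up to the next multiple of sizeof(union align). */
size_t round_up_to_align(size_t n)
{
    size_t a = sizeof(union align);

    return (n + a - 1) / a * a;
}
```

A header struct padded to `round_up_to_align(sizeof(header))` bytes would then leave the data that follows it starting at a multiple of `sizeof(union align)` from the start of the malloc'd block.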
 
Eric Sosman

jacob said:
RS said:
Hi,

Looking to see if the following construct is valid:
typedef struct {
int foo;
char bar[0];
} foobar;

Basically, the idea is to have the structure above point to a message
buffer that has a 4-byte integer followed by a stream of variable
number of bytes. The structure member "bar" is used to reference the
stream of bytes. The above code compiles fine with a GNU based PPC
cross-compiler running on Solaris and seems to do the function
intended.

My question is: Is it legal to declare an array with zero length? Or
should bar have been declared to be at least one element in length?

Comments appreciated.

RS


The correct declaration in C99 is:

typedef struct {
int foo;
char bar[];
} foobar;

This will work as intended in any C99 compiler

sizeof(foobar) == sizeof(int);

No; sizeof(foobar) >= sizeof(int) is all you can
be sure of. The compiler may insert padding between
the `foo' and `bar' members, even though there seems
no obvious reason to do so in this case.
allocate with
foobar *foo = malloc(sizeof(foobar)+25);
to allocate a "foobar" followed by 25 characters.

malloc(sizeof *foo + 25) would be better.
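
A small sketch for anyone who wants to check this on their own compiler. It assumes a C99 compiler; the helper names `foobar_header_size`, `foobar_bar_offset` and `foobar_alloc_size` are made up for the example.

```c
#include <stddef.h>

typedef struct {
    int foo;
    char bar[];  /* C99 flexible array member */
} foobar;

/* Size of the struct without any trailing bytes; at least sizeof(int). */
size_t foobar_header_size(void)
{
    return sizeof(foobar);
}

/* Where bar actually starts within the struct. */
size_t foobar_bar_offset(void)
{
    return offsetof(foobar, bar);
}

/* Bytes to request from malloc for a foobar carrying n bytes in bar. */
size_t foobar_alloc_size(size_t n)
{
    return sizeof(foobar) + n;
}
```

Printing the first two values with `printf("%lu", (unsigned long)...)` shows whether a given implementation adds padding; the code should not rely on either value being exactly `sizeof(int)`.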
 
Ben Pfaff

Chris Torek said:
Realistically, intmax_t from C99 helps a lot, but just those three
-- intmax_t, "a" pointer, and long double -- is probably not
sufficient for existing implementations. You need at least two
pointers, one to data and one to code:

union align {
intmax_t i;
long double ld;
long double *ldp;
void (*fp)(void);
};

is pretty likely to work. (Not guaranteed, just "pretty likely".)

I'm not sure why you chose a pointer to long double over a
pointer to void (or to char). If long doubles need 8-byte
alignment, then a pointer to long double could in theory be
shorter than a pointer to void.
 
Chris Croughton

This is not *guaranteed*, because implementations are allowed to
be arbitrarily perverse: one could, for instance, make "short *"
and "int *" use four-byte alignment, but make "short **" use
1024-byte alignment, for no good reason.

Not realistically, because any alignment must be no larger than the size
of the object (otherwise arrays of the object wouldn't work). So if I
had:

short **arr[1024];

and the alignment for a short** was 1024 that would allocate a megabyte
and lots of people would be very unhappy.

I agree about having a function pointer in there as well, that can
easily be a different size from a data pointer.

In practice, though, the most you need is to have one of every basic
type that you want to put into the storage. In one of my applications,
for instance, I use no floating point variables or function pointers so
I'm safe leaving out the long double and the function pointer (and I
create my own IntMax type because I can't guarantee the presence of
stdint.h, or that it will support intmax_t even if it is present).
Since it's likely to be in a separate module (at least, it ought to be)
that module can do all sorts of things involving #ifdefs testing macros
from limits.h (and stdint.h if that is available) and including lots of
types in the union.

(And now I'm singing "You can't get me, I'm part of the union..." It
must be bedtime...)

Chris C
 
Michael Mair

Eric said:
jacob said:
RS wrote:

Hi,

Looking to see if the following construct is valid:
typedef struct {
int foo;
char bar[0];
} foobar;

Basically, the idea is to have the structure above point to a message
buffer that has a 4-byte integer followed by a stream of variable
number of bytes. The structure member "bar" is used to reference the
stream of bytes. The above code compiles fine with a GNU based PPC
cross-compiler running on Solaris and seems to do the function
intended.

My question is: Is it legal to declare an array with zero length? Or
should bar have been declared to be at least one element in length?

Comments appreciated.

RS

The correct declaration in C99 is:

typedef struct {
int foo;
char bar[];
} foobar;

This will work as intended in any C99 compiler

sizeof(foobar) == sizeof(int);

No; sizeof(foobar) >= sizeof(int) is all you can
be sure of. The compiler may insert padding between
the `foo' and `bar' members, even though there seems
no obvious reason to do so in this case.

Yep. In fact,
sizeof(foobar)==offsetof(foobar,bar)==offsetof(baz,bar)
for
typedef struct {
int foo;
char bar[1];
} baz;
The last equality is the point which gives the gcc people
such a headache -- they are aligning their equivalent to
flexible array members with more padding (in some cases).
In order to not break existing code or get lost in the
slightly different semantics of bar[0]; vs bar[];, they
decided to sit it out... Apart from the latter innuendo,
more information about this can be found following the
discussion link toward the bottom of
gcc.gnu.org/c99status.html


Cheers
Michael
 
