On VLAs and incomplete types

Sensei

Hi! I am still learning a lot reading this NG, so I am now turning
here to clarify my doubts with VLAs. I hope I won't be that silly or
naive :)

In (6.7.5.2) I read about the correct declaration of arrays, although I
don't understand how a variable is handled by the compiler in declaring
a VLA. As I understand, an array must be declared with an integer size
specifier:

int x[3];

That would give a complete type, and I can use x happily. If no integer
is specified, then

int x[];

would be an incomplete type. This quite puzzles me, since I don't
understand how x[] could then be used in a code block. I can only think
about a function with two parameters, the incomplete array and the
size. Is it possible to make use of that variable in other ways?

The other doubt is about VLAs. If some variable is specified instead of
a constant, then the variable should be in the block. In the document I
see a prototype and a variable inside the code block:

void fvla(int m, int C[m][m])
....
int D[m];


My question is probably stupid, but how are C and D handled by the
compiler? My concerns are about some code like the following (mixed
code and variables, since it is C99):

/* checks about argc and argv */

int a = atoi(argv[1]);

/* follows some check about a */

int x[a];



I know it's risky, but it's allowed, isn't it? Is x allocated on the
fly as we enter the code block (somehow, on the heap or by any other
non C-related means) and then subsequently freed when we exit? Then what
are the advantages (not counting the "syntactic sugar" I can think of)
of having VLAs in the standard?

I am puzzled, but forgive me: I am still learning a lot! :)

Thanks!
 
Ben Bacarisse

Sensei said:
In (6.7.5.2) I read about the correct declaration of arrays, although
I don't understand how a variable is handled by the compiler in
declaring a VLA. As I understand, an array must be declared with an
integer size specifier:

int x[3];

That would give a complete type, and I can use x happily. If no
integer is specified, then

int x[];

would be an incomplete type. This quite puzzles me, since I don't
understand how x[] could then be used in a code block.

It can't be.
I can only
think about a function with two parameters, the incomplete array and
the size. Is it possible to make use of that variable in other ways?

No, but I think you are being confused by one of the things that most
frequently confuses people new to C. The [] syntax in a parameter
specification has a special meaning. It does not denote an incomplete
type -- it is essentially the same as declaring that parameter to be a
pointer. In other words:

void f(int x[]);

and

void f(int *x);

are the same. This only applies to the "first" set of []s. Thus

void f(int x[][])

*is* illegal because it declares x as having an incomplete element type.

None of this applies to array objects. Trying to define an array x,
like this:

int x[];

as an automatic variable (i.e. in the body of a function) is wrong.
You *can* use empty []s if an initializer is used to give the size:

int x[] = {1, 2, 3};

is OK. There are some special cases when [] is used in an array
declaration at file scope (outside any function body) but it is best
to put these to one side for the moment.
The other doubt is about VLAs. If some variable is specified instead
of a constant, then the variable should be in the block. In the
document I see a prototype and a variable inside the code block:

void fvla(int m, int C[m][m])
...
int D[m];


My question is probably stupid, but how are C and D handled by the
compiler?

This is a common question. How does this work? What is the compiler
doing here? The trouble with this sort of question is that you are
asking for more than you need. Why add understanding the C compiler
to the task of understanding C? Sometimes it helps but with more
complex languages it definitely does not. (For example, it is much
easier to understand what a Haskell program means, than it is to see
how the compiler makes it work.)

Anyway, I'll have a go... In this program:

void f(int m, int C[m][m])
{
    int D[m];
    ...
}

int main(void)
{
    int x[4][4];
    f(4, x);
    return 0;
}

x is simply a 2D array of 4 arrays of 4 ints. It is passed to f (like
all arrays are passed) as a pointer to its first element. x is
converted to a pointer of type 'int (*)[4]'. f would be the same if
it were declared:

void f(int m, int C[][m]);

or

void f(int m, int (*C)[m]);

Only the second m matters to the compiler. This is because, given a
pointer to that first array of 4 ints, all the compiler needs to know
is how to get to the next element. It needs to know the element size,
not how many elements there are.

When the program flow reaches the declaration of D, space for an array
of m (in this case 4) ints is allocated. This storage lasts until the
function returns, so it is natural for the compiler to take the
storage from the same place it uses for other automatic variables --
usually this is from a stack.
My concerns are about some code like the following (mixed
code and variables, since it is C99):

/* checks about argc and argv */

int a = atoi(argv[1]);

/* follows some check about a */

int x[a];


I know it's risky, but it's allowed, isn't it?
Yes.

Is x allocated on the
fly as we enter the code block (somehow, on the heap or by any other
non C-related means)

Well put. How or where does not matter. It is likely to be allocated
from a stack ("the" stack if the compiler is using only one).
and then subsequently freed when we exit?

The way I used to put it was: the storage lasts until the program flow
passes the end of the variable's scope (unless the program exits before
that). It is surprisingly hard to get the words exactly right, but
most people find the intent more natural than the wording used to
explain it!
Then what
are the advantages (not counting the "syntactic sugar" I can think of)
of having VLAs in the standard?

It allows you to have arrays of variable size, efficiently allocated.
malloc and free will usually be slower and you need to do the freeing.
The real advantage, though, comes with code like your 2D array
parameter called C. Such things are simpler with VLA parameters.
 
Sensei

Well, I think you clarified a lot. Thanks!

One last question, you are saying that VLAs are "usually" faster than
the malloc/free counterpart, although arrays are passed by means of
pointers. How can it be so, if the array lives as long as the code
block it refers to is executing? I mean, the code will effectively
allocate and deallocate memory, so there must be an explanation for
being faster. Is heap and stack actually different in this?

As for 2D arrays, I'm using a simple array with appropriate indexing,
is there a performance reason for preferring a variable length array?

Thanks for bearing with me :)

--

Sensei <Sensei's e-mail is at Mac-dot-com>

Basic research is what I am doing when I don't know what I am doing.
(Wernher von Braun)
 
user923005

Well, I think you clarified a lot. Thanks!

One last question, you are saying that VLAs are "usually" faster than
the malloc/free counterpart, although arrays are passed by means of
pointers. How can it be so, if the array lives as long as the code
block it refers to is executing? I mean, the code will effectively
allocate and deallocate memory, so there must be an explanation for
being faster. Is heap and stack actually different in this?

As for 2D arrays, I'm using a simple array with appropriate indexing,
is there a performance reason for preferring a variable length array?

Automatic variables are typically placed on a *cough* stack. The
generation of these variables is really just a simple subtraction.
The generation of dynamic memory is more complicated and requires
functions to track the allocations carefully.
The speed of access will be similar between automatic and dynamically
allocated memory. It is the bookkeeping of generation and disposal
that differs significantly.

So why not just use automatic memory all the time? It is a very
limited resource, and when it fails, it does not fail gracefully like
malloc(), which will tell you that something went wrong by returning a
NULL pointer. If an automatic allocation fails you can expect a core
dump or some sort of undefined behavior.
 
santosh

Sensei said:
Well, I think you clarified a lot. Thanks!

One last question, you are saying that VLAs are "usually" faster than
the malloc/free counterpart, although arrays are passed by means of
pointers. How can it be so, if the array lives as long as the code
block it refers to is executing? I mean, the code will effectively
allocate and deallocate memory, so there must be an explanation for
being faster. Is heap and stack actually different in this?

He is talking about the speed of allocation. VLAs are allocated on most
implementations on a special area of storage called the stack. In most
cases, the machine provides instructions to efficiently manage this
kind of storage. This turns out to be *much* faster than allocation
through *alloc(), which usually allocates storage from the
so-called "heap" and may involve relatively time-consuming negotiations
with the operating system.
As for 2D arrays, I'm using a simple array with appropriate indexing,
is there a performance reason for preferring a variable length array?

Well, VLAs are usually used when you do not know the size of the array
at compile time. They are, as I said above, usually much more efficient
than allocating through malloc, and have the added advantage of being
managed by the compiler. You need not (and cannot) explicitly free
them.

If you do know the array size at compile time then ordinary fixed-size
arrays may bring you some benefits in terms of increased portability.

alloca() is a non-standard alternative to both of the mechanisms
discussed above. We had a long and detailed thread about it a week or two
ago. You'll find it in the Google Groups archive.
 
Sensei

Ok, so I understand why VLAs are usually faster, and I understand also
that sizes of VLAs may be limited compared to malloc'ed arrays, since
the memory VLAs are allocated into is quite precious and limited.

Thanks for helping me clarifying my (perhaps) silly doubts!

--

Sensei <Sensei's e-mail is at Mac-dot-com>

There is no reason for any individual to have a computer in his home.
(Ken Olsen, President, Digital Equipment, 1977)
 
Sensei

Thanks for the reply, it's always a good thing learning more about
these matters! :)

--

Sensei <Sensei's e-mail is at Mac-dot-com>

Basic research is what I am doing when I don't know what I am doing.
(Wernher von Braun)
 
santosh

Sensei said:
Ok, so I understand why VLAs are usually faster, and I understand also
that sizes of VLAs may be limited compared to malloc'ed arrays, since
the memory VLAs are allocated into is quite precious and limited.

On modern virtual memory OSes like UNIX systems or Windows, the stack
can have a theoretical maximum size of 4 GB on a 32-bit system, but it's
usually limited to under 16 MB by the OS.

Since it's mapped onto main memory the same way the heap is, it is only
as precious and limited as the OS and the system's physical memory
constrain it to be.
 
Old Wolf

Sensei said:
would be an incomplete type. This quite puzzles me, since I don't
understand how x[] could then be used in a code block.

It can't be.

In fact it can, here is an example. Arrays of
incomplete type can still decay to pointers.
In fact it's not uncommon to have 'extern int y[];'
where 'y' has incomplete type but is then used.

#include <stdio.h>

int x[];    /* tentative definition; the type is incomplete at this point */

int main(void)
{
    for (int i = 0; i != 5; ++i)
        printf("%d\n", x[i]);
}

int x[5] = { 3, 6, 9, 12, 15 };
 
Ben Bacarisse

Old Wolf said:
Sensei said:
would be an incomplete type. This quite puzzles me, since I don't
understand how x[] could then be used in a code block.

It can't be.

In fact it can,

Yes, and I went on to say this (albeit without an example). It seems
rather unfair to clip that and claim I missed something.

It will become impossible to write explanations for people new to C if
every statement must stand up when taken out of the context of the
reply. If you wanted to explain this point to the OP, then you could
have replied to their "I don't understand how x[] could then be used
in a code block" (with a note to say you are taking "code block" to
mean file-scope declarations).
 
Old Wolf

Ben Bacarisse said:
Sensei <Sensei's e-mail is at Mac-dot-com> writes:
int x[];
would be an incomplete type. This quite puzzles me, since I don't
understand how x[] could then be used in a code block.
It can't be.
In fact it can,

Yes, and I went on to say this (albeit without an example).

You contradicted yourself, then. I don't think there are any
grounds to make an absolute statement "It can't be" (which
you made an entire paragraph on its own), when in fact it
can be, and it is not uncommon to do so in real code.

You mention the example of int x[]; but didn't mention at
all the much more common:
extern int x[];
It seems
rather unfair to clip that and claim I missed something.

I'm sure you didn't miss it personally, but I think it
is confusing to say the very least to say "Y is not possible."
and go on quoting more text from the original, and then later
down the page bury a sentence "except for..."
It will become impossible to write explanations for people new to C if
every statement must stand up when taken out of the context of the
reply. If you wanted to explain this point to the OP, then you could
have replied to their "I don't understand how x[] could then be used
in a code block" (with a note to say you are taking "code block" to
mean file-scope declarations).

I'm taking "code block" to mean stuff at block scope. The case
I am highlighting is the one where x is used at block scope,
as the OP said, but declared at file scope.
 
user923005

Ok, so I understand why VLAs are usually faster, and I understand also
that sizes of VLAs may be limited compared to malloc'ed arrays, since
the memory VLAs are allocated into is quite precious and limited.

This is typically true. Many systems have a fixed size for automatic
variables (and often you can change the amount by a linker switch or
program that modifies the executable after linking).
With allocated memory, you will sometimes get the amount of virtual
memory that is possible on the machine, and sometimes you will get the
amount allowed by a user limit of some kind.

At any rate, you have perceived correctly that the (typically) faster
allocation of automatic memory comes at a price of danger.
Because C allows recursion, it is hard to know how big automatic
memory can become except heuristically by measurement. The same goes
for allocated memory, of course, but often there is more allocated
memory available to a process and it is always possible to catch a
problem with calloc() or malloc() because NULL will be returned if the
memory is not available.

These sorts of details are more or less common practice but do not
necessarily predict how an arbitrary C implementation might behave.
 
Sensei

Thanks! It's always a good thing to know more about common behaviors
and good practices.

--

Sensei <Sensei's e-mail is at Mac-dot-com>

We know Linux is the best, it can do infinite loops in five seconds.
(Linus Torvalds)
 
