Stack usage

  • Thread starter Michael Sig Birkmose

Dan Pop

An earlier poster said:
And hope nobody passes in a value too big to be used to size a VLA of a
given type. How big is too big? Only your debugger (or maintenance
programmer) knows for sure. Until then, you have no way of checking that
your next run won't cause all sorts of crap to happen.

At least malloc() gives me a fighting chance at handling errors in memory
allocation. You can quote at me all the systems which will over-commit to
Hell and back so malloc() can never fail*, but all things equal, I'd
rather have /something/ to test instead of being forced to hope nothing
blows up.

VLAs are the new gets().

I fail to see how VLAs are special in this context, as opposed to
automatically allocated fixed sized arrays. It is the automatic
allocation that is causing the problem, not the fixed vs variable nature
of the array size.

Dan
 

August Derleth

I fail to see how VLAs are special in this context, as opposed to
automatically allocated fixed sized arrays. It is the automatic
allocation that is causing the problem, not the fixed vs variable nature
of the array size.

With fixed-length arrays, the compiler could in theory warn you about
making things too big for the current compilation mode. For example, a C
compiler for an x86 DOS machine could warn about an array that would push
the .COM file's size above the magic 64K limit. I don't know if such
warnings are even considered by the Standard, but it would be a very
definite QoI issue.

With VLAs, however, array size becomes a matter of what exactly happens at
runtime. As runtime is obviously affected by input and other external
events, the compiler is of no help. If someone manages to finesse the
array into being excessively large (by passing in a very long string,
knowing the VLA will be sized by a strlen() call, for example), he could
make your program do anything at all. Since VLAs obviously cannot return
error codes, your program absolutely cannot handle this dangerous event.

I'm willing to be proven wrong. I always am. I hope the Standards
Committee wasn't as stupid as the current state of VLAs implies they were.
But as long as code like this is valid:

#include <string.h>

char *danger(char *str1, char *str2)
{
    char badarr[strlen(str1) + strlen(str2)];
    return badarr;
}

I will be somewhat leery of the whole concept. After all, they kept
gets().
 

Keith Thompson

August Derleth said:
But as long as code like this is valid:

#include <string.h>

char *danger(char *str1, char *str2)
{
    char badarr[strlen(str1) + strlen(str2)];
    return badarr;
}

I will be somewhat leery of the whole concept. After all, they kept
gets().

Well, I suppose that's valid, but the caller will invoke undefined
behavior if it refers to the returned pointer value. But that's
not because it uses VLAs; the following exhibits the same problem:

char *danger2(void)
{
    char badarr[42];
    return badarr;
}

(It might be possible to modify the language so an attempt to return a
pointer to local storage is illegal, rather than causing later
undefined behavior, but I don't know how.)

I don't think that's the point you were trying to make, though.
 

Richard Bos

August Derleth said:
If someone manages to finesse the
array into being excessively large (by passing in a very long string,
knowing the VLA will be sized by a strlen() call, for example), he could
make your program do anything at all.

With correctly coded VLAs, this is as impossible as with correctly coded
non-VL arrays. With brokenly coded VLAs, it is as possible as with
broken normal arrays.

I'm willing to be proven wrong. I always am. I hope the Standards
Committee wasn't as stupid as the current state of VLAs implies they were.
But as long as code like this is valid:

#include <string.h>

char *danger(char *str1, char *str2)
{
    char badarr[strlen(str1) + strlen(str2)];
    return badarr;
}

I will be somewhat leery of the whole concept.

Yes, if you write code like that, I would stay far, far away from VLAs.
However, I'd also suggest you stay far away from allocated memory and
from normal arrays, since your (quite probably too short) array size is
just as much of a bug with malloc() and friends, and returning the
address of a local array is a bug regardless of whether it has variable
length or not.

Richard
 

Dan Pop

August Derleth said:
With fixed-length arrays, the compiler could in theory warn you about
making things too big for the current compilation mode.

How could the compiler know, even in theory, the amount of space available
for automatically allocated objects?

For example, a C
compiler for an x86 DOS machine could warn about an array that would push
the .COM file's size above the magic 64K limit. I don't know if such
warnings are even considered by the Standard, but it would be a very
definite QoI issue.

What if foo() in foo.c calls bar() in bar.c and each of them allocates an
automatic object of 40K? What if everything fits into 64K, but there are only
40K free when you try to execute the program?

With VLAs, however, array size becomes a matter of what exactly happens at
runtime.

What exactly happens at run time is also an issue with fixed size arrays,
as shown above.

I'm willing to be proven wrong. I always am. I hope the Standards
Committee wasn't as stupid as the current state of VLAs implies they were.

As I have already explained, it's the automatic nature of the allocation
that is at the root of the problem, not the variable size of the arrays.

But as long as code like this is valid:

#include <string.h>

char *danger(char *str1, char *str2)
{
    char badarr[strlen(str1) + strlen(str2)];
    return badarr;
}

I will be somewhat leery of the whole concept.

This is valid *only* if the caller discards the returned value, as if the
function were void. The value of the appropriately named badarr becomes
indeterminate as soon as the function returns. Now, imagine that
strlen(str1) and strlen(str2) are known in advance, e.g.
as macros. Why would

char badarr[STR1_LEN + STR2_LEN];

be any better? How can the compiler decide, at translation time, that
there will be enough "stack" space for badarr at run time?

But, if you think that automatic allocation is a language misfeature, keep
its usage limited to scalars (still no guarantee that your code will not
crash with a stack overflow ;-) If you want to eliminate automatic
allocation from the language, that automatically implies banning recursion,
and this is not going to make your proposal particularly popular ;-)
After all, they kept gets().

Which is a good thing (sometimes it is exactly what you need).
The bad thing is that they didn't deprecate fgets, which is *never*
what you need.

Dan
 

CBFalconer

Dan said:
.... snip ...

How could the compiler know, even in theory, the amount of space
available for automatically allocated objects?

By injecting a call, at function entry, to a system routine which
can compare the amount needed (a parameter) with the space
available. Many systems do this - it is usually called runtime
stack checking. At compile time it is only possible to detect
gross errors, such as attempting to allocate more space than a stack
segment can hold.
 

Dan Pop

CBFalconer said:
By injecting a call, at function entry, to a system routine which
can compare the amount needed (a parameter) with the space
available. Many systems do this - it is usually called runtime
stack checking.

Had you engaged your brain, you'd have realised that nobody was talking
about run time checks: they happen too late to be of any use. And they
work for VLAs just the same way they work for ordinary arrays, because
they are performed at run time.

At compile time it is only possible to detect
gross errors, such as attempting to assign more space than a stack
segment can hold.

Which are useless in most cases and quite often downright impossible,
as the size of the stack segment need not be the same at compile time
and at run time (on Unix systems, it is under user control, up to a
certain limit imposed by the kernel).

Dan
 

Dave Thompson

I think there's a difference of definition here. I believe Malcolm
is referring not to a map of possible program paths, but a static
call graph. <snip>
Since C does not allow the dynamic construction of function calls,
it's possible to construct such a graph by static analysis of the
source. Whether a given edge will actually be followed in any run of
the program is a separate question (and is equivalent to the Halting
Problem).

But C does allow dynamic selection of the callee by use of function
pointers. In general dataflow analysis to determine which function(s)
could be the value of a pointer at a given call site is the Halting
Problem; and even in practical cases it is often nontrivial. You can
limit it to targets whose (actual) signature is compatible with the
pointer used to call -- or for an "unspecified" K&R1-type pointer,
with the return and default-promoted arguments -- modulo a few special
cases, but that may not help very much.

For a C program which is non-recursive, the call graph will be
acyclic. If nodes are weighted by their automatic storage
requirement, then the maximum weighted depth of the graph can be
computed.

However, there's no mechanism in standard C for computing the
automatic storage requirement of a function; many interesting C
programs are recursive; and in many cases not all of the source is
available for analysis.

The first and third can be solved by looking at object code; this is
implementation-dependent, but that's "only" a constant factor harder,
which is completely ignored in complexity theory.

And even for programs which satisfy the latter two conditions, and an
implementation which supplies sufficient information for the first,
all you know is the upper bound on automatic storage, which is
unlikely to be particularly useful in most cases. So the prospects
for exact analysis of automatic storage requirements are not good.

Agree.

As a side note, the non-recursive constraint can be avoided. Any
recursive program can be algorithmically transformed <snip> in effect,
implementing a CPS-capable language on top of C. Hardly worth the
effort. And all that does is transform auto-storage requirement into
direct-allocated storage requirement anyway, which is a zero-sum or
worse tradeoff in most environments.

I'm not sure I'd go for "most" there; if done carefully, particularly
with customized allocation rather than raw malloc, I'd expect a slight
gain significantly often, though by no means always. But I would say
*not worth the hassle* in most cases.

- David.Thompson1 at worldnet.att.net
 

Michael Wojcik

But C does allow dynamic selection of the callee by use of function
pointers.

Yes, but I don't think this is a problem. It's dynamic selection
from a fixed set of targets - the union of the functions declared by
the program (and in scope for a given pointer assignment, though this
can be ignored for the sake of simplicity), and the functions defined
by the implementation.

If static analysis can determine:

1. the set of all possible targets of function pointers, ie the set
described above, and

2. whether a function pointer is dereferenced in a given function

then it can add a node to the graph for each possible target (step 1)
and an edge to each such node, from every node where a function
pointer is dereferenced (2). This may produce some redundant nodes
(since some functions may be called both normally and through
pointers), and will almost certainly produce nodes that are never
actually reached in any execution of the program. It will also
almost certainly produce many edges that are never actually taken
(and cannot be taken, because no path through the program would
assign a function pointer a value such that the edge *would* be
taken).

However, the point of this mooted static analysis is to determine the
maximum possible depth of the call graph - it's pessimistic. So
having extra nodes and edges doesn't violate the specification.

In short (I know, too late), it's not necessary for this analysis to
determine whether function pointer A will point to function X in any
actual execution of the program; it's only necessary to assume that
it might, and add the corresponding edge to the graph.

In general dataflow analysis to determine which function(s)
could be the value of a pointer at a given call site is the Halting
Problem; and even in practical cases it is often nontrivial.

Agreed, and if this were meant to be a useful analysis of the call
graph that would be a problem. But for determining maximum possible
depth in this pessimistic, imprecise way, it isn't necessarily one.
Of course, the presence of function pointers, combined with this
pessimistic assumption that a function pointer can point to any
function, is likely to produce a cycle in the graph, in which case
this procedure determines that the maximum call depth is infinite -
not a useful answer anyway.

But I'd agree that *useful* analysis of calls through function
pointers is equivalent to the Halting Problem in the general case.
 
