Is allocating large objects on the stack a good practice?

Michael Tsang

For example, if we allocate objects like this:

void foo(void) {
    static const size_t size = 10000000; /* ten million */
    int data[size]; /* data is a very large object */
    /* do something with data */
}

In C99, perhaps this:

#include <stdio.h>
int main(void) {
    size_t n; /* This program makes sense even if n is very large */
    scanf("%zu", &n); /* n must not be negative */
    int data[n]; /* VLA: we need not care about freeing the object */
    /* do something with data */
}

Or, in C++:

template<size_t n> void func_templ() {
    int data[n];
    /* do something with data */
    if(n) func_templ<n - 1>(); /* recursive template instantiation */
}

Will it cause any trouble? I know that writing code that does not leak is
VERY difficult in C. (In C++, objects allocate memory only in the
constructor and deallocate memory only in the destructor, so the problem is
not so serious.) Stack allocations (especially C99 VLAs) free
programmers from ever calling malloc() manually, so it seems to be a good
idea. Also, I like setting the stack size to unlimited to prevent stack
overflow from ever happening. Are all these good ideas?
 
Dr Malcolm McLean

Will it cause any trouble? I know that writing code that does not leak is
VERY difficult in C. (In C++, objects allocate memory only in the
constructor and deallocate memory only in the destructor, so the problem is
not so serious.) Stack allocations (especially C99 VLAs) free
programmers from ever calling malloc() manually, so it seems to be a good
idea. Also, I like setting the stack size to unlimited to prevent stack
overflow from ever happening. Are all these good ideas?
It's not VERY difficult to write code in C that doesn't leak. However,
you need a certain discipline: either free memory in the same
function that you allocated it in, or provide matching construct/kill
functions that return a pointer to a dynamically allocated structure
and free it.
In addition, you can use tools to detect memory leaks.
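One way to apply the first discipline to the question's VLA program is to move the buffer to the heap and keep malloc() and its matching free() in the same function. A minimal sketch, assuming a hypothetical helper `sum_of_buffer()` (not from the thread) that receives n from the caller; the "do something with data" step is just a sum:

```c
#include <stdint.h>  /* SIZE_MAX */
#include <stdlib.h>  /* malloc(), free() */

/* Hypothetical heap-based replacement for the VLA in the question.
   The malloc() and its matching free() sit in the same function.
   Returns 0 and stores the sum of the buffer in *sum on success;
   returns -1 if n is zero, if n * sizeof(int) would overflow size_t,
   or if malloc() fails, so a huge n is reported instead of crashing. */
int sum_of_buffer(size_t n, long long *sum)
{
    int *data;
    size_t i;

    if (n == 0 || n > SIZE_MAX / sizeof *data)
        return -1;                  /* nothing allocated yet, nothing to free */

    data = malloc(n * sizeof *data);
    if (data == NULL)
        return -1;                  /* out of memory is a recoverable error */

    for (i = 0; i < n; i++)         /* "do something with data" */
        data[i] = (int)(i % 100);

    *sum = 0;
    for (i = 0; i < n; i++)
        *sum += data[i];

    free(data);                     /* the only path out after malloc() succeeds */
    return 0;
}
```

A caller would read n with `scanf("%zu", &n)`, check that scanf() returned 1, and print an error when the function returns -1. An oversized VLA, by contrast, is undefined behaviour and typically just crashes.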

Allocating everything on the stack is no solution. The stack is
designed for relatively small amounts of scratch-space memory. If you
use it for huge data structures you may make code hard for the
compiler to optimise, and you may run out of stack space. Just as
significantly, whilst the pattern of last allocated / first freed is
useful, it cannot be applied to all the data structures you might
need. Consider a linked list that has to expand and contract as the
user adds and deletes items from his workspace.
The better solution is automatic garbage collection. However, this has
lots of issues of its own.
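The linked-list case can be sketched as follows (hypothetical `struct item` and function names, not from the post): nodes are allocated and freed in whatever order the user adds and deletes items, which a last-allocated/first-freed stack discipline cannot express.

```c
#include <stdlib.h>  /* malloc(), free(), size_t */

/* Hypothetical workspace item kept in a singly linked list. */
struct item {
    int id;
    struct item *next;
};

/* Prepend a new item; returns 0 on success, -1 if malloc() fails. */
int item_add(struct item **head, int id)
{
    struct item *node = malloc(sizeof *node);

    if (node == NULL)
        return -1;
    node->id = id;
    node->next = *head;
    *head = node;
    return 0;
}

/* Unlink and free the first item with the given id, wherever it sits
   in the list.  Returns 0 if it was found, -1 otherwise. */
int item_remove(struct item **head, int id)
{
    struct item **link;

    for (link = head; *link != NULL; link = &(*link)->next) {
        if ((*link)->id == id) {
            struct item *dead = *link;
            *link = dead->next;     /* bypass the node, then release it */
            free(dead);
            return 0;
        }
    }
    return -1;
}

/* Free every remaining item and leave *head as NULL. */
void item_free_all(struct item **head)
{
    while (*head != NULL) {
        struct item *next = (*head)->next;
        free(*head);
        *head = next;
    }
}

/* Count the items currently in the list. */
size_t item_count(const struct item *head)
{
    size_t n = 0;

    for (; head != NULL; head = head->next)
        n++;
    return n;
}
```

Deleting item 2 after adding 1, 2 and 3 frees a node from the middle of the list while its neighbours stay alive; no block-scoped stack allocation has that kind of lifetime.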
 
Ersek, Laszlo

For example, if we allocate objects like this:

void foo(void) {
    static const size_t size = 10000000; /* ten million */
    int data[size]; /* data is a very large object */
    /* do something with data */
}

This is a bad idea, to my taste.

I know that writing code that does not leak is
VERY difficult in C.

Two ideas to cope:

1) object construction:

#include <stdlib.h> /* malloc(), free() */

/*
Resource type whose initialization entails dynamic memory allocation
for and initialization of sub-resources.
*/
struct res
{
    struct sub_res1 *res1;
    struct sub_res2 *res2;
    struct sub_res3 *res3;
};


/*
Initialize a resource object that has already been allocated.
Return -1 for failure and 0 for success. If -1 is returned, the
contents of the object are indeterminate.
*/
int
res_init(struct res *res, int p1, int p2, int p3)
{
    res->res1 = res1_construct(p1);

    if (0 != res->res1) {
        res->res2 = res2_construct(p2);

        if (0 != res->res2) {
            res->res3 = res3_construct(p3);

            if (0 != res->res3) {
                /* object complete */

                return 0;
            }

            res2_destruct(res->res2);
        }

        res1_destruct(res->res1);
    }

    return -1;
}

/* Uninitialize a successfully initialized object. */
void
res_uninit(struct res *res)
{
    res3_destruct(res->res3);
    res2_destruct(res->res2);
    res1_destruct(res->res1);
}


/* Allocate and initialize an object. */
struct res *
res_construct(int p1, int p2, int p3)
{
    struct res *res;

    res = malloc(sizeof *res);
    if (0 != res) {
        if (0 == res_init(res, p1, p2, p3)) {
            return res;
        }

        free(res);
    }
    return 0;
}

/* Uninitialize and free an allocated and initialized object. */
void
res_destruct(struct res *res)
{
    res_uninit(res);
    free(res);
}


This goes on recursively, that is, the res1_construct(),
res2_construct() and res3_construct() functions called in res_init() all
have the same structure as res_construct() itself. Additionally,
res_construct() can participate in an even-higher level _init() routine.

The _init() functions allow the programmer to define an "outermost"
object with automatic or static storage duration, or to initialize a
structure member object of an already allocated structure. (What is
"outermost" depends on the programmer's situation, so it's useful to
declare all _init() and _uninit() functions separately from _construct()
and _destruct(), and with external linkage.)

The _init() functions can allocate all kinds of system resources, not
just memory (e.g. file descriptors for all kinds of files).

The whole thing mimics the constructor/destructor stuff of C++.
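To make idea 1 concrete, here is a minimal compilable sketch with one level of nesting. The names `struct doc`, `doc_init()`, `doc_construct()` and so on are hypothetical, and two heap buffers stand in for the sub-resources (their "constructors" are simply calloc() calls):

```c
#include <stdlib.h>  /* malloc(), calloc(), free(), size_t */

/* Hypothetical resource owning two sub-resources (plain heap buffers). */
struct doc {
    char *title;
    char *body;
};

/* Initialize an already-allocated doc; 0 on success, -1 on failure.
   On failure, everything acquired so far has been released. */
int doc_init(struct doc *d, size_t title_len, size_t body_len)
{
    d->title = calloc(title_len + 1, 1);
    if (d->title != NULL) {
        d->body = calloc(body_len + 1, 1);
        if (d->body != NULL) {
            return 0;               /* object complete */
        }
        free(d->title);             /* undo only what succeeded */
    }
    return -1;
}

/* Uninitialize a successfully initialized doc, in reverse order. */
void doc_uninit(struct doc *d)
{
    free(d->body);
    free(d->title);
}

/* Allocate and initialize a doc; NULL on failure. */
struct doc *doc_construct(size_t title_len, size_t body_len)
{
    struct doc *d = malloc(sizeof *d);

    if (d != NULL) {
        if (doc_init(d, title_len, body_len) == 0) {
            return d;
        }
        free(d);
    }
    return NULL;
}

/* Uninitialize and free a doc obtained from doc_construct(). */
void doc_destruct(struct doc *d)
{
    doc_uninit(d);
    free(d);
}
```

`doc_init()`/`doc_uninit()` serve an object with automatic storage duration (`struct doc d;` inside a function), while `doc_construct()`/`doc_destruct()` wrap the same pair in malloc()/free(). If the second calloc() fails, only the first buffer is released, so no failure path leaks.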


2) Temporary object construction for computation: this is almost
identical to the _init() functions, except that the innermost
success-return is replaced with storing the result and a success
indicator, and with releasing the innermost resource. The unwinding
happens unconditionally here:


/*
Compute some result based on p1, p2, p3. If the computation was
successful, 0 is returned and the result is stored in *result.
Otherwise, -1 is returned.
*/
int
compute(struct result_type *result, int p1, int p2, int p3)
{
    int ret;
    struct sub_res1 tmp1;

    ret = -1;
    if (-1 != res1_init(&tmp1, p1)) {
        struct sub_res2 tmp2;

        if (-1 != res2_init(&tmp2, p2)) {
            struct sub_res3 tmp3;

            if (-1 != res3_init(&tmp3, p3)) {
                /* All objects present, do computation. */

                *result = ...;
                ret = 0;

                res3_uninit(&tmp3);
            }

            res2_uninit(&tmp2);
        }

        res1_uninit(&tmp1);
    }

    return ret;
}


Ideas 1 and 2 can be recursively combined, too; for example, some
computation may be necessary to initialize an object, and the compute()
function above already relies on idea 1 (object initialization). No such
call-tree leaks (unless I botched up the code above, but you get the
idea).
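A compilable instance of idea 2, assuming a hypothetical sub-resource `struct buf` (a heap array of n ints) in place of the `sub_resN` types, and a dot product as the "computation". The temporaries have automatic storage; only their contents live on the heap:

```c
#include <stdint.h>  /* SIZE_MAX */
#include <stdlib.h>  /* malloc(), free(), size_t */

/* Hypothetical sub-resource: a heap array of n ints, all set to `fill`. */
struct buf {
    int *v;
    size_t n;
};

static int buf_init(struct buf *b, size_t n, int fill)
{
    size_t i;

    if (n == 0 || n > SIZE_MAX / sizeof *b->v)
        return -1;
    b->v = malloc(n * sizeof *b->v);
    if (b->v == NULL)
        return -1;
    b->n = n;
    for (i = 0; i < n; i++)
        b->v[i] = fill;
    return 0;
}

static void buf_uninit(struct buf *b)
{
    free(b->v);
}

/* Stores in *result the dot product of two n-element vectors filled with
   p1 and p2.  Returns 0 on success, -1 on failure (leaving *result
   untouched).  Every successful buf_init() is matched by exactly one
   buf_uninit(), on the success path and on every failure path alike. */
int compute(long *result, size_t n, int p1, int p2)
{
    int ret = -1;
    struct buf tmp1;

    if (-1 != buf_init(&tmp1, n, p1)) {
        struct buf tmp2;

        if (-1 != buf_init(&tmp2, n, p2)) {
            /* All objects present, do computation. */
            long acc = 0;
            size_t i;

            for (i = 0; i < n; i++)
                acc += (long)tmp1.v[i] * tmp2.v[i];
            *result = acc;
            ret = 0;

            buf_uninit(&tmp2);
        }

        buf_uninit(&tmp1);
    }

    return ret;
}
```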

Note that the _uninit() and _destruct() functions described above are
unable to signal errors. If such an _uninit() calls e.g. fclose(), that's
lossy, because fclose() might try to flush output and it could fail.
This is only relevant in the compute() case, not the object construction
case, because in the latter case, we will signal an error anyway back to
the caller if we're on the error path.

I can name two solutions to this:

a) Make the _uninit() functions return a success/error value too, and
when walking towards the exit in compute(), have any failed _uninit()
reset "ret" to -1 and destroy (release) *result.

b) Make fflush() part of the computation, or more generally, make sure
that once we set "ret = 0", nothing can go wrong within reason.
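Option a) might look like this for a sub-resource whose cleanup can fail. The sketch assumes a hypothetical `struct tmpf` wrapping a tmpfile() stream; its `_uninit()` reports fclose()'s result, and `compute_size()` (also hypothetical) turns a failed cleanup into a failed computation:

```c
#include <stdio.h>  /* FILE, tmpfile(), fputs(), fseek(), ftell(), fclose() */

/* Hypothetical sub-resource: a temporary file.  Unlike the void
   _uninit() functions above, tmpf_uninit() returns a status, because
   fclose() may fail while flushing buffered output. */
struct tmpf {
    FILE *fp;
};

static int tmpf_init(struct tmpf *t)
{
    t->fp = tmpfile();
    return t->fp != NULL ? 0 : -1;
}

static int tmpf_uninit(struct tmpf *t)
{
    return fclose(t->fp) == 0 ? 0 : -1;
}

/* Writes `count` copies of `line` to a temporary file and stores the
   resulting file size in *result.  Returns 0 on success, -1 on failure. */
int compute_size(long *result, const char *line, int count)
{
    int ret = -1;
    struct tmpf tmp;

    if (-1 != tmpf_init(&tmp)) {
        int i, ok = 1;
        long size = -1;

        for (i = 0; i < count && ok; i++)
            ok = fputs(line, tmp.fp) != EOF;

        if (ok && fseek(tmp.fp, 0L, SEEK_END) == 0)
            size = ftell(tmp.fp);   /* position just past the last byte written */

        if (size >= 0) {
            *result = size;
            ret = 0;
        }

        /* Option a): a failed _uninit() turns success into failure.
           *result owns no resource here, so there is nothing to release. */
        if (-1 == tmpf_uninit(&tmp))
            ret = -1;
    }

    return ret;
}
```

Had `*result` owned memory or a handle, the failing branch would also have to release it before returning -1.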

... I cheated a little, because even compute() is a sort of object
initialization -- that of "*result".

These "patterns" cannot be used indiscriminately. The idea to take away
is the staircase-like embedding of "if" statements. (Many people hate it
with a passion, because it introduces a lot of basic blocks and
increases "cyclomatic complexity". IMHO with a reasonable resolution
(and consequently, depth) of _init() / compute() functions, things stay
manageable. One benefit of this approach appears to be that you never
have to write O(n^2) pieces of _uninit() calls in error handling
sections.)

Or something like that.

lacos
 
ImpalerCore

For example, if we allocate objects like this:

void foo(void) {
        static const size_t size = 10000000; /* ten million */
        int data[size]; /* data is a very large object */
        /* do something with data */

}

In C99, perhaps this:

#include <stdio.h>
int main(void) {
        size_t n;       /* This program makes sense even if n is very large */
        scanf("%zu", &n); /* n must not be negative */
        int data[n];    /* VLA: we need not care about freeing the object */
        /* do something with data */

}

Or, in C++:

template<size_t n> void func_templ() {
        int data[n];
        /* do something with data */
        if(n) func_templ<n - 1>(); /* recursive template instantiation */

}

Will it cause any trouble? I know that writing code that does not leak is
VERY difficult in C. (In C++, objects allocate memory only in the
constructor and deallocate memory only in the destructor, so the problem is
not so serious.) Stack allocations (especially C99 VLAs) free
programmers from ever calling malloc() manually, so it seems to be a good
idea. Also, I like setting the stack size to unlimited to prevent stack
overflow from ever happening. Are all these good ideas?

While I agree that, for many users, writing code that does not leak
is difficult (and you'll likely get responses from the regulars here
that it's not difficult, as it's usually a matter of perspective and
experience), it is not necessarily so unmanageable that one regresses
to the stack for everything. For dynamic data structures, like auto-resizing
strings, arrays, linked lists, trees, etc., using the heap
can be the best solution even though there are associated risks.
Managing the memory leak risk requires a commitment to
learn the tools that help you detect leaks, to exercise your code
in ways that may lead to memory leaks, and to learn the C programming
styles and idioms needed to avoid such leaks. This is a long process
and one area that I'm still improving in.

Memory debuggers are a great tool in any programmer's toolbox. They
give an invaluable first indication of whether you are making simple memory
management errors, like forgetting a free, or not properly cleaning up
a dynamic data structure. It does require some effort to check out
various memory debuggers, but the time spent learning the tool will be
quickly regained in discovering or troubleshooting common allocation
errors. I've been using dmalloc myself and have been mostly pleased
with it so far.

Unfortunately, a memory debugger is only as good as the runs you
exercise it with: it can only report problems on code paths that actually
execute, and memory leaks/errors tend to pop up when your program hits
unexpected errors. These problems often originate from
buffer overflows, running out of resources, invalid or too large
input, programming errors, and more. It can be very tedious to
exercise a function or an interface in all the boundary cases, and
even then you will likely miss some. If you expose your code to other
people, they will use it in ways you did not expect, and all of a
sudden you have more errors that you didn't see the last N times.

And yet, even with all these issues, people still use the heap because
it is, at times, the best solution for the problem at hand. One of the
main benefits of dynamic allocation is that the space efficiency can
be much higher than using the stack, as you allocate only the space
that you need for the time needed. The other main benefit is that you
have more control of the lifetime of an object, rather than
restricting it to the scope of a function or block.
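A minimal sketch of both benefits, using a hypothetical `make_range()`: the caller picks the size at run time, and the array outlives the function that created it. Returning the address of a local array instead would hand the caller a dangling pointer.

```c
#include <stdint.h>  /* SIZE_MAX */
#include <stdlib.h>  /* malloc(), free(), size_t */

/* Hypothetical example: returns a heap array holding 0, 1, ..., n-1,
   or NULL if n is zero, too large, or malloc() fails.  The array
   stays valid after this function returns; the caller owns it and
   must free() it exactly once. */
int *make_range(size_t n)
{
    size_t i;
    int *a;

    if (n == 0 || n > SIZE_MAX / sizeof *a)
        return NULL;
    a = malloc(n * sizeof *a);      /* exactly as much space as requested */
    if (a == NULL)
        return NULL;
    for (i = 0; i < n; i++)
        a[i] = (int)i;
    return a;                       /* ownership passes to the caller */
}
```

The price is the one discussed throughout the thread: the responsibility for the matching free() moves to whoever receives the pointer.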

I liken C programming to using a power tool without the safety on.
You may get nicks and bruises from using it, and in the beginning some
very frustrating times (I can remember the torment inflicted on me the
first time I needed to write a linked list of linked list containers
in C for a class), but learning to endure through those times will
give you the ability to use C effectively and maybe even enjoy it.
Learning to use malloc/free in a safe way is a long and sometimes
difficult journey, but the benefit of having the tool and knowing how
to use it *is* worth it if you plan to spend a long enough time using
C.

Best regards,
John D.
 
Keith Thompson

Paavo Helde said:
Wow, infinite Turing machine! Drop me a note when you have found one! :)

The stack overflow handler prints out a purchase order for more memory
and waits for you to install it. (I didn't say it was quick.)

Address space limitations can be resolved by ... handwaving ... hey,
look over there!
 
Seebs

Address space limitations can be resolved by ... handwaving ... hey,
look over there!

Consider the host contamination checking implementation I suggested for
a cross-compilation system:

HOST CONTAMINATION CHECKS:
... Hey, look! A bear!
... No host contamination found.

Sadly, even a very careful analysis of the benefit vs. implementation time
turned out not to favor this approach. Maybe I'll resubmit it in a little
over two weeks.

-s
 
Nick Keighley

Additionally,
res_construct() can participate in an even-higher level _init() routine.

isn't _init() in a reserved namespace? Or close to one anyway.
 
Nick Keighley

[...] I know that writing code that does not leak is
VERY difficult in C. (In C++, objects allocate memory only in the
constructor and deallocate memory only in the destructor, so the problem is
not so serious.)

Whilst this is good practice, C++ does not compel you to do this.
 
chrisbazley

isn't _init() in a reserved namespace? Or close to one anyway.


No, I'm pretty sure that identifiers starting with an underscore are
only reserved if the following letter is another underscore or a
capital letter.

I recently modified a library of mine to stop using reserved
identifiers, having long ago adopted the idiom of prefixing all my
struct tags with '_' and using the same name without this prefix for
the corresponding types.

It was then that I discovered that only my structs with capitalised tags
were infringing on the reserved namespace. Probably whoever wrote the code I
mimicked was aware of this, but I wasn't, which shows the danger of
adopting conventions without understanding them!
 
Eric Sosman

No, I'm pretty sure that identifiers starting with an underscore are
only reserved if the following letter is another underscore or a
capital letter.

For C, "All identifiers that begin with an underscore are
always reserved for use as identifiers with file scope in both
the ordinary and tag name spaces." (7.1.3p1). So you could use
`_init' inside a function, but not as a name for anything outside
a function. In particular, since "a higher-level routine" would
necessarily be a file-scope entity, naming it _init would encroach
on reserved space.
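A small sketch of where the rule bites, with hypothetical names: the pattern's functions live at file scope, so they need a prefix such as `obj_`, whereas a bare `_init` is acceptable only at block scope (per the 7.1.3p1 wording quoted above).

```c
/* File scope: every identifier beginning with an underscore is reserved
   here (ordinary and tag name spaces), so the pattern uses a prefix. */
struct obj {
    int value;
};

int obj_init(struct obj *o, int value)
{
    o->value = value;
    return 0;
}

int twice(int x)
{
    /* Block scope: an underscore followed by a lowercase letter is not
       reserved here, so this local may legitimately be called _init. */
    int _init = x * 2;

    return _init;
}
```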

I don't know whether C++ rules are the same.
 
Nobody

Will it cause any trouble? I know that writing code that does not leak is
VERY difficult in C. (In C++, objects allocate memory only in the
constructor and deallocate memory only in the destructor, so the problem is
not so serious.) Stack allocations (especially C99 VLAs) free
programmers from ever calling malloc() manually, so it seems to be a good
idea. Also, I like setting the stack size to unlimited to prevent stack
overflow from ever happening. Are all these good ideas?

It's possible to write C code which doesn't leak. And in the cases where
you could use a stack-based array (i.e. where you only need the memory
for the duration of the function), it isn't very difficult to avoid leaks.

Avoiding leaks gets hard when you return pointers to dynamically
allocated memory and then have to keep track of whether or not it's still
being used. But you can't use a stack-based array in that situation anyhow.

The main reason to use the stack is performance: fixed-sized arrays are
allocated for free along with the rest of the stack frame, while alloca()
(or a C99 VLA) may end up as a single instruction.

If you're doing e.g. simple processing on small-ish strings, allocating a
buffer with malloc() may take more time than the actual algorithm. OTOH,
if the array is large or the processing is complex, the time taken by
malloc() and free() is likely to be negligible.
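One common compromise between the two costs (a technique not mentioned in the post) is to use a fixed-size array in the stack frame for small inputs and fall back to malloc() only for large ones. A minimal sketch; `SMALL_BUF`, its value and `is_palindrome()` are hypothetical, and the reversed scratch copy merely stands in for whatever temporary workspace a real algorithm needs:

```c
#include <stdlib.h>  /* malloc(), free(), size_t */
#include <string.h>  /* strlen(), strcmp() */

#define SMALL_BUF 256   /* hypothetical threshold; tune for the platform */

/* Returns 1 if s reads the same backwards, 0 if not, and -1 if a large
   scratch buffer could not be allocated.  Short strings use an array in
   the stack frame (no allocation cost); long ones fall back to malloc(),
   whose cost is then small next to the work done. */
int is_palindrome(const char *s)
{
    char small[SMALL_BUF];
    size_t len = strlen(s), i;
    char *scratch = small;
    int result;

    if (len + 1 > sizeof small) {
        scratch = malloc(len + 1);
        if (scratch == NULL)
            return -1;
    }

    for (i = 0; i < len; i++)           /* reversed copy into scratch */
        scratch[i] = s[len - 1 - i];
    scratch[len] = '\0';

    result = strcmp(s, scratch) == 0;

    if (scratch != small)
        free(scratch);                  /* only the heap path needs freeing */
    return result;
}
```

The stack array's size is fixed at compile time, so a hostile or unexpectedly large input can never overflow the stack; it just takes the slower heap path.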
 
Ersek, Laszlo

isn't _init() in a reserved namespace? Or close to one anyway.

None that I would know of. (I meant XXX_init(), for some resource type
called "struct XXX" -- I didn't mean "_init" verbatim, without any
non-empty prefix. Sorry for being vague.)

lacos
 
red floyd

     For C, "All identifiers that begin with an underscore are
always reserved for use as identifiers with file scope in both
the ordinary and tag name spaces." (7.1.3p1).  So you could use
`_init' inside a function, but not as a name for anything outside
a function.  In particular, since "a higher-level routine" would
necessarily be a file-scope entity, naming it _init would encroach
on reserved space.

     I don't know whether C++ rules are the same.

17.4.3.1.2/1
-- Each name that contains a double underscore (__) or begins with an
underscore followed by an uppercase letter is reserved to the
implementation for any use.

-- Each name that begins with an underscore is reserved to the
implementation for use as a name in the global namespace.
 
jl_post

For example, if we allocate objects like this:

void foo(void) {
        static const size_t size = 10000000; /* ten million */
        int data[size]; /* data is a very large object */
        /* do something with data */
}


I believe that you should always allocate on the stack -- if you
can get away with it. Unfortunately, declaring a huge object on the
stack doesn't always work, as there are pre-set stack-size limits.

Fortunately, to do what you want (that is, allocate a large array),
you can use a std::vector, like this:


#include <cstddef>
#include <vector>

void foo(void)
{
    const std::size_t size = 10000000; // ten million
    std::vector<int> data(size);       // the ints themselves live on the heap
    // do something with data
}


Technically, the data "object" will be on the stack, but the data's
"array" should be on the heap. And since we're technically using a C++
object with a proper destructor, there's no need for us to clean up
the ints that are allocated -- it will automagically be done for us at
the end of its scope. (So we get the best of both worlds.)

I hope this helps, Michael.

-- Jean-Luc
 
