Avoiding recursive stack overflow in C on Unix/Linux?

boltar2003

Where recursion becomes an issue is when you use it for every element of
a sequential data structure. For instance, if a parser's algorithmic
structure were something like:

read_first_token
process_token
recurse(rest of document)

it would probably run into a stack limit on most implementations when
processing any real input.

Some language interpreters recurse internally in step with any recursion in the
program they're interpreting. Awk is a good example and will happily crash
with a SIGSEGV due to recursion, e.g.:

BEGIN {
    blah()
}

function blah()
{
    blah()
}


B2003
 
Kleuskes & Moos

I'd be intrigued to see a non-recursive implementation of the code of a
library for creating, reading into memory and printing JSON data
structures as an obvious example.

Not that difficult. I suspect most (stable) JSON parsers are non-
recursive and (like many other parsers and parser generators, such as
flex) employ an FSM instead.

For good reason, too, as William Ahern points out.
 
Kleuskes & Moos

And instead you run out of heap.

Aye. But there are portable ways of checking that: malloc will just
return NULL, which can be handled nicely and safely, instead of the
system throwing a SIGSEGV.
 
Casper H.S. Dik

Aye. But there are portable ways of checking that: malloc will just
return NULL, which can be handled nicely and safely, instead of the
system throwing a SIGSEGV.

As others have pointed out, it is possible to catch this and terminate
properly (the same when you run out of stack)

Casper
--
 
Kleuskes & Moos

As others have pointed out, it is possible to catch this and terminate
properly (the same when you run out of stack)

Ah... That's news. I was under the distinct impression that stack
overflows are impossible to catch. Please expound on your method to
reliably (and portably) catch stack overflows in recursive algorithms.

I'm quite curious.
 
Nobody

I'd be intrigued to see a non-recursive implementation of the code of a
library for creating, reading into memory and printing JSON data
structures as an obvious example.

"Real" parsers are seldom recursive, but instead use an explicit stack
(pushdown automaton). Recursive-descent parsers tend to be used by people
who are unfamiliar with existing practice and just used the first approach
they thought of.

The main drawbacks of recursive parsers are:

1. They need hacks to avoid infinite recursion when dealing with
left-recursive grammars.
2. They're "pull" orientated; once they start reading an entity, they
don't return until they've finished.

For anything dealing with networking, point #2 favours the automaton
approach, as you can push tokens into the parser as and when they arrive
over the network, and get on with something else (e.g. redraw) when no
more data is available. To obtain a similar interface, a recursive parser
needs to be turned inside-out, using continuations, threads (coroutines),
etc.

OTOH, actually processing the subsequent parse tree naturally lends itself
to a recursive algorithm. Of course, any such algorithm can be converted
to a non-recursive algorithm, but the result will typically resemble
machine code (i.e. with explicit "call" and "return" operations), which
largely defeats the point of having high-level languages.
 
Nobody

Where i work, code like that is likely to get you fired on the spot.

OTOH, in functional languages (ML, Haskell, etc) it's the correct
approach. Iteration is just tail recursion; it's the compiler's job to
optimise it.

Such languages don't have iteration primitives, and any standard
functions for performing iteration are written using recursion.
 
Nobody

Not necessarily... or rather, not exactly. Take a red-black tree
implementation as an example. You can recurse, you can keep a manual stack,
or you can use parent pointers. The beauty of the last option is that once
you've allocated an object there's no possibility for collateral insertion
failure (i.e. out-of-memory). If the sibling and parent pointers are members
of the application object, then you can guarantee tree insertion.

OTOH, if the parent pointers are part of the object, the object can only
be inserted into a single tree (or, at least, a fixed set of trees).
 
Dr Nick

Kleuskes & Moos said:
Not that difficult. I suspect most (stable) JSON parsers are non-
recursive and (like many other parsers and parser generators, such as
flex) employ an FSM instead.

For good reason, too, as William Ahern points out.

It's not so much the parser, it's the data store and the output.

Suppose you have the following sort of interface:

object *new_atom(char *contents);
object *new_list(void);
object *new_array(void);
void add_to_list(object *destination, object *source);
void add_to_array(object *destination, char *name, object *source);
void print_object(object *item);

How do you code print_object in a non-recursive way? Remember, we might
have done something like this:

object *s1 = new_atom("hello");
object *s2 = new_atom("world");
object *l1 = new_list();
object *l2 = new_list();
add_to_list(l1,s1);
add_to_list(l1,s2);
add_to_list(l2,s2);
add_to_list(l2,s1);
add_to_list(l2,l1);
object *a1 = new_array();
add_to_array(a1,"hi",s1);
add_to_array(a1,"simple",l1);
add_to_array(a1,"notsosimple",l2);
add_to_array(a1,"what",s2);
print_object(a1);

Somewhere you are going to need an explicit stack or queue. I'm far from
convinced (particularly given the way many modern machines act when memory
gets tight; even if you turn off optimistic allocation, many will grind to
a halt thrashing the swap months before malloc returns NULL) that catching
malloc failures in a manual stack is any better than catching a signal
from the stack filling up.
 
Dr Nick

Nobody said:
OTOH, actually processing the subsequent parse tree naturally lends itself
to a recursive algorithm. Of course, any such algorithm can be converted
to a non-recursive algorithm, but the result will typically resemble
machine code (i.e. with explicit "call" and "return" operations), which
largely defeats the point of having high-level languages.

Indeed. Parsing is a red herring here. It's how you represent and then
how you traverse a naturally recursive data structure, such as pretty
well any modern language has.
 
Kleuskes & Moos

OTOH, in functional languages (ML, Haskell, etc) it's the correct
approach. Iteration is just tail recursion; it's the compiler's job to
optimise it.

True, and you can add Prolog to the list. But we're discussing 'C'
IIRC, and my remark was made in that context.

Such languages don't have iteration primitives, and any standard
functions for performing iteration are written using recursion.

Correct.
 
Nobody

Aye. But there are portable ways of checking that: malloc will just
return NULL, which can be handled nicely and safely, instead of the
system throwing a SIGSEGV.

You forgot a case: malloc() succeeds but the process receives SIGKILL when
it actually tries to use the allocated memory (OOM-killer).

This can be prevented by disabling overcommit, but on some systems the
cost of doing so is unacceptably high, i.e. you may need to provide ten
times (or worse) as much swap space as will ever be used in practice.
 
Keith Thompson

Nobody said:
You forgot a case: malloc() succeeds but the process receives SIGKILL when
it actually tries to use the allocated memory (OOM-killer).

Actually the OOM-killer kills *some* process, not necessarily the one
that just tried to use allocated memory.
 
Nobody

Indeed. Parsing is a red herring here. It's how you represent and then
how you traverse a naturally recursive data structure, such as pretty
well any modern language has.

An interpreted programming language may still use a "hand-rolled" call
stack, as it can simplify the implementation of features such as
exceptions and continuations.

More mundane processing of recursive structures[1] is likely to just use
recursive function calls.

[1] And also divide-and-conquer algorithms on linear structures, e.g.
quicksort.
 
Kleuskes & Moos

You forgot a case: malloc() succeeds but the process receives SIGKILL when
it actually tries to use the allocated memory (OOM-killer).

This can be prevented by disabling overcommit, but on some systems the
cost of doing so is unacceptably high, i.e. you may need to provide ten
times (or worse) as much swap space as will ever be used in practice.

Spot on. I hadn't thought of that one.

<reads up on OOM-Killer>

However, if i understand http://linux-mm.org/OOM_Killer correctly
(interesting read, BTW), processes may be killed, but not with a
SIGSEGV, so the program does not crash, but is terminated by the
system. This may not be the desired outcome, but it's not a bug.

And as K. Thompson points out, it may actually kill a different
process, based on the result of badness() and the Law of Least
Surprise.

In any case, malloc should still return NULL if allocation fails, and
the manpage says it does. That processes may be terminated by the
system as a consequence of that call is another matter entirely.
 
Michael Press

William Ahern said:
Not necessarily... or rather, not exactly. Take a red-black tree
implementation as an example. You can recurse, you can keep a manual stack,
or you can use parent pointers. The beauty of the last option is that once
you've allocated an object there's no possibility for collateral insertion
failure (i.e. out-of-memory). If the sibling and parent pointers are members
of the application object, then you can guarantee tree insertion.

The downside is that maintaining parent pointers is more of a headache than
a simple stack, and also most of those parent pointers are wasting memory,
because you only required as many as were necessary to maintain a stack for
the particular operation.

I thought of that, but technically it is not a tree;
and I mean to stand on that technicality. It needs
O(depth of the tree) storage to traverse a tree.
I prefer all of my data structures to be free from resource acquisition
failures. This makes RAII more convenient in C.

My point is, you can roll the stack into whichever data structure you're
working on, and a linked-list stack can be far more informative than an
array stack (manual or implicit). Sometimes iterative algorithms like this
are easier to comprehend. I'm writing a non-recursive implementation of
left-leaning red-black trees, and insertion in particular seems much simpler
in its iterative form.

This means that you allocate all memory necessary
before beginning to build the data structure? And
never need to allocate more memory?
 
Michael Press

Kleuskes & Moos said:
Not that difficult. I suspect most (stable) JSON parsers are non-
recursive and (like many other parsers and parser generators, such as
flex) employ an FSM instead.

An FSM has a hard-coded bound on depth.
Can you make these parser generators
admit to reaching their limit by asking
them to parse input that goes too deep?
 
Barry Margolin

Some language interpreters recurse internally in step with any recursion in the
program they're interpreting. Awk is a good example and will happily crash
with a SIGSEGV due to recursion, e.g.:

BEGIN {
    blah()
}

function blah()
{
    blah()
}


B2003

That's not recursing in the parser, that's recursing in the interpreter.
It's not surprising that a script with infinite recursion might cause
infinite recursion in the interpreter.

And even if the interpreter didn't recurse when the script does, it
still has to keep the script's stack somewhere. So if it doesn't blow
out the interpreter's stack, it will run out of VM eventually.
 
