Avoiding recursive stack overflow in C on Unix/Linux?

Joshua Maurice · May 10, 2011

It is better design to decouple the algorithm from the result
processing. If nothing else, it gives you a reusable algorithm. Better
still if this can be done with little or no overhead. Overhead is often
seen as a blocker in C, but it isn't in C++ which is one reason C++ has
an algorithms library and C doesn't.

I'm not sure I'd put it that way. It's just that in C++, you can write
low-level reusable algorithms which do not carry overhead compared to
hand rolling, such as a binary search, as opposed to C where you
cannot (barring macros and god do I hope that no one uses macros for
that purpose). Templates are a wonderful thing (as hacky and unwieldy
as they are).

Dr Nick · May 11, 2011

Michael Press said:
Students are set exercises. The purpose of an exercise
is to get stronger. Sometimes binary search is offered.
The idea is to discover the loop invariant. Eventually
the student learns to write for the edge and corner cases
_first_. Thus a purpose of the exercises is realized.
I am surprised at how many people here admit to not
knowing how to write a binary search.

I didn't say I didn't know how to write a binary search. I said "it's
surprisingly difficult". I've written binary searches in machine code,
C and higher-level languages. We're not talking about students here:
we're talking about programmers for whatever reason wanting (needing?)
the quickest and easiest route to high-performance and maintainable
code.

Just because my great-great-great-grandfather had to cut peat for his
fire doesn't mean I can't use gas you know.

The idea is that in writing a binary search in place
one avoids having to write an interface to a library
routine. I embed bits of code in the search routine
itself, rather than interpreting the library routine
results and then doing what I need.

I cannot see myself ever calling bsearch(3). I will
call qsort(3). Note the existence of qsort_r.
Sometimes I'll write an insertion sort with binary
search. Sometimes I'll write a heapsort. This stuff
is not hard.

So you actually agree, you just draw the line between qsort and
bsearch. That's fine - everybody draws it somewhere. It's just nice to
see you actually agree.

Michael Press · May 11, 2011

Dr Nick said:
I didn't say I didn't know how to write a binary search. I said "it's
surprisingly difficult".

Relative to what? All it takes is knowing the loop
invariant. If a guy does not know it he has to work at
it until he does. Until then it is impossible for him
to write a binary search. When he knows it, then it
writes itself.

I've written binary searches in machine code,
C and higher-level languages. We're not talking about students here:
we're talking about programmers for whatever reason wanting (needing?)
the quickest and easiest route to high-performance and maintainable
code.

Just because my great-great-great-grandfather had to cut peat for his
fire doesn't mean I can't use gas you know.

That is not what I am saying, and I do not see
why you say I am. I am saying that a locally
written code can be adapted to and integrated
with the task at hand. You better know about
cutting peat if you want to brew whisky.

So you actually agree, you just draw the line between qsort and
bsearch. That's fine - everybody draws it somewhere. It's just nice to
see you actually agree.

I could dash off qsort. Its only advantage is speed.
When I want an O(n.log n) search I write heapsort.

By the way, qsort(3) has an irremediable security hole.

Michael Press · May 11, 2011

Ian Collins said:
Which is what the C++ search functions do.

_You_ must _write_ the callback routine,
or am I mistaken here? It's true in C.

Doesn't the search have to complete before it has a result?

The result is present _in_ the binary search code
that I just wrote out. I can put the search result
action _in_ the binary search code. I am beginning
to think you think I am trying to write a replacement
for the library routine, despite everything I have
said that implies otherwise.

Indeed, I was hung up on binary_search, the C++ library also includes
upper and lower bounds as well as equal range search functions.

It is better design to decouple the algorithm from the result
processing. If nothing else, it gives you a reusable algorithm. Better
still if this can be done with little or no overhead. Overhead is often
seen as a blocker in C, but it isn't in C++ which is one reason C++ has
an algorithms library and C doesn't.

Despite all its advantages I remain uninterested in C++.

Keith Thompson · May 11, 2011

Michael Press said:
By the way, qsort(3) has an irremediable security hole.

Can you expand on that? Is it something Unix-specific, or does it
affect any C implementation (note the cross-post).

(I presume you're aware that qsort() needn't use the Quicksort
algorithm.)

Dr Nick · May 11, 2011

pete said:
I like to write sort functions with a qsort interface.

http://www.mindspring.com/~pfilandr/C/q_sort/

There's a flaw in qsort that I make sure I don't put in my qsort-like
functions: it needs to take another void pointer and pass it to the
comparison function.

robertwessel2 · May 11, 2011

[...]

By the way, qsort(3) has an irremediable security hole.

Click to expand...

Can you expand on that? Is it something Unix-specific, or does it
affect any C implementation (note the cross-post).

(I presume you're aware that qsort() needn't use the Quicksort
algorithm.)

I assume he means that Quicksort can go quadratic, and carefully
(mis)designed* input can be an effective denial of service attack. As
you point out, of course, qsort() doesn't need to implement Quicksort
(although most still do, with some having moved to Introsort).

*Musser's original paper on Introsort is worth reading just for the
section on the (surprising easy) automatic generation of worst case
sequences for the common variants of Quicksort.

Nick Keighley · May 11, 2011

The only proper way to avoid stack-overflows is to prevent it from
happening in the first place. If at all possible (and frequently it
is) avoid recursion,

seems a bit strong. You can always avoid explicit recursion if you
want to but it often makes the code ugly.

You could track the recursion depth

if it's unavoidable, as it sometimes is, make

Nick Keighley · May 11, 2011

Dr Nick said:
Dr Nick said:

[...]

Where recursion becomes an issue is when you use it for every element of
a sequential data structure. For instance, if a parser's algorithmic
structure were something like:
read_first_token
process_token
recurse(rest of document)
it would probably run into a stack limit on most implementations when
processing any real input.

Click to expand...

Click to expand...

Most implementations where the compiler doesn't optimise tail recursion
away, anyway.

Click to expand...

Eh ... you do understand that 'compiler detects that programmer was a
crackpot and works around that automatically' implies that recursion
is probematic, do you?

some languages don't provide any looping form except recursion.

boltar2003 · May 11, 2011

some languages don't provide any looping form except recursion.

Usually languages that seem like a good idea when sitting at the top of
an ivory tower smoking some weed.

B2003

James Kuyper · May 11, 2011

_You_ must _write_ the callback routine,
or am I mistaken here? It's true in C.

You don't always need to write a callback; one may already be available.
For each of the binary search function templates (lower_bound,
upper_bound, equal_range, and binary_search) there is an overload which
requires no comparison function object. That overload uses '<' to
perform the comparison. This works for any element type which is:
a) a built-in type for which '<' is already provided
b) a C++ standard library class for which operator<() is already provided
c) a user-defined class for which you have already overload operator<()
for some other purpose, so there's no additional cost to also using it
in the binary search.

But you're right, in general: you do need to write a comparison
function; the most reasonable way to do it is write one that is an
operator<() overload, taking advantage of the feature described above.

That's not always feasible. I recently had a need for a sequence of
elements which was simultaneously sorted by two different fields; the
sort order was required to be the same for both fields. Insertions of
elements for which that requirement could not be met was disallowed. For
that purpose, I needed two different comparison functions, one for each
field. Only one of them could be operator<(); the other would have to be
a separate comparison function.

Rainer Weikusat · May 11, 2011

[ binary search ]

It is better design to decouple the algorithm from the result
processing. If nothing else, it gives you a reusable algorithm.

The algorithm is something abstract and as such, inherently reusable
(although the practical usefulness of an algorithm may be limited
outside the problem situation it was intended to help with): Provided
you need to write code which has to search for something in a sorted
array, you can use the reusable 'binary search' algorithm instead of
inventing your own searching algorithm ('binary search' is a really
bad example here because it is so absolutely trivial to implement). It
is also possible to create a reusable implementation of an algorithm,
which amounts to exchanging the problem of implementing the algorithm
for the problem of dealing with an external dependency whose exact
behaviour cannot be determined by inspecting the source code of a
particular program alone, whose exact behaviour may change independently
of the code of the the programs using it, whose implementation isn't
necessarily particularly well-suited for solving the specific problem
at hand and whose implementation may now (or at any time in future)
contain errors independent of the source code of any program using it.
In nutshell, the means such a
program-using-the-reusable-implementation-of-a-reusable algorithm can
possibly/ probably be written faster but it becomes harder to debug
and maintain in exchange for that. Except if the functionality gained
in this way is non-trivial, this is usually a bad tradeoff since
debugging and maintaining code is generally more difficult and
time-consuming than writing it.

I recently spent roughly a complete work day trying to 'decrypt'
Ulrich Drepperts custom macro orgies inside glibc in order to
determine what code is actually being executed by a program which
failed inside the getpsnam library routine before I was able to
identify the actual problem and what was causing it and fix that which
provides a nice example of the maintenance problem I wrote about in
the first paragraph. And I'd wager a bet that a majority of
'developers' won't even contemplate to try to determine why some
seemingly correct code fails inside a library call but just stack
workaround onto workaround in order to mitigate the practical
consequences whenever the reliability problems caused by the actual
error happens to reach them (and this is bound to be much less often
than the problem affecting a user of the code in some significant
way).

Better still if this can be done with little or no overhead.

Computers are really fast nowadays and avoiding the inherent overhead
of 'the most general "optimized" solution we were able to find'
usually turns 'performance problems' into non-issues: Only heavy users
of insanely huge and complex third-party libraries are still affected
by that (this is a somewhat hyperbolic statement with a true core).

Casper H.S. Dik · May 11, 2011

I like to write sort functions with a qsort interface.

http://www.mindspring.com/~pfilandr/C/q_sort/

There is no rule that qsort() is actually a quick sort.

Casper
--

Casper H.S. Dik · May 11, 2011

There's a flaw in qsort that I make sure I don't put in my qsort-like
functions: it needs to take another void pointer and pass it to the
comparison function.

I've never needed that argument; what use do you have for that argument?

Casper
--

James Kuyper · May 11, 2011

I've never needed that argument; what use do you have for that argument?

The simplest example I can come up with is:

int compare_element_n(void*first, void*second, void*args)
{
size_t n = *(size_t*)args;
int *left = first;
int *right = second;
return left[n] < right[n] ? -1 : right[n] < left[n] ? 1 : 0;
}

Example of use:
int array[1024][3]; /* Array of 3-d positions. */
int n;

/* Sort by x coordinate: */
n = 0;
new_qsort(array, 1024, sizeof array[0], compare_element_n, &n);

/* Sort by z coordinate: */
n = 2;
new_qsort(array, 1024, sizeof array[0], compare_element_n, &n);

James Kuyper · May 11, 2011

On 05/11/2011 10:03 AM, James Kuyper wrote:
....

int compare_element_n(void*first, void*second, void*args)
{
size_t n = *(size_t*)args;
int *left = first;
int *right = second;
return left[n] < right[n] ? -1 : right[n] < left[n] ? 1 : 0;
}

Example of use:
int array[1024][3]; /* Array of 3-d positions. */
int n;

That should, of course, have been "size_t n;". I changed types partway
through writing this message, and didn't correct all of them to match.

/* Sort by x coordinate: */
n = 0;
new_qsort(array, 1024, sizeof array[0], compare_element_n, &n);

Keith Thompson · May 11, 2011

Usually languages that seem like a good idea when sitting at the top of
an ivory tower smoking some weed.

And sometimes they seem like a good idea to people who get actual work
done using them.

boltar2003 · May 11, 2011

And sometimes they seem like a good idea to people who get actual work
done using them.

There are zero good reasons good reasons for having a language with no
looping constructs other than recursion and no state. A language like that
merely satisfies the quirky asthetics of certain mathematically minded
academics and it serves no more purpose than if it had looping constructs
included - its just assumed to be somehow more pure and therefor more worthy
as a language. Which is drivel.

Functional programming is generally thought to be a nice idea by people who
never have to write programs for live enviroments that work on real machines
in real time with real limits on memory and CPU. And yes I am aware of Erlang
and no I wouldn't piss on it if it was on fire whether ericsson still use it
in their switches or not.

B2003

Dr Nick · May 11, 2011

Casper H.S. Dik said:
I've never needed that argument; what use do you have for that argument?

To avoid having to write lots of comparison functions: suppose you are
sorting an array of structures with three fields and want to be able to
have any of them as primary, secondary and tertiary sort field and to
sort ascending or descending on each.

Either you write 8 comparison functions, and some nifty code to select
them (which would be quicker, true) or you need to pass some extra data
into your comparison function.

Michael Press · May 12, 2011

Dr Nick said:
There's a flaw in qsort that I make sure I don't put in my qsort-like
functions: it needs to take another void pointer and pass it to the
comparison function.

You are talking about qsort_r(3)?

C/Ubuntu ][ Problem in Buffer Overflow logic	1	Dec 12, 2008
checking for stack overflow in recursive function	4	Sep 22, 2003
Recursion crash in STL on linux	14	Apr 27, 2010
The machine stack and the C language	122	Jan 12, 2008
C language now truly universal	0	Jan 1, 2011
Installing a module written in C on a UNIX without root access	5	Jan 3, 2009
In the Matter of Herb Schildt: a Detailed Analysis of "C: TheComplete Nonsense"	109	Apr 3, 2010
Coding for a USB joystick in C++ on a Redhat linux machine	2	May 21, 2007

Avoiding recursive stack overflow in C on Unix/Linux?

Joshua Maurice

Dr Nick

Michael Press

Michael Press

Keith Thompson

Dr Nick

robertwessel2

Nick Keighley

Nick Keighley

boltar2003

James Kuyper

Rainer Weikusat

Casper H.S. Dik

Casper H.S. Dik

James Kuyper

James Kuyper

Keith Thompson

boltar2003

Dr Nick

Michael Press

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads