Avoiding recursive stack overflow in C on Unix/Linux?


Phil Carmody

You must be sure that hostile input can't turn the worst case into the
average.

I'm aware of sentences that make sense, and that one doesn't.
I would say "try again", but I wouldn't mean it.

Phil
 

Ilya Zakharevich

For what it's worth, if you expand your horizons outside of just Unix
and Linux, things are rather different. For one thing, the problem that
you want to avoid has not "already happened" on systems such as Win32
and OS/2 when the exception that one has to handle is raised. This is
because on such systems the decommitted stack scheme relies upon all
functions performing stack probes. Stack probes happen in the function
perilogue, at the beginning before the meat of the function begins
execution, and it is during those probes that the exceptions will be
raised, as guard pages and non-stack areas are referenced by the probing.

What you describe has nothing to do with the OS, but with particular
compilers only (at least on OS/2).

Hope this helps,
Ilya
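
For readers unfamiliar with the mechanism being described: a stack probe
simply touches the stack one page at a time, in the direction of growth,
so that the guard page is hit at a predictable point before the new frame
is actually used. Compilers emit this in the function prologue for large
frames (MSVC does it via its __chkstk helper). What follows is only a
rough C-level sketch of the idea, assuming 4 KiB pages and a
downward-growing stack; touching addresses below the current frame like
this is not strictly portable C, and probe_stack is a made-up name, not a
system or compiler API.

/* Rough sketch of what a compiler-inserted stack probe amounts to.
 * Assumptions: 4 KiB pages, downward-growing stack.  Real compilers do
 * this in the prologue, in generated code, not as a library call. */
#include <stddef.h>

#define PAGE_SIZE 4096u

static void probe_stack(size_t frame_size)
{
    volatile char anchor = 0;
    volatile char *p = &anchor;

    /* Touch one byte in every page the new frame will occupy, walking
     * downward, so each guard page is faulted in order and the OS can
     * extend the committed stack (or raise its overflow exception) at a
     * predictable point. */
    for (size_t off = PAGE_SIZE; off <= frame_size; off += PAGE_SIZE)
        p[-(ptrdiff_t)off] = 0;
}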
 

Rainer Weikusat

Jonathan de Boyne Pollard <[email protected]>
writes:

[...]
My own programs are allowed to assume [...] a correspondingly early
version of POSIX, [...]
There's that narrow horizon right there, in your very own words.

Indeed. The 'narrow horizon' of an IEEE standard, compared to the wide
world of the single company (mostly accurate) which usually doesn't
bother complying with standards it didn't invent itself. This, of
course, is not only total nonsense but also crossposted to four
different groups and off topic everywhere, because it is essentially
advocacy or - more accurately - a fan with an 'obsessive compulsive'
habit of trying to proselytize among the heathens.

F'up2 in appropriate group set. Don't reply to this posting.
 

Keith Thompson

Phil Carmody said:
I'm aware of sentences that make sense, and that one doesn't.
I would say "try again", but I wouldn't mean it.

It seemed clear enough to me, though perhaps a bit awkwardly phrased.

Quicksort is O(n log n) in the average case, O(n^2) in the worst
case. But that "average case" assumes random input. If Quicksort
is running in an environment where an attacker can feed it carefully
crafted input, its average case can become O(n^2), where the average
is computed over the set of inputs it actually receives. This can
be used as a denial-of-service attack, which can, in some cases,
constitute a security hole.
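
To make that concrete, here is a minimal, hypothetical sketch (not any
library's qsort()) of a textbook quicksort with a fixed last-element
pivot. Already-sorted input, which an attacker can trivially supply,
drives it to roughly n^2/2 comparisons and a recursion depth of n:

/* Hypothetical illustration, not any particular library's qsort():
 * a textbook quicksort with a fixed last-element pivot.  Already-sorted
 * input forces the worst case: ~n^2/2 comparisons and recursion depth n. */
#include <stdio.h>
#include <stdlib.h>

static unsigned long comparisons;

static void naive_quicksort(int *a, int lo, int hi)
{
    if (lo >= hi)
        return;
    int pivot = a[hi];                  /* fixed pivot: last element */
    int i = lo;
    for (int j = lo; j < hi; j++) {
        comparisons++;
        if (a[j] < pivot) {
            int t = a[i]; a[i] = a[j]; a[j] = t;
            i++;
        }
    }
    int t = a[i]; a[i] = a[hi]; a[hi] = t;
    naive_quicksort(a, lo, i - 1);      /* on sorted input: n-1 elements...   */
    naive_quicksort(a, i + 1, hi);      /* ...and an empty side: no real split */
}

int main(void)
{
    enum { N = 10000 };
    int *a = malloc(N * sizeof *a);
    if (a == NULL)
        return EXIT_FAILURE;
    for (int k = 0; k < N; k++)         /* already sorted: the "crafted" input */
        a[k] = k;
    naive_quicksort(a, 0, N - 1);
    printf("%d elements, %lu comparisons (~n^2/2)\n", N, comparisons);
    free(a);
    return 0;
}

With N = 10000 that is already about fifty million comparisons, and the
recursion depth equals N, which is also what the stack-overflow concern
at the top of this thread is about.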
 

Ben Bacarisse

Keith Thompson said:
It seemed clear enough to me, though perhaps a bit awkwardly phrased.

Quicksort is O(n log n) in the average case, O(n^2) in the worst
case. But that "average case" assumes random input. If Quicksort
is running in an environment where an attacker can feed it carefully
crafted input, its average case can become O(n^2), where the average
is computed over the set of inputs it actually receives. This can
be used as a denial-of-service attack, which can, in some cases,
constitute a security hole.

This, too, seems awkward to me and it perpetuates a misunderstanding
about Big O -- it's not a lower bound, only an upper bound. Carefully
crafted input can't make quicksort become O(n^2) because it already is.
It's also O(n^3).

The simplest fix is just to say that it is "no longer O(n log n)" when
the cost is taken over all these "bad" inputs. Picking a new, higher,
upper bound does not help express the meaning.

In day-to-day chat, O(f) has acquired a meaning much more like Theta(f),
but, just as you don't like "implicit casts", it seems to me nonsensical
to use it in this loose way. If you translate to a literal bound you
can see how odd it sounds: "my sort takes less than a minute to run on
average, but with malicious input it can be made to take less than a
year".
 

Barry Margolin

Keith Thompson said:
It seemed clear enough to me, though perhaps a bit awkwardly phrased.

Quicksort is O(n log n) in the average case, O(n^2) in the worst
case. But that "average case" assumes random input. If Quicksort
is running in an environment where an attacker can feed it carefully
crafted input, its average case can become O(n^2), where the average
is computed over the set of inputs it actually receives. This can
be used as a denial-of-service attack, which can, in some cases,
constitute a security hole.

If an attacker wants to make your system spend huge amounts of time
sorting, he can simply feed it arbitrarily large inputs. Regardless of
the algorithm, more input means more CPU time used.

So it's kind of silly to worry about an attacker when designing
something like this, and more useful to design the algorithm around the
expected normal input.
 

Keith Thompson

Barry Margolin said:
If an attacker wants to make your system spend huge amounts of time
sorting, he can simply feed it arbitrarily large inputs. Regardless of
the algorithm, more input means more CPU time used.

So it's kind of silly to worry about an attacker when designing
something like this, and more useful to design the algorithm around the
expected normal input.

(I don't remember who pointed this out, but yes, in an earlier
article I was sloppy in my use of big-O notation.)

It's probably easier to guard against unreasonably large inputs.
(A slow network connection will do the job nicely.) It's harder
to guard against attacks where a large input causes the program
to consume vastly more resources than another input of exactly the
same size.

If I were designing a program that needs to sort arrays whose
contents are not under my control, I'd avoid using an algorithm
whose run-time behavior can vary tremendously depending on the
nature of the input -- not just because of the possibility of
malicious attacks, but because even random data could conceivably
cause the same problem. Even if the problem is unlikely to occur,
it's good to have one less thing to worry about.

If this were difficult to do, I'd stop and consider whether it's
worth the effort, but it shouldn't be. Though a naive Quicksort
algorithm does suffer from this problem, my guess is that most,
perhaps all, library sorting algorithms avoid it, either by tweaking
Quicksort or by using a different algorithm.

A concrete question: Are there any existing C library implementations
of qsort() that are vulnerable to this kind of problem?
 

robertwessel2

(I don't remember who pointed this out, but yes, in an earlier
article I was sloppy in my use of big-O notation.)

It's probably easier to guard against unreasonably large inputs.
(A slow network connection will do the job nicely.)  It's harder
to guard against attacks where a large input causes the program
to consume vastly more resources than another input of exactly the
same size.

If I were designing a program that needs to sort arrays whose
contents are not under my control, I'd avoid using an algorithm
whose run-time behavior can vary tremendously depending on the
nature of the input -- not just because of the possibility of
malicious attacks, but because even random data could conceivably
cause the same problem.  Even if the problem is unlikely to occur,
it's good to have one less thing to worry about.

If this were difficult to do, I'd stop and consider whether it's
worth the effort, but it shouldn't be.  Though a naive Quicksort
algorithm does suffer from this problem, my guess is that most,
perhaps all, library sorting algorithms avoid it, either by tweaking
Quicksort or by using a different algorithm.

A concrete question: Are there any existing C library implementations
of qsort() that are vulnerable to this kind of problem?


Based solely on personal experience, I believe most C qsorts actually
implement some version of quicksort, and are vulnerable. A few use
introsort, and introsort is fairly common for C++'s std::sort. I've also
seen one qsort that implemented heapsort* (which would not have the
issue). I think the issue is mainly history - most of the C
implementations have been around longer than introsort, and fall into
the if-it-ain't-broke category.


*arguably a better general purpose choice than quicksort, but that’s a
different topic
 

Alan Curry

(I don't remember who pointed this out, but yes, in an earlier
article I was sloppy in my use of big-O notation.)

It's probably easier to guard against unreasonably large inputs.
(A slow network connection will do the job nicely.) It's harder

That's great. You run an Internet service that handles a zillion transactions
per day, then someone figures out how to make an algorithm degenerate to its
worst case, and he brings your server farm to permanent 100% CPU usage with a
series of malicious packets that use up only .0001% of your network
bandwidth. What's your solution? Host the service on a 9600-baud modem!
That'll stop the attack! And your other users too. But they were an
unreasonably large group.

Did anybody read the Algorithmic Complexity Attacks paper before rushing to
post their ignorant opinions?

It matters a lot whether denial of service can be accomplished by an attacker
without superior network bandwidth. That's why people who make widely used
server software have responded to issues of this nature with algorithm
modifications, not with "DoS defense is hard, let's go shopping".
 

Keith Thompson

That's great. You run an Internet service that handles a zillion transactions
per day, then someone figures out how to make an algorithm degenerate to its
worst case, and he brings your server farm to permanent 100% CPU usage with a
series of malicious packets that use up only .0001% of your network
bandwidth. What's your solution? Host the service on a 9600-baud modem!
That'll stop the attack! And your other users too. But they were an
unreasonably large group.

Did anybody read the Algorithmic Complexity Attacks paper before rushing to
post their ignorant opinions?

Did you really think my passing remark about a slow network connection
was seriously intended as a solution to the problem?

It wasn't. It was a joke.

Sheesh.

[...]
 

robertwessel2

[...]
If I were designing a program that needs to sort arrays whose
contents are not under my control, I'd avoid using an algorithm
whose run-time behavior can vary tremendously depending on the
nature of the input -- not just because of the possibility of
malicious attacks, but because even random data could conceivably
cause the same problem.  Even if the problem is unlikely to occur,
it's good to have one less thing to worry about.
If this were difficult to do, I'd stop and consider whether it's
worth the effort, but it shouldn't be.  Though a naive Quicksort
algorithm does suffer from this problem, my guess is that most,
perhaps all, library sorting algorithms avoid it, either by tweaking
Quicksort or by using a different algorithm.

On re-reading this, I realize that the sentence

    If this were difficult to do, I'd stop and consider whether it's
    worth the effort, but it shouldn't be.

was unclear.  I meant that it shouldn't be difficult to do, not
that it shouldn't be worth the effort.
A concrete question: Are there any existing C library implementations
of qsort() that are vulnerable to this kind of problem?

I took a look at the glibc implementation of qsort().  It does
use Quicksort, but with some tweaks.  It picks the pivot using a
median-of-three decision tree, it falls back to insertion sort for
small segments, it uses an explicit stack rather than recursion,
and it always handles the larger partition first.  I don't know
whether this completely avoids the worst-case N**2 behavior or not,
but it should make it less likely to happen accidentally.


It does *not*. Again, I recommend Musser's original Introsort paper
for a quite interesting description of how to generate worst-case
sequences for the common variants of Quicksort. It's actually
remarkably straightforward.
 

Ben Pfaff

Keith Thompson said:
I took a look at the glibc implementation of qsort(). It does
use Quicksort, but with some tweaks. It picks the pivot using a
median-of-three decision tree, it falls back to insertion sort for
small segments, it uses an explicit stack rather than recursion,
and it always handles the larger partition first. I don't know
whether this completely avoids the worst-case N**2 behavior or not,
but it should make it less likely to happen accidentally.

It also uses merge sort instead of quicksort when enough memory
is available.
 

Keith Thompson

Ben Pfaff said:
It also uses merge sort instead of quicksort when enough memory
is available.

What I wrote above appears to be only a subset of the truth.

I grabbed the glibc sources and took a quick look at qsort.c
(mostly just skimming the comments). Had I looked more closely,
I would have realized that qsort.c doesn't actually define qsort().
qsort() is defined in msort.c; it's a wrapper around qsort_r, which
*sometimes* calls the _quicksort function that's defined in qsort.c.
 

Malcolm McLean

It does *not*.  Again, I recommend Musser's original Introsort paper
for a quite interesting description of how to generate worst case
sequences for the common variants of Quicksort.  It's actually
remarkably straightforward.

You just shuffle the input. Then it becomes far more likely that the
computer will break than that you hit a worst case or even a bad case.
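
As a sketch of that suggestion (illustrative only; rand() seeded from the
clock is a weak entropy source, which is exactly what the follow-up posts
dispute): a Fisher-Yates shuffle applied to the data before it is handed
to qsort(). The names shuffled_sort and cmp_int are made up for the
example.

/* Sketch of the shuffle-then-sort idea.  Illustrative only: rand()
 * seeded from the clock is a weak entropy source. */
#include <stdlib.h>
#include <time.h>

static int cmp_int(const void *pa, const void *pb)
{
    int a = *(const int *)pa, b = *(const int *)pb;
    return (a > b) - (a < b);
}

static void shuffle(int *a, size_t n)
{
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)rand() % (i + 1);  /* slightly biased; fine for a sketch */
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }
}

void shuffled_sort(int *a, size_t n)
{
    if (n < 2)
        return;
    srand((unsigned)time(NULL));   /* "entropy" from the clock, as suggested later */
    shuffle(a, n);                 /* no fixed input now reliably hits the bad case */
    qsort(a, n, sizeof *a, cmp_int);
}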
 

robertwessel2

You just shuffle the input. Then it becomes far more likely that the
computer will break than that you hit a worst case or even a bad case.


Where are we going to get the entropy to do a good shuffle? And if we
just use a deterministic shuffle, it should be trivial for the
attacker to reverse that before sending the data. Obviously
intermediate approaches are possible (use 32 bits of entropy to select
one of ~2**32 permutations, for example, possibly by using the entropy
to seed a PRNG to drive the shuffle).

And why would any of this be meaningfully easier, or faster, than just
using an introsort or heapsort instead? Doing a shuffle+quicksort
ought to be a bit faster than heapsort on small elements, assuming a
reasonably fast shuffle. Heapsort will definitely be simpler to
implement (heck, heapsort is simpler than quicksort with median-of-
three and separate small partition handling anyway). Shuffle
+quicksort will certainly be slower than introsort in the typical
cases (where introsort is identical to a straight quicksort).
Assuming the preexistence of a usable entropy source, shuffle
+quicksort ought to be slightly simpler to implement than introsort.
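
For comparison, the core of introsort is small: run a plain quicksort,
but track the recursion depth and finish the current range with heapsort
once the depth exceeds roughly 2*log2(n), which caps the worst case at
O(n log n). A minimal sketch on int arrays follows; it is illustrative
only, not Musser's code and not any library's implementation (real
versions also fall back to insertion sort for tiny ranges, omitted here).

/* Minimal introsort sketch (illustrative; not Musser's code, not any
 * library's).  Quicksort with a simple last-element pivot, switching to
 * heapsort when the depth limit runs out. */
#include <stddef.h>

static void swap_int(int *a, int *b) { int t = *a; *a = *b; *b = t; }

/* Heapsort on a[0..n-1]: the O(n log n) fallback. */
static void sift_down(int *a, size_t root, size_t n)
{
    while (2 * root + 1 < n) {
        size_t child = 2 * root + 1;
        if (child + 1 < n && a[child] < a[child + 1])
            child++;
        if (a[root] >= a[child])
            return;
        swap_int(&a[root], &a[child]);
        root = child;
    }
}

static void heapsort_int(int *a, size_t n)
{
    for (size_t i = n / 2; i-- > 0; )       /* build the heap */
        sift_down(a, i, n);
    for (size_t end = n; end-- > 1; ) {     /* repeatedly extract the maximum */
        swap_int(&a[0], &a[end]);
        sift_down(a, 0, end);
    }
}

/* Lomuto partition around the last element; deliberately simple. */
static size_t partition(int *a, size_t n)
{
    size_t i = 0;
    for (size_t j = 0; j + 1 < n; j++)
        if (a[j] < a[n - 1])
            swap_int(&a[i++], &a[j]);
    swap_int(&a[i], &a[n - 1]);
    return i;
}

static void introsort_rec(int *a, size_t n, unsigned depth_limit)
{
    if (n < 2)
        return;
    if (depth_limit == 0) {     /* too many bad splits: give up on quicksort */
        heapsort_int(a, n);
        return;
    }
    size_t p = partition(a, n);
    introsort_rec(a, p, depth_limit - 1);
    introsort_rec(a + p + 1, n - p - 1, depth_limit - 1);
}

void introsort_int(int *a, size_t n)
{
    unsigned depth_limit = 0;
    for (size_t m = n; m > 1; m >>= 1)      /* about 2 * floor(log2(n)) */
        depth_limit += 2;
    introsort_rec(a, n, depth_limit);
}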
 

Malcolm McLean

Where are we going to get the entropy to do a good shuffle?

From the clock.
The attacker can still time his input so that the clock comes up with
the right number, but that's likely to be challenging. Certainly it
will be a lot harder than getting the source for Microsoft Visual C++
qsort(), and engineering an input that takes N^3 iterations to
complete.
 

robertwessel2

From the clock.
The attacker can still time his input so that the clock comes up with
the right number, but that's likely to be challenging. Certainly it
will be a lot harder than getting the source for Microsoft Visual C++
qsort(), and engineering an input that takes N^3 iterations to
complete.


The clock is a poor source of entropy, especially if you're using
actual time. Sure, timing is a bit of a challenge, but nothing that
hasn't been done before. FWIW, the source to MS's qsort is in
vc\crt\src\qsort.c (assuming you installed source when you installed
VS), and implements a slightly modified median-of-three quicksort.
And quicksort's worst case is N**2.
 

Tim Rentsch

Ben Bacarisse said:
James Kuyper said:
On May 5, 11:37 am, (e-mail address removed) wrote:
Hello

If a program recurses too deeply there's always a chance that it could
run out of stack space and die with a SIGSEGV. Is there any API that can
tell you how much stack space is left or some other method to prevent this
in C/C++? I realise I could catch the signal but that's a pretty damn ugly
way to do it.

B2003

The only proper way to avoid stack overflows is to prevent them from
happening in the first place.

And how do you do that? If the stack size is sufficiently limited, even

int main(int argc, char*argv[]) { return 0; }

could overflow the stack.

Hmm... only in the most contrived implementation.
[snip elaboration]

Not necessarily. 'The implementation' doesn't include the
hardware, or operating system, on which the translated program
will run. An implementation could produce an executable that runs
fine on one machine but has a stack overflow when run on another
machine with less memory. Yet it is the same implementation.
Cf. section 1 paragraph 2 (last clause).
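
As for the question quoted at the top of this sub-thread: there is no
portable C API for "stack space remaining", but on POSIX systems one can
at least query the configured limit with getrlimit(RLIMIT_STACK, ...)
and estimate main-thread usage from an address captured early in main().
A rough, non-portable sketch, assuming a downward-growing stack, that the
soft limit is the effective bound, and that only the main thread's stack
is of interest (stack_remaining_estimate is a made-up name, not a system
call):

/* Rough, non-portable sketch: estimate the main thread's remaining stack
 * on a POSIX system.  Assumes a downward-growing stack and that
 * RLIMIT_STACK's soft limit is the effective bound.  An estimate only. */
#include <stdio.h>
#include <sys/resource.h>

static const char *stack_base;      /* captured near the top of main() */

static size_t stack_remaining_estimate(void)
{
    struct rlimit rl;
    char here;

    if (getrlimit(RLIMIT_STACK, &rl) != 0 || rl.rlim_cur == RLIM_INFINITY)
        return (size_t)-1;                       /* unknown or unlimited */

    size_t used = (size_t)(stack_base - &here);  /* stack grows downward */
    return used < (size_t)rl.rlim_cur ? (size_t)rl.rlim_cur - used : 0;
}

int main(void)
{
    char base;
    stack_base = &base;

    printf("roughly %zu bytes of stack left\n", stack_remaining_estimate());
    return 0;
}

For threads other than the main one, the usual starting point on Linux is
the non-portable pthread_getattr_np() followed by pthread_attr_getstack().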
 

Phil Carmody

Keith Thompson said:
On re-reading this, I realize that the sentence

If this were difficult to do, I'd stop and consider whether it's
worth the effort, but it shouldn't be.

was unclear. I meant that it shouldn't be difficult to do, not
that it shouldn't be worth the effort.


I took a look at the glibc implementation of qsort(). It does
use Quicksort, but with some tweaks. It picks the pivot using a
median-of-three decision tree, it falls back to insertion sort for
small segments, it uses an explicit stack rather than recursion,
and it always handles the larger partition first.

Odd, in order to reduce actual recursion (and do tail recursion
instead), and to keep locality of reference tight, I'd handle the
smaller part first were I to use an actual quicksort. Not that I'd
do that, being more of a fan of heapsort in situations where I want
an acceptable worst-case big-Oh for reasons such as availability.

I don't know
whether this completely avoids the worst-case N**2 behavior or not,
but it should make it less likely to happen accidentally.

Nope, reduces the constant, that's all. If the median is selected
from 3 random (i.e. unpredictable to an attacker) elements, then
the worst case behaviour cannot be forced, and you would average
to the theoretical average. But then you have to secure information
about your (P)RNG state.

Phil
 

Jonathan de Boyne Pollard

I took a look at the glibc implementation of qsort(). It does use
Quicksort, but with some tweaks.
Odd, in order to reduce actual recursion (and do tail recursion
instead), and to keep locality of reference tight, I'd handle the
smaller part first were I to use an actual quicksort.

You've made me curious enough to revisit the code of my standard
library's sorting function after many years. It does median-of-three
and falls back to insertion sort (or even just a straight immediate swap
for fewer than three elements). But, like M. Carmody, I pick the
smaller portion to tail recurse on. This is what Robert Sedgewick
does. All of these "tweaks" mentioned are discussed by Sedgewick, in fact.
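
Putting those Sedgewick-style tweaks together, here is a compact sketch
(illustrative only, not the code of any particular library): median-of-three
pivot selection, insertion sort below a small cutoff, and actual recursion
only into the smaller partition while the larger one is handled by the
surrounding loop, which keeps the stack depth at O(log n) even when the
splits are bad.

/* Compact sketch of a "tweaked" quicksort in the style discussed above
 * (illustrative; not any library's code): median-of-three pivot,
 * insertion sort for small ranges, recursion only on the smaller
 * partition so stack depth stays O(log n). */
#include <stddef.h>

#define SMALL_CUTOFF 8   /* assumed threshold; real libraries tune this */

static void swap_int(int *a, int *b) { int t = *a; *a = *b; *b = t; }

static void insertion_sort(int *a, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        int v = a[i];
        size_t j = i;
        while (j > 0 && a[j - 1] > v) {
            a[j] = a[j - 1];
            j--;
        }
        a[j] = v;
    }
}

void tweaked_quicksort(int *a, size_t n)
{
    while (n > SMALL_CUTOFF) {
        /* Median-of-three: make a[mid] the median of first, middle, last,
         * then move it to the end to act as the pivot. */
        size_t mid = n / 2;
        if (a[mid] < a[0])     swap_int(&a[mid], &a[0]);
        if (a[n - 1] < a[0])   swap_int(&a[n - 1], &a[0]);
        if (a[n - 1] < a[mid]) swap_int(&a[n - 1], &a[mid]);
        swap_int(&a[mid], &a[n - 1]);
        int pivot = a[n - 1];

        /* Lomuto partition around the pivot. */
        size_t p = 0;
        for (size_t j = 0; j + 1 < n; j++)
            if (a[j] < pivot)
                swap_int(&a[p++], &a[j]);
        swap_int(&a[p], &a[n - 1]);

        /* Recurse into the smaller side, loop on the larger one, so the
         * explicit recursion depth is bounded by log2(n). */
        size_t left = p, right = n - p - 1;
        if (left < right) {
            tweaked_quicksort(a, left);
            a += p + 1;
            n = right;
        } else {
            tweaked_quicksort(a + p + 1, right);
            n = left;
        }
    }
    insertion_sort(a, n);
}

Note that this bounds the stack, not the running time: a median-of-three
killer sequence can still force quadratic behaviour, as discussed earlier
in the thread.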
 
