Avoiding recursive stack overflow in C on Unix/Linux?

M

Michael Press

Keith Thompson said:
Can you expand on that? Is it something Unix-specific, or does it
affect any C implementation (note the cross-post).

(I presume you're aware that qsort() needn't use the Quicksort
algorithm.)

Sorry. Meant quicksort. Quicksort can be induced to
follow its O(n^2) worst case.
 
M

Michael Press

Sherm Pendley said:
It may not be hard, but it *is* boring and well-covered
ground. I'd rather spend my time on wheels that haven't
yet been invented. :)

I am easily amused.
 
M

Michael Press

James Kuyper said:
You don't always need to write a callback; one may already be available.
For each of the binary search function templates (lower_bound,
upper_bound, equal_range, and binary_search) there is an overload which
requires no comparison function object. That overload uses '<' to
perform the comparison. This works for any element type which is:
a) a built-in type for which '<' is already provided
b) a C++ standard library class for which operator<() is already provided
c) a user-defined class for which you have already overload operator<()
for some other purpose, so there's no additional cost to also using it
in the binary search.

But you're right, in general: you do need to write a comparison
function; the most reasonable way to do it is write one that is an
operator<() overload, taking advantage of the feature described above.

That's not always feasible. I recently had a need for a sequence of
elements which was simultaneously sorted by two different fields; the
sort order was required to be the same for both fields. Insertions of
elements for which that requirement could not be met was disallowed. For
that purpose, I needed two different comparison functions, one for each
field. Only one of them could be operator<(); the other would have to be
a separate comparison function.

Right. Sometimes it is simpler
to write a dedicated routine.
 
D

Dr Nick

Michael Press said:
You are talking about qsort_r(3)?

Maybe:

~$ man qsort
[man page displays, it does not mention qsort_r]
~$ man 3 qsort_r
No manual entry for qsort_r in section 3
~$ man qsort_r
No manual entry for qsort_r

But it's still a flaw in the original design of the standard library -
any such function should take and pass on a pointer like this.
 
I

ImpalerCore

I don't use any of those 1's (ones),
when I write binary search.

I let the initial value of h
be equal to the index of the element
which is one past the end of the array.

Both good points. Thank you. I tend to forget how close 1 and l
appear.
 
M

Malcolm McLean

I've never needed that argument; what use do you have for that argument?
We've got a list of points in 3d space, representing space invaders,
and a player, who's also a point in 3d space. We need to sort the
baddies by their distance from the player as part of the AI.
There's no way of writing the comparision function without either
using a global or setting up a temporary array, both of which are
nuisances.
 
M

Michael Press

Dr Nick said:
Michael Press said:
You are talking about qsort_r(3)?

Maybe:

~$ man qsort
[man page displays, it does not mention qsort_r]
~$ man 3 qsort_r
No manual entry for qsort_r in section 3
~$ man qsort_r
No manual entry for qsort_r

But it's still a flaw in the original design of the standard library -
any such function should take and pass on a pointer like this.


void
qsort_r(void *base, size_t nel, size_t width, void *thunk,
int (*compar)(void *, const void *, const void *));

The qsort_r() function behaves identically to qsort(), except that it takes an addi-
tional argument, thunk, which is passed unchanged as the first argument to function
pointed to compar. This allows the comparison function to access additional data
without using global variables, and thus qsort_r() is suitable for use in functions
which must be reentrant.
 
D

Dr Nick

Michael Press said:
Dr Nick said:
Michael Press said:
Michael Press wrote:

I could dash off qsort. Its only advantage is speed.
When I want an O(n.log n) search I write heapsort.

By the way, qsort(3) has an irremediable security hole.

I like to write sort functions with a qsort interface.

http://www.mindspring.com/~pfilandr/C/q_sort/

There's a flaw in qsort that I make sure I don't put in my qsort-like
functions: it needs to take another void pointer and pass it to the
comparison function.

You are talking about qsort_r(3)?

Maybe:

~$ man qsort
[man page displays, it does not mention qsort_r]
~$ man 3 qsort_r
No manual entry for qsort_r in section 3
~$ man qsort_r
No manual entry for qsort_r

But it's still a flaw in the original design of the standard library -
any such function should take and pass on a pointer like this.


void
qsort_r(void *base, size_t nel, size_t width, void *thunk,
int (*compar)(void *, const void *, const void *));

The qsort_r() function behaves identically to qsort(), except that it takes an addi-
tional argument, thunk, which is passed unchanged as the first argument to function
pointed to compar. This allows the comparison function to access additional data
without using global variables, and thus qsort_r() is suitable for use in functions
which must be reentrant.

That's the badger. But re-entrancy is (as examples in this thread have
shown) far from the only reason for needing it. In fact, no-one even
suggested it as an example.
 
N

Nick Keighley

There are zero good reasons good reasons for having a language with no
looping constructs other than recursion and no state. A language like that
merely satisfies the quirky asthetics of certain mathematically minded
academics and it serves no more purpose than if it had looping constructs
included - its just assumed to be somehow more pure and therefor more worthy
as a language. Which is drivel.

Functional programming is generally thought to be a nice idea by people who
never have to write programs for live enviroments that work on real machines
in real time with real limits on memory and CPU. And yes I am aware of Erlang
and no I wouldn't piss on it if it was on fire whether ericsson still useit
in their switches or not.

ah, reasoned discussion
 
P

Phil Carmody

Måns Rullgård said:
It sometimes makes sense to allocate a large block and only use small
scattered portions of it to simplify indexing. Using only a small
_initial_ portion obviously never makes sense.

Sparse use of memory often implies serious cache abuse (at all levels,
it increases the likelyhood of swapping too, for example) and that should
be avoided.

To answer Keith, I think the true answer is that it's easier to just not
think about it, and trust that the underlying operating system works around
your unwillingness to think.

Phil
 
P

Phil Carmody

And sbrk() needs to allocate full pages to the process, so at a minimum,
a call to sbrk will fetch at least 4k bytes (on ia32/x86_64 - more on
PPC/Sparc/MIPS/ia64 where the minimum page size could be considerably larger).

But if you are sbrking more because the page size is large, then you're
not sbrking more than you need. None of the pages, all 1 of them, goes
unmapped. There's no over-commit in this scenario. The question was about
overcommit.

Phil
 
B

Barry Margolin

Phil Carmody said:
But if you are sbrking more because the page size is large, then you're
not sbrking more than you need. None of the pages, all 1 of them, goes
unmapped. There's no over-commit in this scenario. The question was about
overcommit.

Do malloc() implementations typically only grow the heap by just one
page at a time? My suspicion is that they grow by bigger chunks, to
avoid frequent system calls to get more memory. Kind of like the
programming strategy when you have a dynamically growing array, where
you double its size every time it fills up.

And I'll also bet that the initial heap size is over-committed, to avoid
a flurry of sbrk() calls when the program starts and does lots of memory
allocations.
 
P

Phil Carmody

Ian Collins said:
I wish I had the time to rewrite (including the required test cases)
standard algorithms each time I needed one.


What could be simpler than

std::binary_search( start, end, value );

I thought using iterators was in fashion a decade ago, but getting rather
stale now.

Phil
 
M

Malcolm McLean

I thought using iterators was in fashion a decade ago, but getting rather
stale now.
I stopped using C++ about then.
But the controlled sequence or iterator system does work, it's just
very hard to train programmers to use it correctly.
 
P

Phil Carmody

Michael Press said:
Sorry. Meant quicksort. Quicksort can be induced to
follow its O(n^2) worst case.

If your security if governed by the speed at which operations are
completed, you probably don't have any actual security anyway.
In particular what you is strange given that it's often the case
that in order to improve security you prefer things slower.

And getting to the root of things, since when has "a worst-case
O(n^2) algorithm may, in worst case, take O(n^2) operations" been
a security hole. Citations, rather than flim-flam, would be nice.

Phil
 
A

Alan Curry

And getting to the root of things, since when has "a worst-case
O(n^2) algorithm may, in worst case, take O(n^2) operations" been
a security hole. Citations, rather than flim-flam, would be nice.

You must be sure that hostile input can't turn the worst case into the
average. That's called an "algorithmic complexity attack". It's only denial
of service, not privilege escalation, but still... it's been a recognized
category of security issues for a few years. Look it up.
 
J

Jonathan de Boyne Pollard

[...] Where's the buzzer/warning light for approaching the stack
limits in C? The non-portable best that I can do is handling SIGSEGV,
which is a little late; the problem I wanted to avoid has already
happened by the time that signal is raised. For that matter, I'd
settle for the analog of a gas gauge - where can I find it?
For what it's worth, if you expand your horizons outside of just Unix
and Linux, things are rather different. For one thing, the problem that
you want to avoid has not "already happened" on systems such as Win32
and OS/2 when the exception that one has to handle is raised. This is
because on such systems the decommitted stack scheme relies upon all
functions performing stack probes. Stack probes happen in the function
perilogue, at the beginning before the meat of the function begins
execution, and it is during those probes that the exceptions will be
raised, as guard pages and non-stack areas are referenced by the probing.


http://homepage.ntlworld.com./jonathan.deboynepollard/FGA/function-perilogues.html#StackProbes

Specifically, what happens is that a (OS/2 or Win32, not C++) exception
is raised when a new guard page cannot be set. Microsoft KnowledgeBase
article Q315937 shows how to catch such exceptions, and the small amount
of cleanup that is necessary to handle the fact that the exception is
raised partway through the stack growth process. OS/2 has the same
mechanism, and the exception to catch is the fairly straightforwardly
named XCPT_UNABLE_TO_GROW_STACK . (Notice that on OS/2 this exception
is raised by code that is itself an exception handler, for the
XCPT_GUARD_PAGE_VIOL exception.)


http://cyberkinetica.homeunix.net/os2tk45/cp2/851_L3_XCPT_GUARD_PAGE_VIOL.html

Again, these exceptions are raised as a result of actions taken in the
function perilogue, not the function body proper. You'll note from the
function perilogues FGA page that there's a problem with some compilers
that don't properly account for the calling parameters area; but, that
poor quality compiler issue set aside, this means that (except for
alloca() and variable-length arrays) one can be assured that one won't
have to deal with this once a function's activation record has been
fully pushed onto the stack by the perilogue.

As for the "gas gauge": That's a matter of comparing the current stack
pointer with the stack bottom recorded in the Thread Information Block,
code for most of which (obtaining the amount still remaining instead of
the amount already used being a minor exercise for the reader) is on the
quite appropriately named StackOverflow, in an answer to a question on
this very subject.


http://stackoverflow.com/questions/...tack-space-with-visual-studio/1747249#1747249
 
J

James Kuyper

[...] Where's the buzzer/warning light for approaching the stack
limits in C? The non-portable best that I can do is handling SIGSEGV,
which is a little late; the problem I wanted to avoid has already
happened by the time that signal is raised. For that matter, I'd
settle for the analog of a gas gauge - where can I find it?
For what it's worth, if you expand your horizons outside of just Unix
and Linux, things are rather different.

My horizon is a lot larger than that, which is why a Windows-specific
solution is just as unhelpful to me as a Unix-specific one.
 
J

Jonathan de Boyne Pollard

[...] Where's the buzzer/warning light for approaching the stack
My horizon is a lot larger than that, which is why a Windows-specific
solution is just as unhelpful to me as a Unix-specific one.
You weren't given a Windows-specific solution. Go and read what I
wrote, but properly this time, instead of stopping after the first
sentence as you did above. Your horizons certainly in this thread so
far have most definitely been confined to Unix and Linux. You keep
going on about SIGSEGV, for starters. If your horizon were wider, you'd
not be doing so. Nor would you be bringing up as problems things that
simply aren't the case outwith the narrow confines of Unix and Linux.
And indeed, I quote you from a couple of messages back:
My own programs are allowed to assume [...] a correspondingly early
version of POSIX, [...]
There's that narrow horizon right there, in your very own words.
 
J

James Kuyper

On 05/15/2011 05:49 PM, Jonathan de Boyne Pollard wrote:
....
not be doing so. Nor would you be bringing up as problems things that
simply aren't the case outwith the narrow confines of Unix and Linux.

"outwith" => "outside"?

If it's a problem for any fully conforming versions of C, it's a problem
my code should, ideally, be able to deal with properly, regardless of
whether there's other fully conforming versions where it is not.
And indeed, I quote you from a couple of messages back:
My own programs are allowed to assume [...] a correspondingly early
version of POSIX, [...]
There's that narrow horizon right there, in your very own words.

What I'm allowed to assume for the project I'm working on, and what I'm
willing to assume in general, are two very different things.

Even on this particular project, my code actually relies very little
upon that particular assumption. That's possible primarily because one
of the third-party libraries I'm required to use hides a lot of the
implementation-specific details within it's own code. Because there's no
requirement that my code operate on non-POSIX platforms, I could not get
permission to spend any significant amount of time making sure that it
did; but if that requirement were changed, I doubt that my own code
would require many fixes.

A very small portion of my code relies upon the POSIX guarantee that
CHAR_BITS==8; most of it is carefully written to avoid making that
assumption. Those five programs rely upon that guarantee in order to
parse some data structures. If I could no longer rely upon POSIX, that
would be the most difficult part to fix.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
474,091
Messages
2,570,604
Members
47,223
Latest member
smithjens316

Latest Threads

Top