bad alloc

Nick Keighley

On Aug 31, 12:28 am, Paul <[email protected]> wrote:

    [...]
How does this address the question...What is the point of a throw if
[it's] not being caught?
to invoke destructors. Go and look up RAII.

If you don't catch the exception, it's unspecified (or
implementation defined, I forget which) whether destructors are
called or not.

ah thanks. I didn't know that. I usually put a catch(...) in main()
which then reports an unknown exception and exits.
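
For illustration, a minimal sketch of the catch(...)-in-main() approach described above. Catching at the top level guarantees that the stack is unwound, so RAII destructors run before the program exits. The names run_guarded and ScopedFlag are made up for this example; ScopedFlag stands in for any resource-owning object such as a log file.

```cpp
#include <cstdlib>
#include <exception>
#include <functional>
#include <iostream>
#include <new>

// Hypothetical stand-in for a resource owner (log file, database handle).
// Its destructor records that cleanup really happened.
struct ScopedFlag {
    explicit ScopedFlag(bool& flag) : flag_(flag) {}
    ~ScopedFlag() { flag_ = true; }
    ScopedFlag(const ScopedFlag&) = delete;
    ScopedFlag& operator=(const ScopedFlag&) = delete;
private:
    bool& flag_;
};

// What main() would delegate to: runs the program body and turns any
// escaping exception into a reported failure. Because the exception is
// caught, unwinding (and therefore every destructor in body) is guaranteed.
int run_guarded(const std::function<void()>& body) {
    try {
        body();
        return EXIT_SUCCESS;
    } catch (const std::exception& e) {
        std::cerr << "fatal: " << e.what() << '\n';
    } catch (...) {
        std::cerr << "fatal: unknown exception\n";
    }
    return EXIT_FAILURE;
}
```

A real main() would then be little more than "return run_guarded(app_body);", where app_body is whatever function holds the application logic (again a hypothetical name).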
 
Adam Skutt

Which is almost useless for basically anything. Any user or job is
going to have a small number of very large processes, and a bunch of
small processes. Setting a per-process limit and a number-of-process
limit doesn't work.

It is indeed useless, but not for the reasons you mention. It's
useless for the reason I mentioned: you have to get your limits to
precisely match the commit limit, which is very hard to do. Harder
still given that not all virtual pages count against your commit
charge.
You could reserve some commit for the kernel, enough so that it won't
die on OOM. That's a solution, maybe.

No, that's neither viable nor practical: the kernel simply
has too many per-process resources. Trying to reserve them all for
many systems would exhaust the kernel's VAS or exhaust PAS. Since
many kernel pages are wired and aren't subject to page out, the kernel
must strike a careful balance between reserving pages for these corner
cases and not hogging all of physical memory.

Plus, there are a lot of corner cases when you reach this situation
that aren't always well tested. You may end up tripping a bug or two.
Besides, anything that involves being able to hand out anonymous pages
must fail. On top of all this, your system is likely grinding to a
halt under the paging load. So it's not a given that keeping the
system up is desirable anyway. Which is better: a brief dropout to
reboot or staying up and processing everything extremely slowly?
I won't accept right now the
claim that there's nothing you can do to stop a misbehaving user
process from killing vital protected system processes.

Modern Linux lets you prioritize and exempt processes from OOM. It
still doesn't really help anything from a systems perspective and can
definitely lead to panic situations, especially if your solution is,
"protect all the processes!".

But no, at the end of the day, there's nothing you can do to stop
misbehaving user processes from consuming enough resources (including
but not limited to memory) to cause vital system processes,
including the kernel itself, to die.
It depends on the goals. Here you might be right. To take a silly
extreme, threading is an example where you have to go to "non
portable" functions to get it to work, but it's definitely worth it,
and people do it all the time.

I have libraries to make threading easier than dealing with the
low-level APIs (in theory, at any rate). I don't really have such a thing
for OOM conditions. The situations are not the least bit equivalent
at all from any perspective.
I just disagree with your blanket
assertion that ignores the cost benefit analysis that must go into any
reasonable coding decision.

My assertions do nothing of the sort. The reality of the matter is
that cost-benefit analysis is extremely heavily tilted towards doing
nothing, whether you (and many others) like it or not. So much so
that many times it's even pointless to bother with the cost-benefit
analysis.

Adam
 
Paul

by using the term "application" you are implicitly talking about user
mode development.

Do kernels really call new? Do they throw C++ exceptions?

Dude.
An application uses kernel-mode modules, device drivers, etc.
Application is just a layer.
I thought you were talking about "applications"? In fact the
discussion was about applications on phones. Doesn't sound very kernelly
to me.
No, he started talking about reserving memory for the phone's critical
systems, then I said what is the point if another program is running
alongside it which may crash the system.

So what if that program is an "application" or any other type of
program? An application is capable of crashing a system just as a
device driver is, because applications use device drivers.

In a well-written OS a crash in a user-mode program will not harm the
OS. In this case we are talking about an exit due to an unhandled
exception. From an OS's point of view this is pretty well-defined
behaviour. Likely the user program called abort(), which is pretty well
a "kill me now!" request.

What exactly is... an exit due to an unhandled exception?

The reason for the program exiting is not an unhandled
exception. An unhandled exception allows the program to run on; the
program doesn't exit until it crashes somewhere down the line after
corrupting the whole system's memory and shit.

Do your kernel programs call abort()? What happens then?
Define abort()
 
Nick Keighley

Does not neglect the hardware!  

rather my point...

so not everyone neglects the hardware do they?

Plus, the wikipedia page says you can
write programs for NonStop OS that terminate.

lost me. Surely most programs (except maybe OSs) are expected to
terminate sometime.

 Please read your
references before citing them.


I did. I think you misunderstood my point.
 NonStop just provides a defined
mechanism, as part of the operating system, for providing the sorts of
redundancy I mentioned you need.
riight...


The recent issues with Amazon EC2 would suggest yes.


All support software that crashes.

not too often one would hope... A crash in a pacemaker sounds like a
bad idea.
 In life/safety-critical systems,
it's often preferable to crash immediately (fail-fast) whenever /any/
abnormal situation is encountered, because you can restart to a known
state much faster than you can fix the program, and you get a
deterministic response to all crashes.

I tend to take this approach on important stuff and the hardware is
often duplicated
 
Paul

On Sep 1, 8:49 am, Nick Keighley <[email protected]>
wrote:
On 30/08/2011 14:54, Paul wrote:
On 30/08/2011 14:26, Paul wrote:
<snip>
I think that pre-STL it was pretty much standard practise to check
for memory allocation failures for example:
    float* m1 = new float[16];
    if(!m1){
        // output an exit msg.
        exit(EXIT_FAILURE);
    }
which is exactly what not handling a bad_alloc exception would do...
<snip>
Rule of thumb for C++ exception aware code: liberally sprinkle throws
but have very few try/catches.
What is the point in having throws if you're not going to catch them?
Again your lack of experience is showing.
How does this address the question...What is the point of a throw if
[it's] not being caught?
to invoke destructors. Go and look up RAII.
And what good would invoking any destructor do?
Your program is about to crash you want to call destructors but not
catch the exception? That doesn't make any sense.

depends what the destructors do. They could close log files, release
databases etc. etc.
All the more reason to catch the exception and not allow your program
to carry on using(or attempting to use) resources which have been
destroyed.
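
As an aside on the null check quoted above: in standard C++ a plain new-expression reports failure by throwing std::bad_alloc and never returns a null pointer, so a test like if(!m1) only does anything with the nothrow form. A minimal sketch of both styles follows; the function names are invented for the example.

```cpp
#include <cstddef>
#include <memory>
#include <new>

// Old-style reporting: ask for a null pointer instead of an exception.
// The caller must check the result before using it.
std::unique_ptr<float[]> allocate_or_null(std::size_t count) {
    return std::unique_ptr<float[]>(new (std::nothrow) float[count]);
}

// Exception style: no check at the call site. On failure std::bad_alloc
// propagates to whichever caller (if any) chooses to catch it.
std::unique_ptr<float[]> allocate_or_throw(std::size_t count) {
    return std::unique_ptr<float[]>(new float[count]);
}
```

With the first form, forgetting the check means dereferencing a null pointer later; with the second, forgetting the catch means std::terminate(), which is the trade-off being argued about in this thread.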
 
Paul

That's precisely what you suggested when you claimed that all we had
to do is rollback down the stack!


Then you need to define these "more sophisticated schemes", and define
such a scheme that's actually worth it to implement most of the time.



Yes, there is.  You can set the maximum size of the VAS per-process,
and the maximum number of processes, which provides a hard upper
bound.  If you want finer grained controls, you have to patch the
Linux kernel; there are various patches out there that can accomplish
what you want.  Some other UNIX systems (e.g., Solaris) provide finer
grain controls out of the box.

There are other ways, such as containers and virtualization, to
accomplish similar feats.  They're rarely worth it.
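
On POSIX systems the per-process address-space cap mentioned above is normally set with setrlimit(RLIMIT_AS) (or "ulimit -v" from the shell). A rough sketch, assuming a Linux-like system; cap_address_space is a made-up helper name, and whether such a limit is actually useful is exactly what is being debated here.

```cpp
#include <sys/resource.h>

// Lowers (or sets) the soft limit on this process's virtual address space.
// Once the cap is reached, further mmap/brk requests fail, which a C++
// program typically observes as std::bad_alloc.
// Returns false if the limit could not be read or changed.
bool cap_address_space(rlim_t max_bytes) {
    rlimit limits{};
    if (getrlimit(RLIMIT_AS, &limits) != 0) {
        return false;
    }
    // An unprivileged process may not raise the soft limit above the hard one.
    if (limits.rlim_max != RLIM_INFINITY && max_bytes > limits.rlim_max) {
        max_bytes = limits.rlim_max;
    }
    limits.rlim_cur = max_bytes;   // the hard limit is left untouched
    return setrlimit(RLIMIT_AS, &limits) == 0;
}
```

Note that this caps address space, not committed memory, which is one reason matching it to the system's commit limit is hard.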


Compared to what alternative?  The traditional alternative is a kernel
panic, and that's not necessarily better (nor worse).  For better or
ill, we've become accustomed to the assumption we can treat memory as
an endless resource.  Most of the time, that assumption works out
pretty well.  When it falls apart, it's not shocking the resulting
consequences are pretty terrible.


Because per-user limits don't fix the problem, unless you limit every
user on the system in such a fashion to never exceed your commit
limit.  Even then, you're still not promised the "memory hog" is the
one that's going to be told no more memory. That's why we don't
bother: identifying the "memory hog" is much too hard for a
computer.

It may well be a broken situation, but there's also not a good
solution.




Yes, plenty of people here seem to be convinced handling OOM (by not
crashing) is a requirement for writing "robust" software; that means
being able to perform some sort of action (such as logging) and
continue onward after the OOM condition.  However, some people here
also seem to believe that performing I/O doesn't affect program state,
so their requirements are probably not worth fulfilling.

Still, this plays into my larger point: even if you have a situation
where you can respond to an OOM condition in some meaningful fashion
other than termination, you're still not assured of success.  Put more
plainly, handling OOM doesn't ipso facto ensure additional
robustness.  You need to be able to handle the OOM condition and have
a reasonable assurance that your response will actually succeed.  It
may be an overstatement on my part to say, "Succeed no matter what",
but making the software more "robust" certainly means tending closer
to that extreme than the opposite.

As a concrete example, writing all this OOM handling code does me no
good if, when my process finally hits an OOM condition, my whole
computer is going to die anyway. For many applications, this is
one of two reasons why they'll ever see an OOM condition.

Of course, the reality is that all of this rarely makes software more
robust, because actual robust systems generally are pretty tolerant of
things like program termination.  As I've said before, frequently it's
even preferable to terminate, even when it would be possible to
recover from the error.   This makes the value proposition of handling
OOM even less worthwhile.
Here you are talking about handling a low memory condition by possibly
terminating. This is only possible if you catch the exception, or use
some other error checking.
You are talking as if not catching the exception is the same as
terminating the program, which of course is completely incorrect. Not
catching an exception doesn't automatically terminate a program.
 
Paul

Why do you insist on making false assertions based on gaps in your
knowledge?

 From n3242:

15.3/9

If no matching handler is found, the function std::terminate() is
called; whether or not the stack is unwound before this call to
std::terminate() is implementation-defined (15.5.1).
Oh well I have to admit I didn't know the program was terminated. If
any uncaught exception will cause the program to terminate prematurely
then surely there is still a strong case for catching exceptions.
What happens if you have multiple processes. Are these other
processes, and all their resources also properly cleaned up?
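
To make the passage quoted from n3242 concrete: an uncaught exception ends up in std::terminate(), whose default handler calls std::abort(). A program can install its own handler with std::set_terminate, for example to print a last message first. A small sketch; the function names are made up, and the handler must not return.

```cpp
#include <cstdio>
#include <cstdlib>
#include <exception>

// Terminate handler: runs when an exception goes uncaught. Whether any
// destructors ran before this point is implementation-defined, so it
// should do as little as possible and then end the process.
[[noreturn]] void report_and_abort() {
    std::fputs("fatal: uncaught exception, terminating\n", stderr);
    std::abort();
}

// Installs the handler and returns the previously installed one, so the
// caller can restore it if needed.
std::terminate_handler install_terminate_reporter() {
    return std::set_terminate(report_and_abort);
}
```

This only affects the current process; other processes are unaffected unless they depend on it.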
 
none

I was talking about fragmentation here.

You may very well have been trying to talk about fragmentation.
However, you stated (and I quote directly): "1GB RAM with 500MG free,
500MB used." Virtual memory and RAM are two totally different things
and on a modern OS, you will never get an allocation failure because
you ran out of RAM.
 
Paul

Paul   said:
You may very well have been trying to talk about fragmentation.
However, you stated (and I quote directly): "1GB RAM with 500MG free,
500MB used."  Virtual memory and RAM are two totally different things
and on a modern OS, you will never get an allocation failure because
you ran out of RAM.
Hello I understand what you are saying about RAM and Virtual memory
being two different things. I was using the term RAM as described:

"Many computer systems have a memory hierarchy consisting of CPU
registers, on-die SRAM caches, external caches, DRAM, paging systems,
and virtual memory or swap space on a hard drive. This entire pool of
memory may be referred to as "RAM" by many developers, even though the
various subsystems can have very different access times, violating the
original concept behind the random access term in RAM."

ref:http://en.wikipedia.org/wiki/RAM

Without getting into all the details of any particular memory model, I
was generalising about memory fragmentation but I understand it is
slightly more complicated with Virtual Address Spaces.
 
none

If it's possible to stop processing a job safely and recover, sure.
Marking the job as an error might require allocating memory, so you
have to avoid that. This is more difficult than it sounds, and
requires code that is explicitly written to ensure that it is
possible. Screw that code up, and you'll almost certainly end up in a
deadlock situation or with a zombie worker thread, at which point you
will be restarting the process anyway!

I can easily come up with examples that demonstrate that for this
particular case, terminating on memory failure is the best situation.

Unfortunately, it seems to me that some people seem to advocate "you
must always terminate on any allocation failure in any situation and
this is and will forever be the only valid solution". Your post,
unfortunately, seems to try to expand my overly simplified example in
such a way as to demonstrate that this will never work.

It all depends on context.

Simple example: explode a JPEG image for editing or explode a zip file
in memory to look at the content. Multiple worker threads can all be
doing some processing. When one of the workers tries to allocate a large
amount of memory to process the job, the large allocation will fail.

This is a relatively safe situation to recover from because the huge
probability is that what will fail is the single call to do a large
allocation. In this case, what can't be done is allocate a large
block of memory. "Common" small blocks of memory will still be working
fine. So one could safely catch this bad_alloc in the caller and
forgo processing this particular item. It is very unlikely that there
would not be enough memory to log and mark a job as error.

There is a remote possibility that the large alloc fills the memory to
99.99999% and then the next small alloc fails. This is a rare case
but not necessarily a problem, as it can be treated differently. In the
area of the code where you know that you will be attempting to
allocate a variable, large amount of memory, you can consider bad_alloc
as being a recoverable error. If the bad_alloc happens elsewhere,
during "common" allocation, then this is an unexpected error from
unknown causes and it is probably best to give up and terminate.
Anyway, while you're in the process of attempting to cancel the
oversized job, many other jobs will fail (possibly all of them) since
they can no longer allocate memory either. In the meantime, all your I/

Maybe and maybe not. Simply because as described above, what failed was
allocating the *large amount* of memory.

The following code is perfectly safe:

    try
    {
        int *p1 = new int[aMuchTooLargeNumber];
    }
    catch(std::bad_alloc &)
    {
        // OK to ignore
        // I can even safely use memory to write logs
    }
    int *p2 = new int[10];
    // Don't catch bad_alloc here because in this case
    // this would be a really unexpected error

In fact since the heap allocator will be thread safe (using various
methods such as mutexes, locks and sub-heaps) there should be no point in
time when one thread attempting to allocate a "not too large" block
would fail because a different thread is currently attempting to
allocate "a much too large" block.

(Obviously, it is possible that a block size that would not be too
large in a single thread could not be allocated if all the
worker threads attempted to process such a "large-ish but not too
large" job at the same time.)

Yannick
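
A sketch of the per-job handling described above, under the assumption (disputed elsewhere in this thread) that a failed job leaves enough memory free to carry on. The guard is placed around the whole operation rather than around a single new; JobStatus and run_job_guarded are names invented for this example.

```cpp
#include <new>
#include <utility>

enum class JobStatus { Done, SkippedOutOfMemory };

// Runs one job and treats std::bad_alloc raised anywhere inside it as
// "this job could not be processed". Anything the job had allocated via
// RAII objects is released during unwinding before the status is returned.
// Other exception types are deliberately not caught here.
template <typename Work>
JobStatus run_job_guarded(Work&& work) {
    try {
        std::forward<Work>(work)();   // e.g. decode an image into a big buffer
        return JobStatus::Done;
    } catch (const std::bad_alloc&) {
        // Assumption: logging or marking the job as failed from here does
        // not itself need memory that is unavailable.
        return JobStatus::SkippedOutOfMemory;
    }
}
```

A worker loop would call run_job_guarded once per job and leave allocations outside any job unguarded, so that an unexpected bad_alloc there still terminates the program.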
 
Adam Skutt

I can easily come up with examples that demonstrate that for this
particular case, terminating on memory failure is the best situation.  

Unfortunately, it seems to me that some people seem to advocate "you
must always terminate on any allocation failure in any situation and
this is and will forever be the only valid solution".

No one has done that. Several have argued the opposite: that never
failing (or as close as possible) is the correct solution for OOM,
however.
 Your post,
unfortunately, seems to try to expand my overly simplified example in
such a way as to demonstrate that this will never work.

No, it strives to demonstrate that handling OOM through something
other than termination is quite difficult, much more difficult than
your simplified example makes it out to be. Moreover, even if you
can handle it, you still may not achieve your end goal, which was
isolation. Especially when you're running in a threaded environment.
Simple example: explode a JPEG image for editing or explode a zip file
in memory to look at the content. Multiple worker threads can all be
doing some processing. When one of the workers tries to allocate a large
amount of memory to process the job, the large allocation will fail.
This is a relatively safe situation to recover from because the huge
probability is that what will fail is the single call to do a large
allocation.

No, I have no reason to believe your probability estimates. Besides,
who said there's necessarily going to be a single huge call anyway?
It's a bad premise, it's an even worse conclusion. Neither of your
two examples ipso facto require large amounts of memory allocation.
It depends entirely on what you're doing.

What will fail depends heavily on the sequence of operations and the
behavior of your allocators. It's not even deterministic. Plus, you
need to give a general purpose definition for a "huge" block. Good
luck with that.
Maybe and maybe not.  Simply because as described above, what failed was
allocating the *large amount* of memory.  

You don't know that. You cannot assume that the allocation failed
merely because it was large when writing an exception handler that
will reliably respond to an OOM condition, and when trying to make a
program more robust under low memory situations. You may be in a
state where all allocations are going to fail.
The following code is perfectly safe:

    try
    {
       int *p1 = new int[aMuchTooLargeNumber];
    }
    catch(std::bad_alloc &)
    {
       // OK to ignore
       // I can even safely use memory to write logs
    }

To be explicitly clear, the assumption in the catch block is wrong.
In general, you don't know what 'aMuchTooLargeNumber' is at coding
time, compile time, nor run time. More importantly, it's generally
impossible to find out.

That's the problem with your reasoning. You can't decide to handle
std::bad_alloc or not based on the size of the allocation. You can
decide to do it based on the operation the program was attempting to
undertake, which is what you're actually trying to do.

Once you've done that, you still need to explain a handler that
achieves your goal of isolation. Then you need to demonstrate it's
less work than doing something like running the workers in separate
processes instead of threads. Otherwise, you don't have a winning
solution.
In fact since the heap allocator will be thread safe (using various
methods such as mutexes, locks and sub-heaps) there should be no point in
time when one thread attempting to allocate a "not too large" block
would fail because a different thread is currently attempting to
allocate "a much too large" block.

No, this is also incorrect, because it's predicated on the assumption
that you failed because the allocation was simply too large vs. the
other allocations your program will make.

Adam
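
For comparison, a rough sketch of the alternative mentioned above: running each job in its own process, so that an out-of-memory failure (or any crash) in one job cannot take the other workers down. This assumes a POSIX system with fork() and waitpid(); run_isolated and JobResult are hypothetical names, and a real server would also need timeouts and a way to pass results back to the parent.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <cstdlib>
#include <functional>
#include <new>

enum class JobResult { Ok, Failed, Crashed };

// Runs `job` in a child process and reports how it ended.
// Caveat: calling fork() from a multi-threaded parent is hazardous; this
// sketch assumes it is called from a single-threaded supervisor.
JobResult run_isolated(const std::function<bool()>& job) {
    const pid_t pid = fork();
    if (pid < 0) {
        return JobResult::Failed;            // could not create the worker
    }
    if (pid == 0) {
        // Child: exit code 0 = success, 1 = job reported failure,
        // 2 = an exception (including std::bad_alloc) escaped the job.
        int code = 2;
        try {
            code = job() ? 0 : 1;
        } catch (...) {
            code = 2;
        }
        _exit(code);                         // skip the parent's atexit/stdio state
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) {
        return JobResult::Failed;
    }
    if (WIFSIGNALED(status)) {
        return JobResult::Crashed;           // e.g. abort(), OOM killer, segfault
    }
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? JobResult::Ok
                                                           : JobResult::Failed;
}
```

The parent needs no OOM-specific recovery code at all here: whatever happens inside the job, the operating system reclaims the child's memory when it exits.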
 
Paul

No one has done that.  Several have argued the opposite: that never
failing (or as close as possible) is the correct solution for OOM,
however.
There seems to be two arguments based on what is the best thing to do
in *most* situations. You are correct to say that nobody is suggesting
there is one answer for all situations.

No, it strives to demonstrate that handling OOM through something
other than termination is quite difficult, much more difficult than
your simplified example makes it out to be. Moreover, even if you
can handle it, you still may not achieve your end goal, which was
isolation.  Especially when you're running in a threaded environment.

Handling the exception may not always be a difficult thing to
achieve.
Yes it will probably involve some extra work and effort to design and
implement, and the question is .. is it worth it?
This probably depends if the OOM situation is an unusual peak in
memory usage or if the program is running on a system that does not
meet the memory requirements.
If the latter is the case then it's probably best to just terminate, but
if it is an unusually high peak in memory usage then I think it's best
to try to recover, if possible.

No, I have no reason to believe your probability estimates.  Besides,
who said there's necessarily going to be a single huge call anyway?
It's a bad premise, it's an even worse conclusion.  Neither of your
two examples ipso facto require large amounts of memory allocation.
It depends entirely on what you're doing.
Why do you disagree with the probability? It seems obvious that a large
allocation has a higher probability to fail.

<snip>
 
Dombo

On 02-Sep-11 13:18, Nick Keighley wrote:
lost me. Surely most programs (except maybe OSs) are expected to
terminate sometime.

On embedded systems it is quite likely that the program is intended to
run for eternity (or at least until the power is lost).
 
Paul

On 02-Sep-11 13:18, Nick Keighley wrote:



On embedded systems it is quite likely that the program is intended to
run for eternity (or at least until the power is lost).

If we are all to be perfectly honest about C++ for embedded systems.
Does anyone actually use C++? I did some searching on the web to find
out what compilers exist for EC++ and there are a few. I don't know if
any of them support exceptions because EC++ has fewer features and one
of the features not featured in EC++ is exceptions.
 
red floyd

If we are all to be perfectly honest about C++ for embedded systems.
Does anyone actually use C++? I did some searching on the web to find
out what compilers exist for EC++ and there are a few. I don't know if
any of them support exceptions because EC++ has fewer features and one
of the features not featured in EC++ is exceptions.

Yes. We had a missile avionics system written in C++.
 
James Kanze

On Sep 1, 8:49 am, Nick Keighley <[email protected]>
wrote:
On Aug 31, 12:28 am, Paul <[email protected]> wrote:
[...]
How does this address the question...What is the point of a throw if
[it's] not being caught?
to invoke destructors. Go and look up RAII.
If you don't catch the exception, it's unspecified (or
implementation defined, I forget which) whether destructors are
called or not.
ah thanks. I didn't know that. I usually put a catch(...) in main()
which then reports an unknown exception and exits.

Doesn't everybody:). In production code, of course. The
reason why this is unspecified is so that if the exception is a
real error, the implementation can generate a core dump which
shows where it was thrown. (IMHO, of course, if it is a real
error, an exception is not the appropriate response; I prefer an
assertion failure. But a lot depends on the application.)
 
James Kanze

[...]
lost me. Surely most programs (except maybe OSs) are expected to
terminate sometime.

That depends on the application domain. Most of the programs
I've worked on are expected to run 24/7, for years on end. I
suspect that this is the most frequent case: I suspect (just
intuitively---no real statistics to back me up) that most
programs are embedded systems (the ignition or the brakes in
your car, for example), and these typically don't "terminate"
unless the power is removed. (Most of the programs I've worked
on have been large scale servers or network management systems.
I don't think such programs represent the majority of all
programs. But we do have contractual penalties if the program
terminates.)

[...]
not too often one would hope... A crash in a pacemaker sounds like a
bad idea.

Any critical system should be designed (globally) so that a
software crash doesn't cause serious damage. Which doesn't
mean that crashing is necessarily acceptable. (But it is
preferred to continuing operation when the system is corrupt, so
that the back-up solutions can react.)
 
