bad alloc


James Kanze

On Sep 3, 5:37 pm, Dombo <[email protected]> wrote:
If we are all to be perfectly honest about C++ for embedded systems:
does anyone actually use C++?

It depends on what you mean by "embedded". I've certainly used
C++ in embedded systems, and it is my impression that for large
scale embedded systems (telephone routing, air traffic control,
etc.), it is widespread, with Ada being the only other language
used.
I did some searching on the web to find
out what compilers exist for EC++ and there are a few.

Many embedded systems use standard C++, not EC++.
I don't know if
any of them support exceptions because EC++ has fewer features and one
of the features not featured in EC++ is exceptions.

In practice, the EC++ compilers are often C++ compilers as well,
with various compiler options to choose which C++ features are
active (and added to EC++).

"Embedded" covers such a large domain that it is difficult to
speak about it generally. I've worked on locomotive brake
systems that fit in 2K of PROM, without any disk or OS, and I've
worked on telephone routing systems with more than a million
lines of code, running under Unix. Both are "embedded systems",
but the constraints are radically different.
 

James Kanze

I agree this sounds nice in theory, but from what I understand it
doesn't work out in current practice. I have made a post or two about
it elsewhere in this thread. You have no guarantee that the
"misbehaving process", or the "process which is doing a complex LDAP
thing", is the one that is going to get the out-of-memory failure (a
NULL return from malloc).

No. The problem isn't trivial, and it's quite possible that the
process which gets the error return is someone else's editor,
and not the LDAP process. Still, a process which gets an error
can take some appropriate action (an editor might spill to disk,
for example). A process which doesn't get an error can't.
Another process, like an important system process, may also try
right then to allocate memory, and thus fail, which is bad for the
entire OS and all processes running.

The important thing is that the process knows that the
allocation has failed, and takes appropriate action.
It's the same problem. An abusive
process can cause an OOM killer on another process with overcommit on,
and that same abusive process can cause a malloc failure in another
process with overcommit off.

Or with overcommit on. DOS is a classical attack strategy, and
the only thing overcommit does in this regard is make it
impossible for the attacked services to recognize the problem
and react to it.
For most processes which are not the ones
being "abusive" but merely innocent bystanders, including system
processes, I suspect they will behave similarly. With overcommit on,
they will be killed with great prejudice. With overcommit off, when
they get the malloc error, most will respond just the same and die a
quick death.

That would be a very poorly written service that died just
because of a malloc failure.
To cut off a pre-emptive argument, I don't think it would work in
practice to say "Oh, critical components need to pre-allocate memory",
as that is unreasonable and will not actually happen. We need a
different solution.

All services, critical or not, should have some sort of
reasonable response to all possible error conditions.
Insufficient memory isn't any different in this respect to any
one of a number of other error conditions, like disk full.
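One way a process can give itself room for "some sort of reasonable response" is to set memory aside at startup and release it the first time an allocation fails. What follows is only a minimal sketch of that idea, not something proposed in the thread: the names `install_reserve`, `release_reserve` and `reserve_available` are made up for illustration, and it assumes the system actually reports allocation failure (i.e. no overcommit).

```cpp
#include <cstddef>
#include <memory>
#include <new>

// Hypothetical emergency reserve: memory set aside at startup and handed
// back to the allocator the first time operator new fails.
static std::unique_ptr<char[]> g_reserve;

bool reserve_available() { return g_reserve != nullptr; }

// Installed as the new-handler; operator new calls it when an allocation
// fails and then retries the allocation.
void release_reserve() {
    if (g_reserve) {
        g_reserve.reset();              // give the reserve back, then retry
    } else {
        std::set_new_handler(nullptr);  // nothing left: next failure throws std::bad_alloc
    }
}

void install_reserve(std::size_t bytes) {
    g_reserve = std::make_unique<char[]>(bytes);
    std::set_new_handler(release_reserve);
}
```

The reserve only buys a little headroom; the code that runs afterwards still has to notice (via `reserve_available()`) that it is in degraded mode and shed load, log, or save state.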
PS: The obvious solution to me appears to be per-user virtual memory
limits, but I'm not sure if that would actually solve anything in
practice. I need more information and more time to consider.

All of the Unices I know do implement per-process limits. Which
can be useful in specific cases as well.
 

James Kanze

Not particularly. Our goal here is to never fail. It's by
definition impossible.

Logging is not normally part of the "essential" job of the
application.
Writing my own streambuf is more work than I ever want to do, and
really more work than anyone should ever have to do, in order to
handle an exception.

I've yet to see a logging system in a large system which didn't
use custom streambuf. (For that matter, at least on large scale
servers, I'd say that outputting to custom streambuf's is far
more frequent than outputting to the standard streambuf's.)
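For readers who have not written one, a custom streambuf can be quite small. The sketch below is an assumption-laden illustration, not the logging system described above: `FixedLogBuf` is a hypothetical name, and the buffer simply refuses further output when full instead of growing. The put area itself never touches the heap, although the standard does not promise that the formatting done by `std::ostream` is allocation-free.

```cpp
#include <cstddef>
#include <ostream>
#include <streambuf>
#include <string_view>

// Hypothetical log buffer backed by a fixed array, so that appending to
// the buffer never has to grow a heap allocation.
template <std::size_t N>
class FixedLogBuf : public std::streambuf {
public:
    FixedLogBuf() { setp(storage_, storage_ + N); }

    // Text written so far.
    std::string_view view() const {
        return std::string_view(pbase(), static_cast<std::size_t>(pptr() - pbase()));
    }

    // Discard the contents, e.g. after they have been flushed elsewhere.
    void clear() { setp(storage_, storage_ + N); }

protected:
    // Buffer full: report failure rather than allocate more space.
    int_type overflow(int_type) override { return traits_type::eof(); }

private:
    char storage_[N];
};
```

Usage is the usual pattern: construct a `FixedLogBuf<4096>`, pass its address to a `std::ostream`, and write to that stream. Once the buffer is full the stream goes into a failed state until the buffer is cleared and the stream state reset.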
Which was my whole point initially. C++ doesn't
provide the facilities out of the box for doing anything other than
some sort of termination on an OOM condition.

I'm not sure what you mean by "the facilities out of the box".
No language that I know provides the logging facilities
necessary for a large scale server "out of the box"; C++
actually does far better than most in this regard. And no
language, to my knowledge, provides any sort of transaction
management (e.g. with rollback).
Yes, but that's not the only behavior that was advocated in an OOM
condition. Attempting to save program state was advocated, and that
may well require converting state.

Or may not, if it is designed correctly. (This is one of the
rare cases where just dumping the memory image of certain
struct's might be appropriate.)
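A rough sketch of what "dumping the memory image" could look like, assuming the state to be saved is a trivially copyable struct and the output file was opened (and ideally given a buffer) well before memory ran out. `SessionState` and its fields are invented for illustration only.

```cpp
#include <cstdio>
#include <type_traits>

// Hypothetical application state; the field names are illustrative only.
struct SessionState {
    unsigned long request_count;
    long last_sequence;
    char user[32];
};

static_assert(std::is_trivially_copyable<SessionState>::value,
              "raw image dumps only make sense for trivially copyable types");

// Writes the raw bytes of the state to an already open file.
// No conversion of the state is needed, so no new objects are built here.
bool dump_state(const SessionState& s, std::FILE* out) {
    return std::fwrite(&s, sizeof s, 1, out) == 1 && std::fflush(out) == 0;
}

// Reads the image back; only valid for the same build on the same platform.
bool load_state(SessionState& s, std::FILE* in) {
    return std::fread(&s, sizeof s, 1, in) == 1;
}
```

The image is tied to the compiler, platform and struct layout that produced it, which is why this is appropriate only in rare cases such as emergency saves.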
In fact, in context, it started as discussion about trying to
save after std::bad_alloc was thrown, not just merely log the
OOM condition.

I've not followed the discussion completely, but most of what
I've seen seems to concern returning an error for the request
triggering the problem, and continuing to handle other requests.
It doesn't mean you can, either.

Unless the reason you've run out of memory is a memory leak,
you can.
And it hardly justifies the effort involved in handling the
OOM condition.

The effort isn't that hard, and it's a basic requirement for
some applications. If you don't handle OOM correctly, your
application isn't correct.
I'm not sure how what you wrote has anything whatsoever to do with
what I said.

I'm not sure either. But it is something that must be kept in
mind when you are writing an application which has to handle
OOM.
 

James Kanze

On 09/ 2/11 04:37 PM, Adam Skutt wrote:
[...]
I agree. On a decent hosted environment, memory exhaustion is usually
down to either a system wide problem, or a programming error.

Or an overly complex client request.
If you require an application to be "robust" then you can use
an external entity to manage and if necessary, restart it.
This is common practice for system processes.

That's the correct response to a programming error or a system
wide problem. For a request which is part of a DOS attack, it
seems to be playing into the attacker's hand, and for a simply
unreasonable client request, it does seem unnecessary to abort
all other client connections. What's wrong with just responding
with an "insufficient resources" error?

[...]
Indeed, robustness goes beyond a single application.

That's why robust servers run on dedicated machines (where they
are the only "application").
 

Ian Collins

On 09/ 2/11 04:37 PM, Adam Skutt wrote:
[...]
I agree. On a decent hosted environment, memory exhaustion is usually
down to either a system wide problem, or a programming error.

Or an overly complex client request.

Not spotting those is a programming (or specification) error!
That's the correct response to a programming error or a system
wide problem. For a request which is part of a DOS attack, it
seems to be playing into the attacker's hand, and for a simply
unreasonable client request, it does seem unnecessary to abort
all other client connections. What's wrong with just responding
with an "insufficient resources" error?

No, it doesn't. Which is why it's not a good idea to allocate an
arbitrarily large chunk of memory in response to an external request.
Where overly complex requests are easy to discard, the more insidious
DOS attacks are those that make many, apparently reasonable, requests.
To protect against those, finer resource limits (such as a finite set of
input buffers) than the size of the heap should be used.
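As an illustration of such a finer limit, here is a minimal sketch of a pool with a fixed number of fixed-size input buffers. `InputBufferPool` and its interface are hypothetical; a real server would add locking and tie buffers to connections.

```cpp
#include <array>
#include <cstddef>
#include <optional>

// Hypothetical pool with a fixed number of fixed-size input buffers.
// When every buffer is in use, new requests are refused up front
// instead of being allowed to eat into the heap.
template <std::size_t BufferSize, std::size_t Count>
class InputBufferPool {
public:
    // Returns the index of a free buffer, or nothing when all are in use.
    std::optional<std::size_t> acquire() {
        for (std::size_t i = 0; i < Count; ++i) {
            if (!in_use_[i]) {
                in_use_[i] = true;
                return i;
            }
        }
        return std::nullopt;
    }

    void release(std::size_t i) { in_use_[i] = false; }

    char* data(std::size_t i) { return buffers_[i].data(); }
    static constexpr std::size_t buffer_size() { return BufferSize; }

private:
    std::array<std::array<char, BufferSize>, Count> buffers_{};
    std::array<bool, Count> in_use_{};
};
```

A request that cannot get a buffer is rejected immediately, so many "apparently reasonable" requests cannot add up to exhausting the heap.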
[...]
Indeed, robustness goes beyond a single application.

That's why robust servers run on dedicated machines (where they
are the only "application").

Indeed. That's why I run mine in their own container, with dedicated
system resources. Dedicated machines are so last century!
 

Ian Collins

It depends on what you mean by "embedded". I've certainly used
C++ in embedded systems, and it is my impression that for large
scale embedded systems (telephone routing, air traffic control,
etc.), it is widespread, with Ada being the only other language
used.

That's my impression as well. C is still the dominant language for
embedded systems. Even where C is still used, the teams I have worked
with are keen to step up to C++.
Many embedded systems use standard C++, not EC++.

That's right, EC++ was an abomination that should never have been let
loose on the world.
In practice, the EC++ compilers are often C++ compilers as well,
with various compiler options to choose which C++ features are
active (and added to EC++).

"Embedded" covers such a large domain that it is difficult to
speak about it generally. I've worked on locomotive brake
systems that fit in 2K of PROM, without any disk or OS, and I've
worked on telephone routing systems with more than a million
lines of code, running under Unix. Both are "embedded systems",
but the constraints are radically different.

Ditto, my past "Embedded" projects range from single chip 8 bit micros
through to large layer 3 switches, which run as a Linux cluster! On
bigger systems, just like traditional servers, more than one programming
language is used.
 

Dombo

Op 04-Sep-11 0:37, James Kanze schreef:
On Sep 1, 8:49 am, Nick Keighley<[email protected]>
wrote:
[...]
How does this address the question...What is the point of a throw if
[it's] not being caught?
to invoke destructors. Go and look up RAII.
If you don't catch the exception, it's unspecified (or
implementation defined, I forget which) whether destructors are
called or not.
ah thanks. I didn't know that. I usually put a catch(...) in main()
which then reports an unknown exception and exits.

Doesn't everybody:). In production code, of course.
Nope.

The reason why this is unspecified is so that if the exception is a
real error, the implementation can generate a core dump which
shows where it was thrown.

That is indeed a good reason not to use catch(...). One of the projects
I have been involved in uses a platform specific feature to trap
uncaught exceptions. If this happens, the point in the code where the
exception was thrown and a stack trace are stored along with any
information in the logging buffers. This makes it very easy to analyze
application crashes; with the crash dump file it takes about a second
to jump to the line where the exception was thrown and walk down the
stack. Before this mechanism was in place the catch(...) approach was
used. The big disadvantage of this approach was that developers had very
little to go on when things did go wrong. The message 'unexpected
exception occurred' is of little use if you don't have a clue what was
thrown from where in a couple of million lines.
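The platform-specific mechanism is not described here, so the following is only a portable approximation using `std::set_terminate`: log what can be learned about the exception, then abort so the platform can still write a core dump. The function names are made up. Whether the stack is unwound before the terminate handler runs is implementation-defined; where it is not unwound, the dump still points at the throw site.

```cpp
#include <cstdio>
#include <cstdlib>
#include <exception>

// Hypothetical last-chance handler: note what is known about the
// exception, then abort so that a core dump can be produced.
[[noreturn]] void report_and_abort() {
    if (std::exception_ptr ep = std::current_exception()) {
        try {
            std::rethrow_exception(ep);
        } catch (const std::exception& e) {
            std::fprintf(stderr, "uncaught exception: %s\n", e.what());
        } catch (...) {
            std::fputs("uncaught exception of unknown type\n", stderr);
        }
    } else {
        std::fputs("terminate called without an active exception\n", stderr);
    }
    std::abort();
}

// Call once, early in main(), instead of wrapping main() in catch(...).
void install_terminate_handler() { std::set_terminate(report_and_abort); }
```

Unlike a `catch(...)` in `main()`, this does not force the stack to be unwound before the process dies, which is what made the old approach so uninformative.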
 

Paul

It depends on what you mean by "embedded".  I've certainly used
C++ in embedded systems, and it is my impression that for large
scale embedded systems (telephone routing, air traffic control,
etc.), it is widespread, with Ada being the only other language
used.


Many embedded systems use standard C++, not EC++.


In practice, the EC++ compilers are often C++ compilers as well,
with various compiler options to choose which C++ features are
active (and added to EC++).

"Embedded" covers such a large domain that it is difficult to
speak about it generally.  I've worked on locomotive brake
systems that fit in 2K of PROM, without any disk or OS, and I've
worked on telephone routing systems with more than a million
lines of code, running under Unix.  Both are "embedded systems",
but the constraints are radically different.
Yes you are right. I tend to think of "embedded" as a small piece of
code in ROM that does some simple logic. But in reality the embedded
world contains much more complex systems such as mobile phones, games
consoles, DVD recorders, automotive electronics, etc.
 

James Kanze

On 09/ 4/11 11:20 AM, James Kanze wrote:
On 09/ 2/11 04:37 PM, Adam Skutt wrote:
[...]
I agree. On a decent hosted environment, memory exhaustion is usually
down to either a system wide problem, or a programming error.
Or an overly complex client request.
Not spotting those is a programming (or specification) error!

And the way you spot them is by catching bad_alloc:).

Seriously, the problem is very much like that of a compiler.
Nest parentheses too deep, and the compiler will run out of
memory. There are two solutions: specify an artificial nesting
limit, which you know you can handle (regardless of how many
connections are active, etc.), or react when you run out of
resources. There are valid arguments for both solutions, and
I've used both, in different applications.
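The "react when you run out of resources" option can be sketched roughly as below. This is a minimal illustration under stated assumptions, not code from any of the systems mentioned: `Reply`, `Status` and `handle_request` are hypothetical names, `process` stands in for the real per-request work, and the system is assumed to report allocation failure rather than overcommit.

```cpp
#include <functional>
#include <new>
#include <string>

enum class Status { Ok, InsufficientResources };

struct Reply {
    Status status;
    std::string body;
};

// `process` stands in for whatever does the real work for one request.
Reply handle_request(const std::string& request,
                     const std::function<std::string(const std::string&)>& process) {
    try {
        return Reply{Status::Ok, process(request)};
    } catch (const std::bad_alloc&) {
        // Stack unwinding has already released what this request had
        // allocated; the error reply is built from an empty string.
        return Reply{Status::InsufficientResources, std::string()};
    }
}
```

Only the offending request fails; the caller's loop goes on to serve the other connections. This relies on the per-request code being exception safe, so that unwinding really does give the memory back.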
 

Adam Skutt

It depends on what you mean by "embedded".  I've certainly used
C++ in embedded systems, and it is my impression that for large
scale embedded systems (telephone routing, air traffic control,
etc.), it is widespread, with Ada being the only other language
used.

Not hardly. C, Erlang, Assembly, MATLAB/Simulink, and many others
are quite common. A lot of embedded code is machine generated too
(usually through translation to another language before compilation),
which makes the compilation language rather irrelevant.

Adam
 

Adam Skutt

That would be a very poorly written service that died just
because of a malloc failure.

Then are you willing to make the claim most Linux, UNIX, OS X and
Windows services are poorly written? Because that's what you just
said, merely using different words.
All services, critical or not, should have some sort of
reasonable response to all possible error conditions.

Crashing is a perfectly reasonable response to an error condition,
really most error conditions. I'm not sure why anyone would think
otherwise even for a second.
All of the Unices I know do implement per-process limits.  Which
can be useful in specific cases as well.

You only know a few then. POSIX specifies some crude (and useless)
per-/user/ limits, but per-process limits are not standard by any
stretch of the imagination. As I already explained, they also do nothing
to solve the problem, nor can they really.

Adam
 

Adam Skutt

Logging is not normally part of the "essential" job of the
application.

If your goal is to improve robustness by logging the OOM condition,
then it's not just essential, it's mandatory. If you fail to do it,
you failed to improve robustness.
I've yet to see a logging system in a large system which didn't
use custom streambuf.  (For that matter, at least on large scale
servers, I'd say that outputting to custom streambuf's is far
more frequent than outputting to the standard streambuf's.)

I'm not sure what you mean by "the facilities out of the box".
No language that I know provides the logging facilities
necessary for a large scale server "out of the box"; C++
actually does far better than most in this regard.  And no
language, to my knowledge, provides any sort of transaction
management (e.g. with rollback).

Then you don't know many languages! Java, Python, and many others
provide robust, enterprise-grade logging facilities out of the box.
Haskell, Erlang, and many others provide all sorts of transactional
facilities, depending on exactly what you want, out of the box.
Or may not, if it is designed correctly.  (This is one of the
rare cases where just dumping the memory image of certain
struct's might be appropriate.)

You're just making the cost/value proposition worse, not better.
That's even more code I have to write!
I've not followed the discussion completely, but most of what
I've seen seems to concern returning an error for the request
triggering the problem, and continuing to handle other requests.

No, that's not even how the discussion started. One of the major
advocates for handling OOM suggested this was not only possible, but
trivial.
Unless the reason you've run out of memory is a memory leak,
you can.

Nope. All the other requests can die while you're trying to handle
the OOM condition. Or the other side could drop the request because
they got tired of waiting. The reality of the matter is that both
will happen.
The effort isn't that hard,

Yes, it is. It requires me to rewrite a considerable number of
language and sometimes even OS facilities, something you have admitted
yourself! The entire reason I'm using a programming language is
because it provides useful facilities for me. As a result, it isn't
the least bit unreasonable to conclude that rewriting language
facilities is hard. If I wanted to be writing language
facilities, then I'd just write my own damn programming language in
the first place!
and it's a basic requirement for
some applications. If you don't handle OOM correctly, your
application isn't correct.
Applications that require a response to OOM other than termination are
an insubstantial minority. Systems that cannot permit termination as
an OOM response are almost certainly broken.
I'm not sure either.  But it is something that must be kept in
mind when you are writing an application which has to handle
OOM.

And it makes justifying handling OOM only harder, not easier! You're
making my case for me!

Adam
 

Adam Skutt

That's the correct response to a programming error or a system
wide problem.  For a request which is part of a DOS attack, it
seems to be playing into the attacker's hand, and for a simply
unreasonable client request, it does seem unnecessary to abort
all other client connections.

First, it's not a given that it aborts other client connections, after
all. There could be a higher level mechanism that provides the
illusion of a persistent connection even after failover. Second, it
may be unnecessary, but just because it's unnecessary it doesn't
follow that:
1) The value of trying to handle OOM, instead of terminating, exceeds
its cost.
2) OOM can be handled in a robust fashion.
 What's wrong with just responding
with an "insufficient resources" error?

That's not enough to get what you want, which is isolation of the
failure.
That's why robust servers run on dedicated machines (where they
are the only "application").

No, that's not a requirement for robustness. Frequently, it makes the
situation worse. Trivial example: separating your Kerberos5 and LDAP
servers rarely does anything for robustness. Especially if you stored
your K5 database in LDAP.

Adam
 

Adam Skutt

On 09/ 2/11 04:37 PM, Adam Skutt wrote:
     [...]
I agree. On a decent hosted environment, memory exhaustion is usually
down to either a system wide problem, or a programming error.
Or an overly complex client request.
Not spotting those is a programming (or specification) error!

And the way you spot them is by catching bad_alloc:).

No, you set upfront bounds on allowable inputs. This is what other
engineering disciplines do, so I'm not sure why computer programmers
would do something different. Algorithms that permit bounded response
to unbounded input are pretty rare in the grand scheme of things.
Even when they exist, they may carry tradeoffs that make them
undesirable or unsuitable (e.g., internal vs. external sort).
Seriously, the problem is very much like that of a compiler.
Nest parentheses too deep, and the compiler will run out of
memory.  There are two solutions: specify an artificial nesting
limit, which you know you can handle (regardless of how many
connections are active, etc.), or react when you run out of
resources.  There are valid arguments for both solutions, and
I've used both, in different applications.

The problem is reacting in a way that isolates the failure. Most
language and operating systems make that very difficult.

Adam
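The "artificial nesting limit" option, which is also an instance of setting upfront bounds on allowable input, might look like the sketch below. `check_parens` and `ParenCheck` are hypothetical names and the limit is whatever the caller knows it can handle.

```cpp
#include <cstddef>
#include <string_view>

enum class ParenCheck { Ok, Unbalanced, TooDeep };

// Rejects input whose parentheses nest deeper than max_depth, before any
// memory proportional to the nesting depth has been committed.
ParenCheck check_parens(std::string_view text, std::size_t max_depth) {
    std::size_t depth = 0;
    for (char c : text) {
        if (c == '(') {
            if (depth == max_depth) return ParenCheck::TooDeep;
            ++depth;
        } else if (c == ')') {
            if (depth == 0) return ParenCheck::Unbalanced;
            --depth;
        }
    }
    return depth == 0 ? ParenCheck::Ok : ParenCheck::Unbalanced;
}
```

The check costs constant memory regardless of input size, so it cannot itself be the thing that runs out of resources.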
 

Adam Skutt

there would be one emergency, no computer, manual way to pilot

Plenty of aircraft are fly-by-wire and require the computer in order
to fly. Such aircraft are designed with enough redundancy so that if
any one computer fails, the aircraft can still fly. Very thorough
analysis is done to ensure that another failure cannot occur while the
failed unit is rebooting. Sufficient redundancy is added to ensure
this is the case.

Adam
 

Adam Skutt

You only know a few then.  POSIX specifies some crude (and useless)
per-/user/ limits, but per-process limits are not standard by any
stretch of the imagination. As I already explained, they also do nothing
to solve the problem, nor can they really.

I should clarify this part. All Unices provide some limits through
ulimit, which provides mostly per-process limits that are user
discretionary. Few provide the ability to enforce limits on all
invocations of binary X, which is what I assumed you were talking
about and what would be necessary. ulimit isn't helpful because the
limits have to be high enough for the largest program the user wishes
to run, and you cannot prevent the user from setting those limits for
running other binaries.
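For reference, the per-process limit that ulimit exposes can also be set from inside a program (or by a launcher before exec) with POSIX `setrlimit`. The sketch below assumes a POSIX system that defines `RLIMIT_AS`; `set_address_space_limit` is a made-up helper name. It illustrates the discretionary nature described above: the process chooses its own soft limit, anywhere up to the hard limit.

```cpp
#include <algorithm>
#include <sys/resource.h>

// Sets the soft address-space limit of the calling process, clamped to
// the current hard limit. POSIX only; returns false on failure.
bool set_address_space_limit(rlim_t bytes) {
    struct rlimit lim{};
    if (getrlimit(RLIMIT_AS, &lim) != 0) {
        return false;
    }
    lim.rlim_cur = std::min(bytes, lim.rlim_max);
    return setrlimit(RLIMIT_AS, &lim) == 0;
}
```

With a limit in place, allocations beyond it fail with an error in this process, rather than pushing the whole machine into swap or the OOM killer.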

Adam
 

Ian Collins

On 09/ 4/11 11:20 AM, James Kanze wrote:
On 09/ 2/11 04:37 PM, Adam Skutt wrote:
[...]
I agree. On a decent hosted environment, memory exhaustion is usually
down to either a system wide problem, or a programming error.
Or an overly complex client request.
Not spotting those is a programming (or specification) error!

And the way you spot them is by catching bad_alloc:).

Seriously, the problem is very much like that of a compiler.
Nest parentheses too deep, and the compiler will run out of
memory. There are two solutions: specify an artificial nesting
limit, which you know you can handle (regardless of how many
connections are active, etc.), or react when you run out of
resources. There are valid arguments for both solutions, and
I've used both, in different applications.

I have also seen both in compilers. For example last time I played, g++
didn't have a recursion limit for templates (as used in
meta-programming) while Sun CC does.
 

Miles Bader

Ian Collins said:
I have also seen both in compilers. For example last time I played, g++
didn't have a recursion limit for templates (as used in
meta-programming) while Sun CC does.

[It does now tho...]

-miles
 

Goran

On 09/ 4/11 11:20 AM, James Kanze wrote:
On 09/ 2/11 04:37 PM, Adam Skutt wrote:
     [...]
I agree. On a decent hosted environment, memory exhaustion is usually
down to either a system wide problem, or a programming error.
Or an overly complex client request.
Not spotting those is a programming (or specification) error!
And the way you spot them is by catching bad_alloc:).

No, you set upfront bounds on allowable inputs.  This is what other
engineering disciplines do, so I'm not sure why computer programmers
would do something different.

Because code runs in a more volatile environment and tends to handle
more complex (models of) systems.

An obvious example: a program operates on a set of X-es in one part,
and on a set of Y-s in another. Both are "added" to the operation as the
user goes along. Given system limits, the code can operate anywhere in a
range from A X-es and 0 Y-s to 0 X-es and B Y-s, with many combinations
in between. Whichever way you decide on a limit on the max count of X or Y,
some use will suffer. Compound this with the empirical observation
that, beside X and Y, there's U, V, W and many more, and there you
have it. Add a sprinkle of a volatile environment, as well as
differing environments, because one code base might run in all sorts
of them...

A simple answer to this is to (strive to ;-)) handle OOM gracefully.

Goran.
 

Adam Skutt

On 09/ 4/11 11:20 AM, James Kanze wrote:
On 09/ 2/11 04:37 PM, Adam Skutt wrote:
     [...]
I agree. On a decent hosted environment, memory exhaustion is usually
down to either a system wide problem, or a programming error.
Or an overly complex client request.
Not spotting those is a programming (or specification) error!
And the way you spot them is by catching bad_alloc:).
No, you set upfront bounds on allowable inputs.  This is what other
engineering disciplines do, so I'm not sure why computer programmers
would do something different.

Because code runs in a more volatile environment

It does nothing of the sort! Code does not have to deal with the physical
environment: it doesn't have to be concerned with the external
temperature, humidity, shock, weather conditions, etc. It does not
care whether the computer has been placed in a server room or outside
in the middle of the desert. Hardware frequently has to care about
all of these factors, and many more. Operating systems provide
reasonable levels of isolation between components: software can
generally ignore the other software running on the same computer.
Hardware design has to frequently care about these factors: the mere
placement of ICs on a board can cause them to interfere with one
another!

The list goes on and on, and applies to all of the other engineering
disciplines too. This is by far both the most absurd and the most
ignorant thing you've said yet.
and tends to handle
more complex (models of) systems.
Because it's cheaper and easier to do such things in software, in no
small part because many of the classical design considerations for
hardware simply disappear. However, that doesn't change my statement
on setting bounds in the least.
An obvious example: a program operates on a set of X-es in one part,
and on a set of Y-s in another. Both are "added" to the operation as the
user goes along. Given system limits, the code can operate anywhere in a
range from A X-es and 0 Y-s to 0 X-es and B Y-s, with many combinations
in between. Whichever way you decide on a limit on the max count of X or Y,
some use will suffer.

I'm not sure how you think this is relevant to this portion of the
discussion, but it's not even true. The limit for both may be
excessively generous for any reasonable use case. Moreover, plenty of
hardware has to process two distinct inputs and still sets bounds, so
it is an accepted technique.
Compound this with the empirical observation
that, beside X and Y, there's U, V, W and many more, and there you
have it.

There is no such empirical observation. That windmill in front of you
is not a dragon.
Add a sprinkle of a volatile environment, as well as
differing environments, because one code base might run in all sorts
of them...

A simple answer to this is to (strive to ;-)) handle OOM gracefully.
Even if anything you just wrote were true, you're still making the
case for handling OOM by termination in reality. If the environment
were really as diverse and volatile as you claim, and I can't prevent
the condition by setting reasonable bounds, there's really no reason
to believe I can respond to the condition after the fact, either.

Adam
 
