You are amazing. You post here accusing people who actually post
code samples of providing "zero evidence" while you provide
absolutely no evidence of your claims!
I'm responding to this against my better judgement. Don't make me
regret it.
I've provided plenty of evidence, but most of my claims are rather
basic, elementary facts. They don't really need any substantiation,
since they can be trivially verified by using Google and/or some basic
reasoning.
Which of these facts do you believe aren't basic or elementary?
* That there's no general way to tell why std::bad_alloc was
thrown?
* That the odds of a kernel panic or other seriously bad behavior on
the part of the OS when the commit limit is reached are good?
* That overcommitting operating systems will only give you
std::bad_alloc on virtual address space exhaustion (or overcommit
limit exhaustion)? That overcommitting operating systems may kill
your program when out of memory without ever causing a std::bad_alloc
to occur?
* That operating systems and language runtimes can treat allocations
of different types and sizes differently, so you can see 'unintuitive'
behavior such as large allocations succeeding when little ones fail?
That you can see behavior such as the runtime asking for more memory
from the OS than strictly necessary to fulfill the programmer's
request? That heap fragmentation can cause similar issues? That
merely stack unwinding in response to an OOM condition doesn't
necessarily ensure other parts of the code can leverage the
deallocated resources?
* That modern language runtimes and libraries, including C++, treat
memory allocation as a common and generally innocuous side-effect?
Meaning that, generally speaking, you cannot assume whether a given
function allocates memory or not? (See the sketch after this list.)
* That the default behavior when std::bad_alloc is thrown is program
termination in some fashion?
* That most programs aren't failing on OOM because they tried to
allocate a singular array, in a singular allocation that's too large?
(This list isn't exhaustive, mind you)
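To make a couple of those concrete, here's a quick sketch of my own
(purely illustrative; describe() and its contents are made up, not code
from this thread). Nothing in it says "allocate", yet it allocates all
over the place, and an unhandled std::bad_alloc simply terminates the
program:

#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// describe() is an arbitrary, made-up function: nothing here says
// "allocate", yet the ostringstream, the concatenation, and the
// returned std::string all allocate, and any of those allocations can
// throw std::bad_alloc.
std::string describe(const std::vector<int>& values)
{
    std::ostringstream out;
    out << "count=" << values.size();
    return out.str() + " (formatted)";
}

int main()
{
    std::vector<int> v;
    v.push_back(42);   // may reallocate, may throw std::bad_alloc
    std::cout << describe(v) << '\n';
    // If std::bad_alloc escapes main(), std::terminate() is called and
    // the program dies: the "automatic behavior" discussed below.
}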
From all of that, I concluded (my original text):
"Properly written, exception-safe C++ code will do the right thing when
std::bad_alloc is thrown, and most C++ code cannot sensibly handle
std::bad_alloc. As a result, the automatic behavior, which is to let
the exception propagate up to main() and terminate the program there,
is the correct behavior for the overwhelming majority of C++
applications. As programmers, we win anytime the automatic behavior
is the correct behavior. "
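To be clear about what I mean by the automatic behavior being correct,
here is a minimal sketch of my own (the names and the file name are
made up, purely illustrative): do_work() is exception-safe entirely
through RAII and never mentions std::bad_alloc; main() is the only
place the failure is even acknowledged before the program terminates.

#include <cstdlib>
#include <fstream>
#include <iostream>
#include <new>
#include <string>
#include <vector>

void do_work(const std::string& path)
{
    std::ofstream log(path);            // closed automatically on unwind
    std::vector<double> scratch(1000);  // freed automatically on unwind
    // ... real work here; any allocation failure simply propagates ...
    log << "processed " << scratch.size() << " elements\n";
}

int main()
{
    try {
        do_work("work.log");
    } catch (const std::bad_alloc&) {
        // Nothing sensible to retry; report and terminate cleanly.
        std::cerr << "out of memory, giving up\n";
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}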
I also noted earlier (my original text):
Even when there's memory to free up, writing an exception handler that
actually safely runs under an out-of-memory condition is impressively
difficult. In some situations, it may not be possible to do what you
want, and it may not be possible to do anything at all. Moreover, you
may not have any way to detect these conditions while your code is
running. Your only recourse may be to crash and crash hard.
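For anyone wondering what such a handler even looks like, here is a
sketch of the classic "rainy-day reserve" technique using
std::set_new_handler (my own illustration, not something from this
thread). Notice how constrained it is: the handler must not allocate,
the reserve buys exactly one reprieve, and after that the only option
left is to let std::bad_alloc fly.

#include <new>

namespace {
    char* reserve = nullptr;   // memory held back for a rainy day

    void on_out_of_memory()
    {
        if (reserve) {
            // Release the reserve so the failed allocation can be
            // retried by operator new, which calls this handler in a
            // loop until the request succeeds or the handler gives up.
            delete[] reserve;
            reserve = nullptr;
        } else {
            // Nothing left to give back: restore the default behavior
            // so the next failed attempt throws std::bad_alloc.
            std::set_new_handler(nullptr);
        }
    }
}

int main()
{
    reserve = new char[64 * 1024];   // 64 KiB held in reserve
    std::set_new_handler(on_out_of_memory);
    // ... application code; a failed new now gets exactly one reprieve ...
}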
Later I noted, to you no less (my original text):
This can happen anyway, so your solution must simply be prepared to
deal with this eventuality. Making your code handle memory allocation
failure gracefully does not save you from the cleaning staff. If your
system can handle the cleaning staff, then it can handle memory
allocation failure terminating a process, too.
I don't know why people think it's interesting to talk about super
reliable software but neglect super reliable hardware too. It's
impossible to make hardware that never fails (pesky physics) so why
would I ever bother writing software that never fails? Software that
never crashes is useless if the cleaning staff kicks out the power
cable every night.
Note this is a general-case practical argument based on the economic
cost. There are three thrusts:
* Most code lacks a sensible response to memory allocation failure.
This may be due to the fact that there's just no sensible way to
continue onward, or it may be due to the fact that allocation failure
only occurs in a catastrophic situation.
* Even when a sensible response exists, the difficulty (cost) involved
in writing one is too high. This includes all the
code changes necessary to ensure the program can continue onward.
These are not free, nor are they small.
* When a response is necessary, higher-level mechanisms to achieve
reliability, isolation, and/or problem avoidance are typically
superior because they handle other situations that you care about as
well. I'll repeat two: properly redundant and clustered server
instances also avoid power interruptions; smart limits on input values
also defend against malicious inputs.
In short, it's too hard to do and even when it can be done, the
benefit doesn't outweigh the cost. I could go one step further and note
that it's impossible or even more difficult in many common languages
other than C++, so techniques that benefit those languages as well are
clearly of more value than inherently C++ specific techniques.
Unsurprisingly, most, if not all, of what I've suggested one do is
language agnostic.
Since it's a general-case and not absolute argument, specific
counterexamples cannot disprove it. You need to either defeat my
assumptions (e.g., the cost of a handler is too high generally) or
show I'm mistaken in the generality of my claims (i.e., in reality I'm
talking about a niche).
Showing you can provide a sensible response to a singular large array
is not interesting. I write a considerable amount of scientific code
professionally. Such applications frequently use large arrays in the
bulk of their processing. In my applications, none uses a single
array; there are always at least two: one for the input and
one for output, as much of the stuff I write lacks easy in-place
algorithms. Much of the time there are several inputs being combined
into a single output, so there are many arrays. Many of them need
temporary copies in one state or another, so there's even more arrays
beyond the input and output. But on top of that, my code still has to
load data from disk and write it back out to disk. That involves a
multitude of smaller objects, many of which I don't directly create or
control: I/O buffers, strings to hold individual lines and portions of
lines, state objects for my parsers, application objects holding
command-line parameters and other configuration items, etc.
All of that is involved in performing a single operation. Related to
your original case, almost all of that would be unique to each request
if my application were threaded in a per-request manner. I don't
think the assumption that generally, most applications make many
"extra" allocations in support of their main algorithm is
unreasonable. Nor do I think it's reasonable to assume that
generally, the main algorithm is using some big array or that the
array gets allocated all at once. Moreover, if you disagree with
these generalities, I really have no interest in discussing this with
you at all, as you're too clearly colored by your own perceptions to
have a reasonable discussion.
Let me pose it to you another way: glibc specializes small allocations
by including a small block allocator that's intentionally designed to
be fast for small allocations. If small allocations weren't frequent,
what would be the value of specifically optimizing for them?
Moreover, large allocations are very slow: each one involves a system
call to mmap(2) on request and again to munmap(2) on release. System
calls are very, very slow on modern computers. Likewise, the Python
interpreter intentionally optimizes for the creation of many small
objects. Finally, some generational garbage collectors consider
allocation size when determining generation for the object. So do you
really believe your example is that general purpose, and that
relevant, when so much time is spent on optimizing the small? Do you
really believe all of these people are mistaken for focusing on the
small?
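If you doubt the cost gap, here's a rough measurement sketch of my own
(the sizes and iteration counts are arbitrary, and it's glibc-specific
because of mallopt). On glibc, requests above the mmap threshold
(128 KiB by default) are typically serviced by mmap(2) and munmap(2),
so the large-block loop pays for a pair of system calls on every
iteration:

#include <chrono>
#include <cstdlib>
#include <iostream>
#include <malloc.h>   // mallopt, glibc-specific

static double time_alloc_loop(std::size_t size, int iterations)
{
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        void* p = std::malloc(size);
        if (!p) break;                  // out of memory: stop the test
        static_cast<char*>(p)[0] = 1;   // touch it so it isn't a no-op
        std::free(p);
    }
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

int main()
{
    // Pin the threshold; glibc otherwise adjusts it dynamically after
    // large blocks are freed, which would hide the effect.
    mallopt(M_MMAP_THRESHOLD, 128 * 1024);

    const int n = 100000;
    std::cout << "64 B  x " << n << ": " << time_alloc_loop(64, n)
              << " s\n";
    std::cout << "1 MiB x " << n << ": " << time_alloc_loop(1 << 20, n)
              << " s\n";
}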
It works perfectly as intended in the scenario it is intended to work.
Bully. I don't know any other way to explain to you that your
scenario is uninteresting and not a valid rebuttal. More
importantly, it can't ever be a valid rebuttal to what I've said.
You claim that there is burden of proof upon me. But you
seem totally happy to assume that there is absolutely no burden of
proof whatsoever upon you. You are amazing.
The burden of proof upon me is substantially less since I'm arguing
for the status quo. Simply because you find my statements
controversial doesn't mean they are actually controversial.
The best way to improve robustness of an application is by using
multiple techniques that are useful for a limited type of error
cases.
No, I don't think so. As I've noted, plenty of systems with safety-
critical requirements take a unilateral action in response to all
abnormal conditions, and that response is frequently to terminate and
restart.
As I've noted, there are plenty of techniques to improve robustness
that cover a bevy of error conditions.
The design I posted does improve the robustness of real applications
under a particular type of failure scenario.
No, it does not, because you haven't proven that failure scenario
actually occurs. The issue is not, "It doesn't work if X happens,"
the issue is, "X 'never' happens". Improving robustness requires
showing that you've actually solved a real problem that occurs
regularly. Thus far, you have not shown it will happen regularly as
you describe. You haven't even shown most applications behave as you
describe.
It never intended nor
tried to solve 100% of all possible failure scenarios.
If we go back to the cleaning lady pulling the plug example, does
this mean that because you can write code to protect against the
cleaning lady, you should not bother to attempt to handle any error
whatsoever?
Yes it absolutely can, from a cost/benefit perspective. Especially
when the error in question is rare. Again, my argument is almost
entirely cost/benefit. If I have to build a system that tolerates
power failure of individual servers, then it also handles individual
processes running out of memory. As such, the benefit of handling out
of memory specially must be quite large to justify further costs, or
the cost must be incredibly cheap.
Why?
Why is it impossible to be able to estimate which allocations are
more likely to fail?
Because _size_ has very little to do with the probability. They're
just not strongly correlated factors, unless you're talking about
allocations that can never succeed.
Is it impossible to be able to know which allocations are likely to be
large and which allocations are likely to be small?
Why is it impossible to design a system in such a way that some
allocations are more likely to be larger than others?
No and it's not, but it's not enough information to determine anything
useful. Moreover, if you use a structure such as a linked list or a
tree, all of your allocations may be the same size!
Again, you would be saying something interesting if you would stop
talking about a singular allocation. Even using a std::vector means
you have to contend with more than one allocation.
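If that sounds abstract, here's a trivial sketch of my own showing a
single std::vector performing several distinct allocations as it grows
(the exact growth pattern is implementation-defined):

#include <iostream>
#include <vector>

int main()
{
    std::vector<int> v;
    std::size_t last_capacity = 0;
    int reallocations = 0;
    for (int i = 0; i < 1000000; ++i) {
        v.push_back(i);
        if (v.capacity() != last_capacity) {
            last_capacity = v.capacity();
            ++reallocations;   // a fresh, larger block was allocated
        }
    }
    std::cout << "one vector, " << reallocations << " allocations\n";
}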
Please tell me what my original claim was?
I never claimed that it is easy and simple to protect against all
failure scenario. You must have imagined that.
No, but you claimed it was easy and simple to isolate failure in one
thread from failure in the others. In your defense, you were probably
talking only about the case you coded in your example, but you hardly
made that clear from the outset. Regardless, it only holds while your
assumptions hold, and as I've said many times, your assumptions are
very, very poor.
I have no issue with your claim that correctly handling all types of
OOM errors is very difficult. I have issue with your refusal to admit
that there are circumstances where OOM errors can be handled.
I have never done the latter at any point. I don't know why you
persist in believing that I have. There are obviously situations in
which it can be done, because it has been done in the past.
This is a lie and you know it. You are attempting to claim that I
claimed a generality when I never claimed such a thing.
I strongly disagree with your generalities. I claim that under
specific circumstances with intelligent design you can handle some OOM
errors.
If you disagree with my generalities then you too must be talking in
generalities in order to have anything worth discussing at all.
And you are the one claiming that others are unrealistic!
The situation you describe is possible. Is it really probable?
(please supply the input value in the example code that makes it
happen; the example code does have some small allocations and a
potentially large one).
Instead of using an array, just use a set, map, or linked list with a
type of some fixed, small size. Such cases are quite probable. I'm
not sure why you're so resistant to the notion that most applications
do not see std::bad_alloc until they are almost out of memory or the
operating system is almost out of memory.
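To put a number on it, here's a sketch of my own using a minimal
counting allocator (illustrative only, not production code) with
std::set. Every element costs one small node allocation of identical
size; no individual request is "the big one", so if std::bad_alloc
ever shows up, it shows up when memory is essentially gone:

#include <cstddef>
#include <iostream>
#include <set>

// Globals so the counts survive the container rebinding the allocator
// to its internal node type.
static std::size_t g_allocations = 0;
static std::size_t g_last_request_bytes = 0;

template <typename T>
struct CountingAllocator {
    using value_type = T;

    CountingAllocator() = default;
    template <typename U> CountingAllocator(const CountingAllocator<U>&) {}

    T* allocate(std::size_t n)
    {
        ++g_allocations;
        g_last_request_bytes = n * sizeof(T);
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) { ::operator delete(p); }
};

template <typename T, typename U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&)
{ return true; }
template <typename T, typename U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&)
{ return false; }

int main()
{
    std::set<int, std::less<int>, CountingAllocator<int>> s;
    for (int i = 0; i < 10000; ++i)
        s.insert(i);

    std::cout << g_allocations << " separate allocations, the last one "
              << g_last_request_bytes << " bytes\n";
}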
Like I said, on common 64-bit operating systems and platforms, your
program has terabytes of virtual address space. The operating system
probably only has gigabytes to tens of gigabytes of commit to serve
that space. That means your singular request has to be 2^32 or so
in size, minimum, in order to trigger std::bad_alloc for the /sole/
reason you can handle. Are you honestly telling me, with a straight
face, you believe that is more common than what I suggest happens?
Even on a 32-bit operating system, that request would have to be
3*2^30 or so in order to trigger the response you can handle
robustly. That's a lot of memory. On my system right now, the
largest singular process is Eclipse at ~1.4GiB virtual size. The
second largest is Firefox at ~210MiB virtual size. All of the rest
are considerably under 100MiB.
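Here's what that arithmetic looks like in code, as a sketch of my own
(actual behavior depends on the operating system and its overcommit
settings, so treat it as illustrative): the absurdly huge request is
the kind that reliably produces std::bad_alloc on its own, while the
small one essentially never fails until the machine is already in
deep trouble.

#include <iostream>
#include <new>

int main()
{
    try {
        // Roughly 8 TiB: far beyond any plausible commit limit.
        char* huge = new char[8ULL * 1024 * 1024 * 1024 * 1024];
        delete[] huge;
        std::cout << "huge allocation 'succeeded' (overcommit at work)\n";
    } catch (const std::bad_alloc&) {
        std::cout << "huge allocation threw std::bad_alloc\n";
    }

    try {
        char* small = new char[64];   // the common case: tiny and reliable
        delete[] small;
        std::cout << "small allocation succeeded\n";
    } catch (const std::bad_alloc&) {
        std::cout << "small allocation threw std::bad_alloc\n";
    }
}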
Now sure, I could cause a large allocation (in aggregate) to occur in
both by attempting to open a very big document. However, even if
they respond to the failure in the fashion you describe, they still
haven't managed to do what I want: open the document. In order to
open that document, they're going to have to be rewritten to support
handling files that are very large. There are two important
consequences to this reality:
* I will have to start a new version of the application in order to do
what I want: open the document.
* Since the application couldn't do what I want, I'm going to be
closing it anyway. As such, while termination without my input is
hardly the best response, it's also not that divorced from what was
going to happen anyway. As such, the negativity associated with
termination isn't nearly as bad as it could be.
You'll probably point out that both can handle multiple inputs, and
that failure to process one shouldn't impact the others. I'll note
that Firefox has tab isolation (now) and session management for tabs,
so that lost work is minimized on failure. Firefox needs this
functionality /anyway/, since it can crash for reasons entirely beyond
its control (e.g., a plugin forcibly kills the process). Eclipse isn't
nearly as wise, but Eclipse has a cavalier attitude towards my data in
many, many aspects.
Moreover, back to the original point, if those applications behave
like that every day (and they do), then why would I believe that
singular large allocations are common events?
Crashing applications or crashing servers are often undesirable even
if they will be restarted afterwards. You must have very tolerant
users.
No, I have mechanisms in place so the loss of a server is generally
transparent to the users.
Note: I am not arguing against planning a recovery mechanism, I am
arguing against using the existence of a recovery mechanism to avoid
doing due diligence in avoiding crashes and writing quality code.
Again, the notion that code is of lower quality merely because it
crashes is simply not true.
It's easily the most absurd notion, by far and away, proffered in this
whole thread.
Adam