If it's possible to stop processing a job safely and recover, sure.
Marking the job as an error might require allocating memory, so you
have to avoid that. This is more difficult than it sounds, and
requires code that is explicitly written to ensure that it is
possible. Screw that code up, and you'll almost certainly end up in a
deadlock situation or with a zombie worker thread, at which point you
will be restarting the process anyway!
I can easily come up with examples that demonstrate that for this
particular case, terminating on memory failure is the best course of
action. Unfortunately, it seems to me that some people advocate "you
must always terminate on any allocation failure in any situation, and
this is and will forever be the only valid solution". Your post,
unfortunately, seems to try to expand my overly simplified example in
such a way as to demonstrate that this can never work.
It all depends on context.
Simple example: exploding a JPEG image for editing, or exploding a zip
file in memory to look at its contents. Multiple worker threads can all
be doing some processing. When one of the workers tries to allocate a
large amount of memory to process its job, that large allocation will
fail. This is a relatively safe situation to recover from, because in
all probability what fails is the single call that makes the large
allocation. In this case, what can't be done is allocating a large
block of memory; "common" small blocks of memory will still be
available. So one could safely catch this bad_alloc in the caller and
forgo processing this particular item. It is very unlikely that there
would not be enough memory to log the failure and mark the job as an error.
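To make that concrete, here is a minimal sketch of the pattern I have in
mind (the Job type, sizes and messages are invented for illustration):
only the data-dependent, potentially huge allocation is wrapped, and the
remaining memory is more than enough to log and mark the job as failed.

#include <cstddef>
#include <iostream>
#include <new>
#include <string>
#include <vector>

// Hypothetical job description: the buffer size depends on the input
// data and may be enormous for a pathological input.
struct Job
{
    std::string name;
    std::size_t bytesNeeded;
};

// Process one job; returns false if the job had to be abandoned.
bool processJob(const Job &job)
{
    std::vector<char> buffer;
    try
    {
        buffer.resize(job.bytesNeeded);   // the single large allocation
    }
    catch (const std::bad_alloc &)
    {
        // Recoverable: only the oversized request failed, so the small
        // allocations needed to build this log message still succeed.
        std::cerr << "job '" << job.name << "' skipped: not enough memory\n";
        return false;
    }
    // ... decode / decompress the input into buffer ...
    return true;
}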
There is a remote possibility that the large alloc fills the memory to
99.99999% and then the next small alloc fails. This is a rare case,
but not necessarily a problem, as it can be treated differently. In the
area of the code where you know you will be attempting to allocate a
variable, potentially large amount of memory, you can consider bad_alloc
a recoverable error; if the bad_alloc happens elsewhere, during a
"common" allocation, then it is an unexpected error from unknown causes
and it is probably best to give up and terminate.
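For the "give up and terminate" branch, you simply don't catch anything:
an unexpected bad_alloc escapes the worker and the process dies. If you
want a last-gasp message, a terminate handler can provide it; here is a
sketch (assuming the handler sticks to a fixed message on stderr and
allocates nothing itself):

#include <cstdlib>
#include <exception>
#include <iostream>
#include <new>
#include <vector>

// Last-gasp handler: no dynamic allocation, just a fixed message.
void onTerminate()
{
    std::cerr << "unexpected allocation failure (or other fatal error), giving up\n";
    std::abort();
}

int main()
{
    std::set_terminate(onTerminate);

    // "Common" allocation: not wrapped in try/catch. A bad_alloc here
    // would propagate out of main and end up in onTerminate().
    std::vector<char> small(64);

    // Data-dependent allocation: this is the one place where bad_alloc
    // is treated as recoverable.
    try
    {
        std::vector<char> huge(static_cast<std::size_t>(1) << 46);
    }
    catch (const std::bad_alloc &)
    {
        std::cerr << "oversized request failed, job skipped\n";
    }
}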
Anyway, while you're in the process of attempting to cancel the
oversized job, many other jobs will fail (possibly all of them) since
they can no longer allocate memory either. In the meantime, all your I/
Maybe, and maybe not. Simply because, as described above, what failed was
allocating the *large amount* of memory.
The following code is perfectly safe:
try
{
    int *p1 = new int[aMuchTooLargeNumber];   // may throw std::bad_alloc
    delete[] p1;                              // only reached if it succeeded
}
catch (std::bad_alloc &)
{
    // OK to ignore
    // I can even safely use memory to write logs
}

int *p2 = new int[10];
// Don't catch bad_alloc here because in this case
// this would be a really unexpected error
delete[] p2;
In fact, since the heap allocator will be thread safe (using various
methods such as mutexes, locks and sub-heaps), there should be no point
in time at which one thread attempting to allocate a "not too large"
block would fail just because a different thread is currently attempting
to allocate a "much too large" block.
(Obviously, it is possible that a block size that would not be too large
for a single thread could become impossible to allocate if all the
worker threads attempted to process such a "large-ish but not too
large" job at the same time.)
Yannick