You know the old tale about Baron Munchausen, who was sinking in a swamp
but then pulled himself out by his own hair.
Fault isolation is like that. In reality you can only do it the
Archimedean way, using some fixed point outside. Otherwise it's just the
baron's tale.
Is it really? I'd file this line of thought under the usual 'wishful
thinking' -- you expect the solution to be present, so it is talked
into the world. Words are easy to mince.
It is, but just stating that will not create the foundation anywhere. You may
have it, sometimes you can build it -- but it is serious business.
Process level is one possible 'fixed point' -- provided your environment
implements process handling that way: your 'process' runs in a proper
sandbox (say, "user space") while some other part of the system runs
separated (say, "kernel space"), and nothing done in the former can
mess up the latter. That includes I/O operations, processor exceptions,
resource exhaustion, etc. -- the supervisory system must be able to recover
from anything.
Even that is not easy to accomplish, even with all the support built into
today's microprocessors. But most OSes aim for exactly that, so you can use
the fruits of that gigantic effort.
More reliable systems use even more separation, between more subsystems.
I do not claim that isolation within a process is impossible in the first
place, but it has a similar set of requirements and far less support. So it
will be practical too rarely, if you stick to the true meaning of robustness
instead of just wishing it in.
I have heard probability calculations in too many sad conversations. All were
really just empty claims, playing Russian roulette with customers/users.
Until I see something better, I stick to the binary: can or cannot violate.
Where it can violate, I count it as a 100% chance and act accordingly. Too
bad others do not -- the world seems to go Terry Pratchett's way (on the
Discworld, one-in-a-million chances happen nine times out of ten...).
Yes, you build the threat model like that. Not the other way around, as is
usual ("who on earth will enter 8000 characters in the password field?",
"access to that variable happens rarely, no way will that race condition
manifest", etc.).
I wouldn't bet on the last one, as my programs tend to run for a decade 24/7
with about one defect reported in that period, while JVMs are full of
frightening fixes, and their stability has never impressed me.
But that was not really what I was talking about. For the scope of this
discussion we can assume the JVM works perfectly to its specification, and
just ask whether a faulting Java program is okay to throw an exception at the
fault-detecting spot, instead of halting or jumping directly to the monitor.
I don't think so. For a moment let's even set aside concerns about building
the in-process monitor. Suppose we have it, sitting at the top of the
exception chain, and once reached it can magically discard all the bad stuff
and resume some healthy execution.
What can happen in between? Just two things off the top of my head:
1. code running from finally{} blocks
2. catch{} blocks
Say your program writes some output and uses temporary files; it keeps a
neat list of them. On the normative path, when it finishes, all the
temporary files are removed.
If it detects some problem (not a fault, just a recoverable condition like
access denied or disk full), it throws an exception, cleanup is done upwards
in the finally blocks, and at the top an exception handler tells the user
that another try is due. The state stays happy and clean.
Now suppose there is a fault in the program, and the state is messed up. It
is detected in some assert -- and you throw an exception there too, to be
caught even higher than before. The finally blocks run and process their
job list -- which can be messed up to any degree, so they may delete your
whole disk, mounts included. Or maybe just the input files instead of the
temporaries, whatever. Not my idea of robustness, or containment.
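The hazard can be shown in a few lines of Java. Everything here is illustrative -- the "fault" is simulated by putting a precious input file into the temp list by hand, which is exactly the kind of thing a real state corruption could do silently:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class CleanupHazard {
    // The program's "neat list" of temporaries. A fault anywhere in the
    // process can scribble into it before any assert ever fires.
    static final List<Path> tempFiles = new ArrayList<>();

    static void process() {
        try {
            // ... normal work would register real temporaries here ...
            // Deep inside, an assert detects that the state is corrupt:
            throw new IllegalStateException("invariant violated");
        } finally {
            // This cleanup trusts tempFiles -- but the same fault that
            // tripped the assert may have put the input files (or worse)
            // into the list.
            for (Path p : tempFiles) {
                try { Files.deleteIfExists(p); } catch (IOException e) { }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path victim = Files.createTempFile("input", ".dat"); // stands in for a precious input file
        tempFiles.add(victim);            // simulated corruption of the list
        try { process(); } catch (IllegalStateException expected) { }
        System.out.println("input still exists: " + Files.exists(victim));
        // prints "input still exists: false" -- the cleanup ate the input
    }
}
```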
For the other, is there anything to say? Mis-processing exceptions is not
so different from other kinds of bugs: execution continues in the bad state.
I like that article, but I would not merge the different subjects. It is
originally about correct programs with different performance.
Certainly quality is also a 'tradeoff' if you count the costs of
problem-detection methods. And we all know software production is still in
its wild-west era: no enforced quality controls, 'provided as is' license
nonsense, and so on. And customers buy, often even demand, the crapware.
My practical view is that there is indeed a limit, and diminishing returns
on detection -- a few problems will remain in coding, and some interactions
will go unaddressed in design. But the usual practice stops quality control
several magnitudes below that point, and replaces actual quality with talk,
or made-up probabilities, or delusions about robustness without any kind of
proof that it is there.
Actually it does not make me think quite like that; when I'm asked about
'likely', I rather stick to the raw threat model -- what actions can or
cannot happen, and with what consequences. Where it is important, checking
or providing kernel code is due, or suggesting external measures/better
isolation.
Certainly we're not clairvoyant, so bugs not yet uncovered are not considered.
But that leads far away from the original point. Instead of
relativization, please defend the in-process containment idea -- if you mean
it "in general".
C++ is used for many things, not only in unix/win32-like environments. In
embedded work you often have no OS at all, so it's up to you to build the
"fixed point", inside or outside.
Maybe so; in my view, if someone claims to have a perpetuum mobile, it is
his job to prove it -- just as I want to see the old Baron actually lift
himself from the swamp.
OTOH we may not really be in total disagreement. A Java system can
(probably) be configured so that the executing code is fixed, plus an
internal monitor that uses no state at all -- and is thus protected from
in-process problems. Starting from there it may be possible to build a
better one, with some protected state too.
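A minimal sketch of what I would accept as the stateless bottom layer of such a monitor (the class and method names are mine; the behavior relies only on the documented semantics of `Runtime.halt`, which terminates immediately without running shutdown hooks):

```java
public class StatelessMonitor {
    /** Fail-fast invariant check. On violation the JVM stops at the
        detection spot: no exception is thrown, so no finally{} or catch{}
        block gets to run against the corrupt state, and halt() -- unlike
        System.exit() -- skips shutdown hooks as well. */
    public static void require(boolean invariant) {
        if (!invariant) {
            // Deliberately touch no heap state here: no logging, no
            // string building -- nothing the fault may have corrupted.
            Runtime.getRuntime().halt(134);  // 128 + SIGABRT, by convention
        }
    }
}
```

Actual recovery then belongs to an external supervisor that sees the exit code -- the in-process part only has to stop, reliably.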
I'm still sceptical that when such a thing is talked about, it is actually
built around my requirements for a "trusted computing base". Please don't
take it personally; I have just encountered too much 'smoke and mirrors'
stuff, especially with Java-like systems where security was simply lied in,
referring to non-existent or irrelevant things.
C-like systems are at least famous for being able to corrupt anything, so we
can skip that idle discussion.
The crucial question is whether that 'more' isolation is 'enough' isolation
for the purpose. That is where I stay sceptical. You know, a beer bottle
is far less fragile than an egg, but the difference is irrelevant if you
drop them from the roof onto concrete. Or even from a meter up.
And to point it out again, the original idea introduces yet another gap
between the supposedly safe area and the point of fault discovery.