Sorry, this is garbage. MSVC supports all the C/C++ standards, plus extra
features (e.g. C++-style // comments are accepted in C programs too).
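For what it's worth, a minimal illustration of that last point; the file name
and the cl /TC switch are just for the example, and the same source is
accepted by any C99-or-later compiler:

/* comment.c -- C source using C++-style line comments.
   Build as C with MSVC:  cl /TC comment.c                                  */
#include <stdio.h>

int main(void)
{
    // A line comment: standard C since C99; MSVC accepts it in C mode too.
    printf("hello\n");
    return 0;
}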
Maybe it could be avoided. But not always, especially not if you have
to code against physical reality. But even in the case where a
deadlock theoretically could be prevented, the tradeoff between "good
enough" and "perfect in 10 man-years" comes into play.
I'd be happy if you tell me how to fix the systems I'm supposed to
interface with. I'm bound by that pesky "good enough", so I cannot do
much more than assume that the opposite side of the socket actually
implements the protocol as described. When that assumption fails, the
inevitable result is more often than not something that can be
described by the onomatopoeticon "Kaboom!"
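To make the "Kaboom!" concrete, here is a rough sketch of the kind of check
that assumption stands in for. The wire format (2-byte big-endian length plus
payload) and the names parse_frame/MAX_PAYLOAD are invented for the example,
not taken from the systems being discussed:

/* Sketch: refuse to trust a peer-supplied length field instead of assuming
   the other side implements the protocol as described.                     */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_PAYLOAD 512

static int parse_frame(const uint8_t *buf, size_t buflen,
                       uint8_t payload[MAX_PAYLOAD], size_t *payload_len)
{
    size_t len;

    if (buflen < 2)
        return -1;                         /* truncated header              */
    len = ((size_t)buf[0] << 8) | buf[1];  /* 2-byte big-endian length      */
    if (len > MAX_PAYLOAD || len > buflen - 2)
        return -1;                         /* peer lied about the length    */
    memcpy(payload, buf + 2, len);         /* safe: len is bounded above    */
    *payload_len = len;
    return 0;
}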
You're confusing two separate issues: programs that have errors and
programs that have resource demands that can't be met a few times a year
due to activities by humans or other processes. The latter cannot be
corrected within the process; you can use convoluted code everywhere to
deal with rare events you can't correct, with the risk that the convoluted
code is itself in error, or you can just abort, catch the signal, report
with a write(2,...), and then exit.
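A bare-bones sketch of that last option, assuming POSIX signals; the message
text and the handler name are mine, not anything from the post:

#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

static const char msg[] = "fatal: unrecoverable resource failure, exiting\n";

static void on_abort(int sig)
{
    (void)sig;
    /* write(2) is async-signal-safe; stdio is not */
    (void)write(STDERR_FILENO, msg, sizeof msg - 1);
    _exit(EXIT_FAILURE);
}

int main(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_abort;
    sigaction(SIGABRT, &sa, NULL);

    /* ... deep inside the program, when the rare event hits: */
    abort();                /* report via the handler, then exit */
}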
Don't make my brown eyes China Blue said:I find it acceptable to crash a couple of times a year due to unrecoverable
resource contentions that can only be recovered by restarting anyway.
Don't make my brown eyes China Blue said:You're confusing two separate issues: programs that have errors and
programs that have resource demands that can't be met a few times a year
due to activities by humans or other processes. The latter cannot be
corrected within the process; you can use convoluted code everywhere to
deal with rare events you can't correct, with the risk that the convoluted
code is itself in error, or you can just abort, catch the signal, report
with a write(2,...), and then exit.
No matter how convolutedly you code, at some point the system can be so
hosed you can't do anything. The question then is how defensively you want
to code, with the risk that the error is actually in the defensive code,
and at what point you want to deal with the reality that sometimes life
sucks.
No amount of C code is going to stick the Ethernet cable back in.
Stephen Sprunk said:On 31-Aug-12 13:17, Anders Wegge Keller wrote:
My point is that one cannot say a situation is "unrecoverable" and
then, later in the very same sentence, explain how to recover from
it. If one can recover, then by definition it is not
"unrecoverable".
Anders said:Actually, I suspect Brown Eyes lives in that part of the world where
"good enough" is the deciding factor. What "good enough" is, is highly
dependent on the job at hand. A batch job, running every hour, that
can be restarted and recover on its own is "good enough" when it
restarts mid-batch twice a year. The control software for the
Curiosity sky crane is "good enough" when it never fails.
Knowing the difference between those two situations is what most
managers and customers I know of spend a lot of time mulling
over. None of them are prepared to pay the NASA price for the
low-priority batch job.
Maybe it could be avoided. But not always, especially not if you have
to code against physical reality. But even in the case where a
deadlock theoretically could be prevented, the tradeoff between "good
enough" and "perfect in 10 man-years" comes into play.
I'd be happy if you tell me how to fix the systems I'm supposed to
interface with. I'm bound by that pesky "good enough", so I cannot do
much more than assume that the opposite side of the socket actually
implements the protocol as described. When that assumption fails, the
inevitable result is more often than not something that can be
described by the onomatopoeticon "Kaboom!"
Don't make my brown eyes China Blue said:I see you've never had to deal with database deadlocks.
You just don't really know how to do high-reliability, high-availability
stuff until you've done it. Once you have, it doesn't
appear to cost any more than ... "the old sloppy way."
The economics of defects seems to be incredibly poorly understood.
You fix nothing by restarting a process. The only thing that you manage to
do is to get the process back in a state where the problems caused by your
bugs aren't yet being triggered. Meanwhile, your bugs are still there, and
it's only a matter of time before someone is forced to deal with the same
problems caused by the same bugs.
Stephen said:That is quite true.
A related lesson is that you can take a large system and scale it down,
but you can't take a small system and scale it up. Things like high
scalability, high availability, etc. need to be designed in from the
start; they cannot be added later because they fundamentally change the
design of the system. If done at the start, they don't add much to the
cost--but if left out, you'll eventually have to scrap the entire design
and start over.
As an industry, we seem to have a solid understanding of how much each
call to the support department costs and how much the QA department
costs overall,
but nobody seems able to quantify how much it costs to be
_known_ as an unreliable company that makes unreliable products, at
least until the situation has gotten so bad that one starts losing
market share--and few companies recover from that death spiral.
[...]Rui Maciel said:You fix nothing by restarting a process. The only thing that you manage to
do is to get the process back in a state where the problems caused by your
bugs aren't yet being triggered. Meanwhile, your bugs are still there, and
it's only a matter of time before someone is forced to deal with the same
problems caused by the same bugs.
I guess my point is that *apparently*, exposure to this is rare,
and therefore considered costly.
I hadn't considered scalability - W.R.T. software, that seems even
*worse* than reliability and availability in terms of being
properly ... considerable.
I am really quite unsure of that. The source of bias here is "well,
the present budget is <x>, so let's keep doing that" until you *can't*
do that any more, and then guess where cuts come from?
So the question is, how much time, and what's at stake. Are we talking about
Don't make my brown eyes China Blue wrote:
You fix nothing by restarting a process. The only thing that you manage to
do is to get the process back in a state where the problems caused by your
bugs aren't yet being triggered. Meanwhile, your bugs are still there, and
it's only a matter of time before someone is forced to deal with the same
problems caused by the same bugs.
Stephen said:It's considered costly because it's rarely designed in from the start,
so people look at the cost of scrapping their design and starting over,
rather than the (smaller total) cost of doing it right the first time.
It's also considered costly because there aren't that many people who
know how to do it right the first time, and simple market economics
tells us that such people will therefore be more costly to employ. In
reality, though, that is less costly than doing it wrong the first time.
Also, many companies are started by people who have dreams of making it
big but are utterly unprepared for what it means to actually have that
happen.
There was a UPS commercial several years ago that showed a few
people in an office watching their sales ticker go live and cheering
when it rolled past a hundred orders--and then aghast when it soon
rolled past a hundred _thousand_ orders. For many startups, that isn't
too far off the mark--and it shows. Few survive that level of success,
usually by being bought by a larger company that knows how to handle it.
Yep.
Scalability, reliability and availability of software are all closely
related, and the most common solution (clustering) addresses all three.
True.
Basically, you cannot assume one of any functional unit; you must assume
there are N+1 of each unit, up to N of which are currently not available
(either due to being down or due to being overloaded), where N can be
anywhere from 0 (for small systems) to dozens or even hundreds (for
large systems). That is not something you can retrofit; it is a
fundamental change in the way you design systems.
(For extra credit, allow each unit within each N+1 group to be ahead or
behind one version of software, which enables on-line upgrades.)
This is a radically different approach from what someone here described
as the NASA model, where there is exactly one of everything that has to
have perfect reliability and infinite capacity because if any unit
ever fails or gets overloaded, the system crashes--and people die.
I could tell you, to the penny, exactly how much it costs my employer
for each call to our support line. I could also tell you, to the penny,
exactly how much our QA department costs. I could even tell you, to the
penny, how much it costs to fix all the bugs that QA finds.
What I _can't_ tell you, even to within several orders of magnitude, is
how much it will cost us to _not find_ a bug or how much it will cost us
to _not fix_ said bug.
That's why the "expense" of finding and fixing bugs is always a target
for cuts: the available statistics only show half of the story.