Is C++ used in life-critical systems?

James Kanze · Dec 30, 2010

On Dec 15, 10:10 pm, "Marc" wrote:

[...]

Read up on the Ariane bug; it's
quite enlightening (once you get past the pontificating ("if they'd
used Blub this would never have happened!")). The space shuttle
software development process is quite interesting as well.

Just a reminder: there was no bug in the Ariane's software.
Management decided to just reuse it in a different context: the
software did what it was supposed to do, for the system it was
written for. (In other words, stating that there was a bug in
the software is like saying that your C++ compiler has a bug
because it doesn't correctly compile someone's Ada program.)

It is an interesting point with regard to the question at
hand: at a larger level, the requirements of the system are to
auto-destruct if a bug is found. (The Ariane auto-destructed
because the software determined that the systems providing its
input were defective, since the values were impossible.) If you
aren't sure that you have full control, better to auto-destruct
than to risk crashing into a populated city.

Jorgen Grahn · Dec 30, 2010

CFront has been dead for the last 20 years and Comeau is hardly a
preprocessor.

And even with CFront, Stroustrup argued (in "Design and Evolution
....", in 1994) that it wasn't a preprocessor -- it simply compiled to
C code instead of machine code.

Man-wai's statement reads like an echo from the late 1980s.

/Jorgen

Ebenezer · Dec 30, 2010

[...]

Member functions? That is not such a big deal alone,
easily worked around with "python style".
RAII is a big deal, and function/operator overloading, and
private/public. Probably other things too.

FWIW: private/public, in connection with member functions, are,
even today, the single most important improvement in C++ over C.
The rest is just icing on the cake---pretty nice icing, in a lot
of cases, but not as important as the encapsulation.

I'd say the automatic construction and destruction that enables RAII is
the single most important improvement in C++ over C. It's one thing
that you simply can't do in C. Encapsulation is just icing on the cake!
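A minimal sketch of what the automatic destructor call buys, assuming a hypothetical `Resource` class that only counts its live instances so the cleanup is observable: the destructor runs whether the scope is left normally, by an early `return`, or by an exception, with no release call written at any exit point.

```cpp
#include <stdexcept>

// Hypothetical resource: counts live instances so cleanup can be observed.
class Resource {
public:
    Resource() { ++live; }
    ~Resource() { --live; }             // runs on every exit path
    Resource(const Resource&) = delete;
    Resource& operator=(const Resource&) = delete;

    static int live;
};

int Resource::live = 0;

// Early return: no explicit release is needed before either return.
int use_with_early_return(bool bail_out)
{
    Resource r;
    if (bail_out)
        return 1;
    return 0;
}

// Exception: stack unwinding still calls ~Resource().
void use_with_exception()
{
    Resource r;
    throw std::runtime_error("failure while holding the resource");
}
```

After any of these calls (the throwing one inside a `try`), `Resource::live` is back at zero. The C equivalent needs an explicit release call on each path, which is exactly what gets forgotten.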

I don't know how one could prove that this or that is the most
important improvement, but I agree they are both important.
I tend to agree with Mr. Kanze more than with Mr. Collins (about
pair programming, say), but in this case I am somewhat sympathetic
to Collins' point of view.

Brian Wood
Ebenezer Enterprises
http://webEbenezer.net

gwowen · Dec 31, 2010

Just a reminder: there was no bug in the Ariane's software.

There was no bug in Ariane 4's -- the software met the requirements.
There was a bug in Ariane 5's software -- the code had remained the
same but the requirements had changed.

James Kanze · Dec 31, 2010

There was no bug in Ariane 4's -- the software met the requirements.
There was a bug in Ariane 5's software -- the code had remained the
same but the requirements had changed.

No. The "bug" was that there wasn't any Ariane 5 software:
management simply decided to use the software from Ariane
4 without changes (and without specifying any new requirements).
The bug wasn't in the software; the problem was a very poor
management decision.

James Kanze · Dec 31, 2010

On 12/30/10 11:46 PM, James Kanze wrote:

[...]

Member functions? That is not such a big deal alone,
easily worked around with "python style".
RAII is a big deal, and function/operator overloading, and
private/public. Probably other things too.

FWIW: private/public, in connection with member functions, are,
even today, the single most important improvement in C++ over C.
The rest is just icing on the cake---pretty nice icing, in a lot
of cases, but not as important as the encapsulation.

I'd say the automatic construction and destruction that enables RAII is
the single most important improvement in C++ over C. It's one thing
that you simply can't do in C. Encapsulation is just icing on the cake!

The two are related; without the encapsulation, I doubt that
automatic construction and destruction would work. They're both
related to the idea that everything which happens to objects of
the class type is through members.

I suspect the different opinion is related to the size of the
projects we've worked on, however. I remember back when I was
programming in C: I'd define a struct and a set of functions to
manipulate it... and then cross my fingers that no one accessed
it except through the functions I'd provided. In smaller
projects, you may have more control over the programmers, and be
able to ensure that this doesn't happen. And ensuring the
initialization and automatic destruction---regardless of the
path which causes the variable to go out of scope---is an
important feature. When I said the rest is just "icing on the
cake", I was thinking of things like inheritance and templates:
both very powerful tools, but not as important as the
encapsulation (in the larger sense, including the idea that the
client cannot use an uninitialized instance of your object, nor
abandon an instance without proper cleanup).
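A minimal sketch of the contrast described above, using a hypothetical bounded counter (the names are illustrative only): in the C style the invariant `0 <= value <= limit` is just a convention that any client can break by writing the field directly, while the class version establishes it in the constructor and keeps it, because every access goes through members.

```cpp
#include <stdexcept>

// C style: the invariant 0 <= value <= limit is only a convention.
struct CCounter {
    int value;
    int limit;
};

inline void ccounter_increment(CCounter* c)
{
    if (c->value < c->limit)
        ++c->value;
}
// Nothing stops client code from writing `c.value = -42;`.

// C++ style: the data is private, so every change goes through a member
// that maintains the invariant, and no instance exists uninitialized.
class Counter {
public:
    explicit Counter(int limit) : value_(0), limit_(limit)
    {
        if (limit < 0)
            throw std::invalid_argument("limit must be non-negative");
    }
    void increment()
    {
        if (value_ < limit_)
            ++value_;
    }
    int value() const { return value_; }

private:
    int value_;
    int limit_;
};
// `Counter c(3); c.value_ = -42;` is rejected at compile time.
```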

James Kanze · Dec 31, 2010

And even with CFront, Stroustrup argued (in "Design and Evolution
...", in 1994) that it wasn't a preprocessor -- it simply compiled to
C code instead of machine code.

CFront was definitely a compiler; I believe one OEM actually
modified it to produce object code (or maybe assembler)
directly. C code is just a convenient "intermediate" language.
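To illustrate why emitting C is just a choice of back end, here is a hand-written sketch (not actual CFront output; `Point`, `Point_lowered` and `Point_manhattan` are hypothetical names) of how a member function can be lowered to a plain function that receives the object pointer explicitly, which is roughly the shape a C++-to-C translator produces.

```cpp
// C++ source: a member function with an implicit `this`.
class Point {
public:
    Point(int x, int y) : x_(x), y_(y) {}
    int manhattan() const { return (x_ < 0 ? -x_ : x_) + (y_ < 0 ? -y_ : y_); }

private:
    int x_;
    int y_;
};

// Hand-lowered, C-compatible equivalent: the class becomes a plain struct
// and the member becomes a free function taking the object pointer.
struct Point_lowered {
    int x_;
    int y_;
};

int Point_manhattan(const struct Point_lowered* self)
{
    return (self->x_ < 0 ? -self->x_ : self->x_)
         + (self->y_ < 0 ? -self->y_ : self->y_);
}
```

Access checking happens before this lowering, so the generated C no longer needs `private`; the compiler has already enforced it.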

Man-wai's statement reads like an echo from the late 1980s.

Man-wai's statement reads like a statement from someone who
knows neither C++ nor compilers.

Nick Keighley · Dec 31, 2010

Just a reminder: there was no bug in the Ariane's software.

I didn't say there was. But when a rocket falls from the sky we can
safely say there was a bug in something!

Management decided to just reuse it in a different context: the
software did what it was supposed to do, for the system it was
written for. (In other words, stating that there was a bug in
the software is like saying that your C++ compiler has a bug
because it doesn't correctly compile someone's Ada program.)

error: the system behaves in a manner not expected by a reasonable user

It is an interesting point with regard to the question at
hand: at a larger level, the requirements of the system are to
auto-destruct if a bug is found. (The Ariane auto-destructed
because the software determined that the systems providing its
input were defective, since the values were impossible.)

I understood it was destroyed by the range safety officer

If you aren't sure that you have full control, better to
auto-destruct than to risk crashing into a populated city.

Are there many near the Ariane launch site?

Michael Doubez · Dec 31, 2010

I didn't say there was. But when a rocket falls from the sky we can
safely say there was a bug in something!

Actually, there was: a variable was assumed to be within a reasonable range.

http://sunnyday.mit.edu/accidents/Ariane5accidentreport.html
<quote>
In the failure scenario, the primary technical causes are the Operand
Error when converting the horizontal bias variable BH, and the lack of
protection of this conversion which caused the SRI computer to stop.
</quote>

But the domain of Ariane 4 was small enough that this bug wasn't
triggered.

Nonetheless, there was apparently a lack of validation of the module
for Ariane 5.
<quote>
Testing at equipment level was in the case of the SRI conducted
rigorously with regard to all environmental factors and in fact beyond
what was expected for Ariane 5. However, no test was performed to
verify that the SRI would behave correctly when being subjected to the
count-down and flight time sequence and the trajectory of Ariane 5.
</quote>

error: the system behaves in a manner not expected by a reasonable user

Actually, it did. Upon detection of the error, the policy was to
shut down the processor, although (according to the report) another
behavior could have been provided (an estimate from the SRI).

<quote>
Although the source of the Operand Error has been identified, this in
itself did not cause the mission to fail. The specification of the
exception-handling mechanism also contributed to the failure. In the
event of any kind of exception, the system specification stated that:
the failure should be indicated on the databus, the failure context
should be stored in an EEPROM memory (which was recovered and read out
for Ariane 501), and finally, the SRI processor should be shut down.

It was the decision to cease the processor operation which finally
proved fatal. Restart is not feasible since attitude is too difficult
to re-calculate after a processor shutdown; therefore the Inertial
Reference System becomes useless. The reason behind this drastic
action lies in the culture within the Ariane programme of only
addressing random hardware failures. From this point of view exception
- or error - handling mechanisms are designed for a random hardware
failure which can quite rationally be handled by a backup system.
</quote>

I understood it was destroyed by the range safety officer

Yes, but it could have had a different strategy.

<quote>
Although the failure was due to a systematic software design error,
mechanisms can be introduced to mitigate this type of problem. For
example the computers within the SRIs could have continued to provide
their best estimates of the required attitude information. There is
reason for concern that a software exception should be allowed, or
even required, to cause a processor to halt while handling mission-
critical equipment. Indeed, the loss of a proper software function is
hazardous because the same software runs in both SRI units. In the
case of Ariane 501, this resulted in the switch-off of two still
healthy critical units of equipment.
</quote>

Are there many near the Ariane launch site?

No, but with this many tons of metal moving at this speed and still
accelerating, you can reach inhabited areas quite quickly.

The palm goes to this quotation:
<quote>
Returning to the software error, the Board wishes to point out that
software is an expression of a highly detailed design and does not
fail in the same sense as a mechanical system. Furthermore software is
flexible and expressive and thus encourages highly demanding
requirements, which in turn lead to complex implementations which are
difficult to assess.

An underlying theme in the development of Ariane 5 is the bias towards
the mitigation of random failure. The supplier of the SRI was only
following the specification given to it, which stipulated that in the
event of any detected exception the processor was to be stopped. The
exception which occurred was not due to random failure but a design
error. The exception was detected, but inappropriately handled because
the view had been taken that software should be considered correct
until it is shown to be at fault. The Board has reason to believe that
this view is also accepted in other areas of Ariane 5 software design.
The Board is in favour of the opposite view, that software should be
assumed to be faulty until applying the currently accepted best
practice methods can demonstrate that it is correct.

This means that critical software - in the sense that failure of the
software puts the mission at risk - must be identified at a very
detailed level, that exceptional behaviour must be confined, and that
a reasonable back-up policy must take software failures into account.
</quote>

James Kanze · Dec 31, 2010

On 31/12/2010 08:58, James Kanze wrote:

Code which does not meet its requirements is defective (in other words
"buggy").

Totally agreed. The code in the Ariane 5 met its requirements.
The fact that management decided to use the code from the Ariane
4, without redefining the requirements and having the code
rewritten (as much or as little that was necessary) is not an
error in the code, but rather in the decision process management
was using.

I think my analogy of using a C++ compiler to compile Ada is
very close. Would you consider your C++ compiler buggy because
someone used it to compile Ada, and it failed?

James Kanze · Dec 31, 2010

I didn't say there was. But when a rocket falls from the sky we can
safely say there was a bug in something!

Definitely. But in this case, not the software.

error: the system behaves in a manner not expected by a reasonable user

The software behaved in the manner it was supposed to behave.

I understood it was destroyed by the range safety officer

No. The software received an "impossible" input, which it was
not capable of processing. That triggered an exception, which
led to "fast failure", and the backup took over. But the backup
was running the same software, and seeing the same inputs.
Once all of the backups had failed as well, hardware safety
systems triggered the auto-destruct.
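Why the backup could not help fits in a few lines. In this sketch all names are hypothetical, and the 16-bit range is used only because the report describes a conversion to a 16-bit signed integer. Hardware redundancy assumes the two units fail independently; when both run identical software on identical input, a systematic error takes out both at once.

```cpp
#include <functional>

// A unit reports false when its software rejects the input and it shuts down.
using Unit = std::function<bool(double)>;

// Hypothetical stand-in for the SRI computation: inputs outside the range
// the specification considers possible make the unit fail fast.
bool sri_software(double input)
{
    return input >= -32768.0 && input <= 32767.0;
}

// Redundant pair: the system keeps working if at least one unit does.
bool system_ok(const Unit& primary, const Unit& backup, double input)
{
    return primary(input) || backup(input);
}
```

`system_ok(sri_software, sri_software, 40000.0)` is false: duplicating the same software doubles the hardware, not the protection against a design error.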

Are there many near the Ariane launch site?

The Ariane launch pad sends them out over the Atlantic. But
those things move pretty fast, and if it lost reliable
navigation, it could easily reach western Europe before it hit
the ground. The whole point of auto-destruct is to ensure that
it doesn't reach Europe.

James Kanze · Dec 31, 2010

On Dec 31, 10:21, Nick Keighley wrote:

Actually, there was: a variable was assumed to be within a reasonable range.

http://sunnyday.mit.edu/accidents/Ariane5accidentreport.html

Just a reminder. This report was commissioned by the same
management which decided that the software from the Ariane
4 could be reused without validation, since it obviously worked
(or was commissioned by the people who appointed those people).
Politicians (and this was a political issue) don't like
admitting their own mistakes, or those of their friends.
Descriptions of what technically occurred are probably valid,
but some of the conclusions don't actually follow from the
technical details presented.

<quote>
In the failure scenario, the primary technical causes are the Operand
Error when converting the horizontal bias variable BH, and the lack of
protection of this conversion which caused the SRI computer to stop.
</quote>

But the domain of Ariane 4 was small enough that this bug wasn't
triggered.

In the Ariane 4, this was the desired behavior. The values in
question weren't possible. The presence of the Operand Error
could only be due to a hardware failure upstream, and indicated
that the processor was not getting correct input.
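The conversion at issue can be sketched in C++. The report describes the Operand Error as arising in a conversion of a 64-bit floating-point value to a 16-bit signed integer; the function name and exception type below are assumptions, and C++ has no built-in equivalent of that runtime check. Converting an out-of-range floating value to an integer type with a plain `static_cast` is undefined behavior in C++, so a fail-fast version has to test the range explicitly first.

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>

// Fail-fast narrowing: a value outside the int16 range is treated as
// evidence of an upstream fault and reported, never silently converted.
// (A plain static_cast of an out-of-range double is undefined behavior.)
std::int16_t to_int16_or_throw(double v)
{
    constexpr double lo = std::numeric_limits<std::int16_t>::min();  // -32768
    constexpr double hi = std::numeric_limits<std::int16_t>::max();  //  32767
    if (!(v >= lo && v <= hi))   // written this way so NaN is rejected too
        throw std::out_of_range("value cannot be represented as int16");
    return static_cast<std::int16_t>(v);   // truncates toward zero
}
```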

Nonetheless, there was apparently a lack of validation of the module
for Ariane 5.
<quote>
Testing at equipment level was in the case of the SRI conducted
rigorously with regard to all environmental factors and in fact beyond
what was expected for Ariane 5. However, no test was performed to
verify that the SRI would behave correctly when being subjected to the
count-down and flight time sequence and the trajectory of Ariane 5.
</quote>

In order to test, you have to first specify what the behavior is
supposed to be. Not even considering that different behavior
might be required is the root cause of the problem. But of
course, that decision was made by people high enough up that the
report didn't bring them into question.

Actually, it did. Upon detection of the error, the policy was to
shut down the processor, although (according to the report) another
behavior could have been provided (an estimate from the SRI).

<quote>
Although the source of the Operand Error has been identified, this in
itself did not cause the mission to fail. The specification of the
exception-handling mechanism also contributed to the failure. In the
event of any kind of exception, the system specification stated that:
the failure should be indicated on the databus, the failure context
should be stored in an EEPROM memory (which was recovered and read out
for Ariane 501), and finally, the SRI processor should be shut down.

It was the decision to cease the processor operation which finally
proved fatal.

Note that this was the correct decision for the Ariane 4. It
was the decision to use the software from the Ariane 4 without
revalidation that is the ultimate cause of the accident.

Restart is not feasible since attitude is too difficult
to re-calculate after a processor shutdown; therefore the Inertial
Reference System becomes useless. The reason behind this drastic
action lies in the culture within the Ariane programme of only
addressing random hardware failures. From this point of view exception
- or error - handling mechanisms are designed for a random hardware
failure which can quite rationally be handled by a backup system.
</quote>

Yes, but it could have had a different strategy.

<quote>
Although the failure was due to a systematic software design error,
mechanisms can be introduced to mitigate this type of problem. For
example the computers within the SRIs could have continued to provide
their best estimates of the required attitude information. There is
reason for concern that a software exception should be allowed, or
even required, to cause a processor to halt while handling mission-
critical equipment. Indeed, the loss of a proper software function is
hazardous because the same software runs in both SRI units. In the
case of Ariane 501, this resulted in the switch-off of two still
healthy critical units of equipment.
</quote>

And that is the conclusion which is totally unjustified. In the
Ariane 4, not shutting the system down in this condition would
have been a serious error.
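The two policies being argued about can be contrasted in a short sketch (hypothetical names throughout): the fail-fast policy stops the unit when an impossible value appears, as in the Ariane 4 design described above, while the alternative the report suggests keeps producing a best estimate, modelled crudely here by saturating at the representable limits. Which one is correct depends on the requirements of the particular vehicle, which is precisely the point in dispute.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>
#include <optional>

enum class OnOverflow { StopUnit, Saturate };

// Returns an empty optional when the unit stops; otherwise a value to publish.
std::optional<std::int16_t> convert(double v, OnOverflow policy)
{
    constexpr double lo = std::numeric_limits<std::int16_t>::min();
    constexpr double hi = std::numeric_limits<std::int16_t>::max();
    if (v >= lo && v <= hi)
        return static_cast<std::int16_t>(v);
    if (policy == OnOverflow::StopUnit || std::isnan(v))
        return std::nullopt;                          // fail fast: no output
    return static_cast<std::int16_t>(v > hi ? hi : lo);  // best estimate: clamp
}
```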

No, but with this many tons of metal moving at this speed and still
accelerating, you can reach inhabited areas quite quickly.

The palm goes to this quotation:
<quote>
Returning to the software error, the Board wishes to point out that
software is an expression of a highly detailed design and does not
fail in the same sense as a mechanical system. Furthermore software is
flexible and expressive and thus encourages highly demanding
requirements, which in turn lead to complex implementations which are
difficult to assess.

An underlying theme in the development of Ariane 5 is the bias towards
the mitigation of random failure. The supplier of the SRI was only
following the specification given to it, which stipulated that in the
event of any detected exception the processor was to be stopped. The
exception which occurred was not due to random failure but a design
error. The exception was detected, but inappropriately handled because
the view had been taken that software should be considered correct
until it is shown to be at fault. The Board has reason to believe that
this view is also accepted in other areas of Ariane 5 software design.
The Board is in favour of the opposite view, that software should be
assumed to be faulty until applying the currently accepted best
practice methods can demonstrate that it is correct.

This means that critical software - in the sense that failure of the
software puts the mission at risk - must be identified at a very
detailed level, that exceptional behaviour must be confined, and that
a reasonable back-up policy must take software failures into account.
</quote>

That is certainly the palm, since it shows that the committee
who wrote the report didn't understand the rationale behind the
original design decisions. In this case, it would have saved
the Ariane 5. Had similar input occurred in the Ariane 4,
however, it could well have resulted in the missile crashing in
a highly populated area.

Balog Pal · Jan 1, 2011

Ian Collins said:
Member functions? That is not such a big deal alone,
easily worked around with "python style".

RAII is a big deal, and function/operator overloading, and
private/public. Probably other things too.

FWIW: private/public, in connection with member functions, are,
even today, the single most important improvement in C++ over C.
The rest is just icing on the cake---pretty nice icing, in a lot
of cases, but not as important as the encapsulation.

I'd say the automatic construction and destruction that enables RAII is
the single most important improvement in C++ over C. It's one thing that
you simply can't do in C. Encapsulation is just icing on the cake!

Yeah. When I wrote the quoted part, destructors had slipped my mind for some
reason. In many earlier posts I pointed out that I'd choose C++ over many
alternatives for nothing else but the destructor mechanism and the ability
to use RAII.

OTOH, I must mention that in my latest embedded project (which could probably
be a good representative of a whole class of projects), there was nothing to
RAII, as there was no heap/free store and no exceptions. There were zero
destructor-equivalent functions in the whole system, and calling the handful
of constructor-equivalents was not a practical problem.

While I'm at it, I'd better mention the other part: I build my project with a
double compile, and in this case the second compiler (whose binary output is
unused except for parts in unit tests) runs in C++ mode, so all the type
safety benefits are gained anyway.
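The setup isn't shown, so the following is only a guess at its shape: a file kept in the common subset of C and C++ (all names hypothetical), built as C for the target and compiled a second time as C++ purely for the stricter checking. The comments mark two constructs that C accepts silently but a C++ compiler rejects.

```cpp
/* Kept in the common subset of C and C++: built as C for the target,
   and compiled again as C++ only for the stricter type checking. */

enum mode { MODE_IDLE, MODE_RUN };

struct motor {
    enum mode m;
    int speed;
};

/* Callback with an opaque context pointer, a common embedded C idiom. */
static void set_running(void *ctx, int speed)
{
    /* C accepts `struct motor *mp = ctx;`; C++ rejects the implicit
       conversion from void*, so the cast is written out. */
    struct motor *mp = (struct motor *)ctx;

    /* C accepts `mp->m = 1;`; C++ rejects assigning an int to an enum,
       so only a named enumerator compiles in both. */
    mp->m = MODE_RUN;
    mp->speed = speed;
}
```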

Balog Pal · Jan 1, 2011

James Kanze said:
On 12/30/10 11:46 PM, James Kanze wrote:

Member functions? That is not such a big deal alone,
easily worked around with "python style".
RAII is a big deal, and function/operator overloading, and
private/public. Probably other things too.
FWIW: private/public, in connection with member functions, are,
even today, the single most important improvement in C++ over C.
The rest is just icing on the cake---pretty nice icing, in a lot
of cases, but not as important as the encapsulation.

I'd say the automatic construction and destruction that enables RAII is
the single most important improvement in C++ over C. It's one thing
that you simply can't do in C. Encapsulation is just icing on the cake!

The two are related; without the encapsulation, I doubt that
automatic construction and destruction would work. They're both
related to the idea that everything which happens to objects of
the class type is through members.

I would not put them (access control vs. automatically executed functions) on
the same level. What the compiler (language) does for us (IMO):

access control:
- no need to encode access control into function names
- catches mistakes where outside code tries to access what it is not supposed to

auto functions:
- removes all those function calls from the visible source
- makes it impossible to forget initialization
- makes it impossible to forget cleanup

For me, the impact of the latter weighs a ton, and the former a few grams,
both in my own code and in code I have seen written by others. I have seen an
endless number of bugs due to forgotten initialization, and even more resource
leaks or state discrepancies due to misplaced returns or simple forgetfulness.
Also, RAII-ized code becomes readable and understandable, while its *correct*
equivalent is a verbose mess full of either repetition or control-flow
management.

For the other group, fighting the discipline issues would not be that hard,
and in my work, when the compiler flagged me for illegal access, it was (wild
guess) 80% a design issue of the class, and the solution was to provide the
access.

IOW, when the documentation states
- you shall not write priv_* in your code unless ...
- you shall call fclose() on the obtained pointer

the two rules, though similar in simplicity, are not equal in the extent to
which they are actually obeyed, nor in how easily violations can be caught by
review or tools.
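The `fclose()` rule is exactly the kind the compiler can take over. A minimal sketch (`count_bytes_c`, `count_bytes_raii` and `FileCloser` are hypothetical names): the C-style version must remember the cleanup on every path, while the C++ version hands the `FILE*` to `std::unique_ptr` with a small deleter that calls `std::fclose`, so the file is closed automatically when the pointer goes out of scope.

```cpp
#include <cstdio>
#include <memory>

// C style: every exit after a successful fopen must remember fclose().
long count_bytes_c(const char* path)
{
    std::FILE* f = std::fopen(path, "rb");
    if (f == nullptr)
        return -1;
    long n = 0;
    while (std::fgetc(f) != EOF)
        ++n;
    if (std::ferror(f)) {
        std::fclose(f);     // easy to forget on this error path
        return -1;
    }
    std::fclose(f);
    return n;
}

// Deleter used by unique_ptr; it is never called for a null pointer.
struct FileCloser {
    void operator()(std::FILE* f) const { std::fclose(f); }
};

// RAII: the file is closed on every path out of the function.
long count_bytes_raii(const char* path)
{
    std::unique_ptr<std::FILE, FileCloser> f(std::fopen(path, "rb"));
    if (!f)
        return -1;
    long n = 0;
    while (std::fgetc(f.get()) != EOF)
        ++n;
    return std::ferror(f.get()) ? -1 : n;
}
```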

Certainly I *do* like and use 'private', and it is great to have; it fades
only in comparison.

I suspect the different opinion is related to the size of the
projects we've worked on, however.

Surely, the scale factor works differently. IMO the auto functions are
important at any size, starting at 1 line, and I'd say their benefit scales
linearly.

The benefits of AC, by contrast, scale combinatorially with the number of
classes (or something like that): it is mostly redundant at the beginning, and
the picture changes once you have classes by the hundreds and thousands.

As we started talking about embedded, I think quite a lot of projects here
fall into the first category. (And for the big ones the issue is going away,
as you will be able to get C++, if nothing else by asking for a Comeau port...)

I remember back when I was
programming in C: I'd define a struct and a set of functions to
manipulate it... and then cross my fingers that no one accessed
it except through the functions I'd provided. In smaller
projects, you may have more control over the programmers, and be
able to ensure that this doesn't happen.

Yeah, that is the way. It is a PITA, but it can be covered.

And ensuring the
initialization and automatic destruction---regardless of the
path which causes the variable to go out of scope---is an
important feature.

In my experience, covering that is much harder.

Balog Pal · Jan 1, 2011

James Kanze said:
No. The "bug" was that there wasn't any Ariane 5 software:
management simply decided to use the software from Ariane
4 without changes (and without specifying any new requirements).
The bug wasn't in the software; the problem was a very poor
management decision.

I suggest everyone read the whole story instead of summaries: it has
many elements, and one can gain a good deal of insight in many different
areas.

Bart van Ingen Schenau · Jan 1, 2011

Bad analogy. The compiler example would fail immediately. The Ariane
example is not an immediate failure as presumably it survived some
rounds of testing (even though it seems that any testing performed was
not rigorous enough).

I don't think the analogy is that bad.
If the requirements themselves are incorrect, then there is no amount
of testing that will show it.
To my knowledge, the requirements on the software for both Ariane 4
and Ariane 5 were identical. It just happened that those requirements
were unsuitable for Ariane 5.

/Leigh

Bart v Ingen Schenau

Nick Keighley · Jan 2, 2011

I suggest everyone read the whole story instead of summaries: it has
many elements, and one can gain a good deal of insight in many different
areas.

which is roughly what I said in the first place

James Kanze · Jan 2, 2011

Bad analogy. The compiler example would fail immediately. The Ariane
example is not an immediate failure as presumably it survived some
rounds of testing (even though it seems that any testing performed was
not rigorous enough).

The software did fail on the Ariane V, the very first time it
was used. Given the way the Ariane V worked, it would fail
every time. From what I have read, it underwent *no* testing,
since as far as management was concerned, it was already
proven. The problem, again, is that management decided to reuse
a component without asking any technical people whether such
reuse was appropriate. Rather than draw up requirements for the
Ariane V, they simply plugged in software (and partially,
hardware) from the Ariane IV.

James Kanze · Jan 2, 2011

"James Kanze"

I suggest everyone read the whole story instead of summaries: it has
many elements, and one can gain a good deal of insight in many different
areas.

I have read the official results of the enquiry, in their
entirety. As I said earlier, you have to read it carefully,
because there are important things it doesn't say, but which are
natural conclusions from the concrete facts they couldn't avoid
saying.

antred · Jan 3, 2011

Man-wai Chang said:
>> My bet to your question is NO, because of the memory needed and latency
>> in any OO languages!
>
> Well it's a good thing C++ isn't just an OO language, isn't it?

From an artist's point of view, yes, inheritance looks pretty.
But old C programs can still be elegant if you modularize your code!

Don't forget that C++ is usually a preprocessor that generates C code!

If you want speed and low overhead, C is preferred!

I'm sorry, but that's silly. There is no reason at all why a well-written C program should run ANY faster than an equivalent (and equally well-written) C++ program. The alleged 'overhead' you're alluding to exists only in your imagination.
Yes, horribly slow and inefficient code can be written in C++ (but then that's true for ANY language), and yes, inheritance and virtual functions can be overused (although they should still perform better than functionally equivalent switch-case or if-elseif-else type code). If you're coding a time-critical application in C++, it's not hard to constrain yourself to a limited subset of C++. At any rate, if you're so concerned about speed, be aware that you will still have to provide your own replacement for some of the "under-the-hood" magic that the C++ compiler will occasionally do for you, and then good luck trying to outdo an aggressively optimizing C++ compiler with your own hand-rolled code. 9 times out of 10, it'll be an exercise in futility.
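As an illustration of the comparison above, here are two functionally equivalent versions of the same dispatch, one with a hand-written type tag and `switch` (the C idiom) and one with a virtual function; the shape types are hypothetical. Which is faster depends on the compiler and the workload, so this sketch makes no performance claim; it only shows that the C++ feature replaces dispatch code you would otherwise write and maintain by hand.

```cpp
#include <memory>
#include <vector>

// C idiom: a type tag plus a switch at every use site.
enum class Kind { Square, Rect };
struct TaggedShape { Kind kind; double a; double b; };

double area_switch(const TaggedShape& s)
{
    switch (s.kind) {
    case Kind::Square: return s.a * s.a;
    case Kind::Rect:   return s.a * s.b;
    }
    return 0.0;  // not reached for valid tags
}

// C++ idiom: the compiler generates the dispatch (typically via a vtable).
struct Shape {
    virtual ~Shape() = default;
    virtual double area() const = 0;
};

struct Square : Shape {
    explicit Square(double s) : s_(s) {}
    double area() const override { return s_ * s_; }
    double s_;
};

struct Rect : Shape {
    Rect(double w, double h) : w_(w), h_(h) {}
    double area() const override { return w_ * h_; }
    double w_, h_;
};

double total_area(const std::vector<std::unique_ptr<Shape>>& shapes)
{
    double sum = 0.0;
    for (const auto& s : shapes)
        sum += s->area();
    return sum;
}
```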

Similar Threads

Real-time developers/designers: Can abort() be used to fail-fast in a safety-critical system?	16	Dec 17, 2010
Using template in safety-critical system (flight critical system)	15	Jan 24, 2008
Weird Behavior with Rays in C and OpenGL	4	Feb 13, 2024
Critical javascript security flaw in firefox	8	Oct 2, 2006
[C++ Now! 2012] Deadline extension: Call for Submissions, newdeadline is January 20th, 2012	3	Jan 10, 2012
Who is using DBI or oci8 in production systems	1	May 18, 2007
Dynamic features used	7	Nov 21, 2008
C++ Now 2013 Call for Submissions	0	Oct 31, 2012