Is C++ used in life-critical systems?

Marc

I pulled this out from another thread as it seems to be a good topic. As
there was a bunch of hypothesizing going on about the design of
safety/life-critical systems in regards to how errors ("exceptions",
tomatoe/tomato) are handled. At least one person suggested that abort()
constitutes a fail-fast (see http://en.wikipedia.org/wiki/Fail-fast)
design, which seems completely wrong.

C++ Question:

Is C++ used in life-critical systems? Expound please.


Non-C++-specific Question:

Recognizing that higher-level supervision (by other systems) is surely a
common design in critical systems, thatwithstanding, how does any one
specific program handle detected bugs (the program itself detected a bug)
at runtime in released software?
 
Joshua Maurice

I pulled this out from another thread as it seems to be a good topic. As
there was a bunch of hypothesizing going on about the design of
safety/life-critical systems in regards to how errors ("exceptions",
tomatoe/tomato) are handled. At least one person suggested that abort()
constitutes a fail-fast (see http://en.wikipedia.org/wiki/Fail-fast)
design, which seems completely wrong.

C++ Question:

Is C++ used in life-critical systems? Expound please.

Non-C++-specific Question:

Recognizing that higher-level supervision (by other systems) is surely a
common design in critical systems, thatwithstanding, how does any one
specific program handle detected bugs (the program itself detected a bug)
at runtime in released software?

I've never worked on them, but the main school of thought for critical
systems is:
1- Test a lot, code reviews, etc., to decrease the bug count as much
as possible.
2- Fault tolerance through components where each component has fault
isolation from the rest of the components, where the components
provide redundancy, backups and rollover, etc.
3- Fail-fast in each component.

Basically, the goal is to provide a working product at the end of the
day. You can do this by reducing bug-count, but in most programming
languages, and especially in C and C++, a single bug anywhere in the
process can completely corrupt the process. So, have multiple
independent processes so that if one fails, another can take over. I
used the example of a unix process because unix processes have decent
fault isolation with each other - that is, if one process has a bug
and fails, it's unlikely to affect another process. Fault isolation is
key to allow backups and rollover.

The processes which fail should fail fast - you're in an unknown
state, so hell if you know what will happen if you execute anything,
such as logging code, error recovery code, and so on. This applies
only to bugs, aka unexpected failures. If the failure is expected
(such as disk is full), and you prepared for it, then you don't need
to instantly die. However, for an unexpected null pointer access, that
is generally a good example of where you should just die immediately,
preferably leaving the equivalent of a core dump so someone can look
at it and fix the problem.
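
To make the difference between an expected failure and a bug concrete, here is a minimal C++ sketch. The names WriteStatus, check_space, and read_sensor are made up for illustration and come from no real system; the only assumption is a hosted environment where std::abort() is available.

```cpp
#include <cstddef>
#include <cstdlib>

enum class WriteStatus { Ok, DiskFull };

// Expected failure: running out of space is anticipated, so it is reported
// to the caller, which can retry, drop data, or switch to a fallback.
constexpr WriteStatus check_space(std::size_t free_bytes,
                                  std::size_t record_bytes) {
    return record_bytes <= free_bytes ? WriteStatus::Ok
                                      : WriteStatus::DiskFull;
}

// Bug: a null pointer here means some caller broke the contract, so the
// state of the process is unknown. Stop at once: no logging, no cleanup.
// On most Unix systems abort() raises SIGABRT, which can leave a core dump
// if the system is configured to write one.
inline int read_sensor(const int* sensor) {
    if (sensor == nullptr) {
        std::abort();
    }
    return *sensor;
}
```

The caller decides what to do with WriteStatus::DiskFull, while a null sensor pointer never reaches any recovery code at all.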

I emphasized processes above, but that was just a specific example. A
process can still affect another process, so perhaps fault isolation
at the hardware level is called for. Perhaps you're worried about the
power supply, so you get a backup power supply as well. The main idea
of fault tolerance through multiple components with fault isolation
doesn't change, but the specifics of the situation do.

I recall reading a post about a NASA space probe that had a bug which
tripped something and produced the equivalent of a core dump. NASA had
planned for this correctly, and had put in a backup system with fault
isolation so that they were able to get that core dump, debug it, and
upload new code to the probe.

IIRC, there was another example, I forget offhand from where, where a
life critical system was made with 4 separate "computers". 3 of the
"computers" implemented most of the functionality, redundantly. Each
was using a different algorithm implemented by different teams. The
last of the "computers" was a simple thing that took a vote of the 3
main guys. If they all agreed, the vote taker acted on it. If there
was a 2-1 split, the vote taker would reset the "wrong" guy.
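
The vote taker described above can be sketched in a few lines of C++. This is only an illustration under simplifying assumptions: the three channels deliver plain int results that can be compared for exact equality, and the names Verdict, VoteResult, and vote are hypothetical. What a real voter does on a three-way disagreement is system-specific and not covered here.

```cpp
enum class Verdict {
    Unanimous,
    Channel0Faulty,
    Channel1Faulty,
    Channel2Faulty,
    NoMajority
};

struct VoteResult {
    Verdict verdict;
    int value;  // meaningful unless verdict == Verdict::NoMajority
};

// 2-out-of-3 majority vote over the outputs of three redundant channels.
// On a 2-1 split, the verdict names the channel the voter would reset.
constexpr VoteResult vote(int a, int b, int c) {
    if (a == b && b == c) return {Verdict::Unanimous, a};
    if (a == b) return {Verdict::Channel2Faulty, a};
    if (a == c) return {Verdict::Channel1Faulty, a};
    if (b == c) return {Verdict::Channel0Faulty, b};
    return {Verdict::NoMajority, 0};
}
```

For example, vote(5, 5, 9) yields the value 5 and flags channel 2 as the one to reset.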
 
Michael Doubez

I pulled this out from another thread as it seems to be a good topic. As
there was a bunch of hypothesizing going on about the design of
safety/life-critical systems in regards to how errors ("exceptions",

Where I worked, they were not handled because they were disabled.
Embedded compilers commonly provide a subset of C++: (extended) EC++.

There has been a TR (the technical report on C++ performance) meant to
kill the EC++ initiative, but it is still popular.

Gabriel Dos Reis once mentioned that there was research on time-bounded
guarantees for exceptions, but I know no more than that.
IIRC there is a DO-178B C++ compiler, but I have never seen this beast.

tomatoe/tomato) are handled.

I am not familiar with this expression; I guess it means interface
mismatch.

At least one person suggested that abort()
constitutes a fail-fast (see http://en.wikipedia.org/wiki/Fail-fast)
design, which seems completely wrong.

AFAIK this is how it is usually handled. The program aborts and
triggers a hot reboot, or a failover system takes over.

C++ Question:

Is C++ used in life-critical systems? Expound please.

Yes, it is. B. Stroustrup's page gives references, although it doesn't
say in which part of a critical system it is used.

Among the people I've met in the sector, however, it is often not the
first choice. IMHO this is partly cultural, but there is also some
distrust of things that get too far out of the programmer's control,
and C++ does delegate a lot of mechanisms to the compiler (the
so-called under-the-hood mechanisms).

Non-C++-specific Question:

Recognizing that higher-level supervision (by other systems) is surely a
common design in critical systems, thatwithstanding, how does any one
specific program handle detected bugs (the program itself detected a bug)
at runtime in released software?

The basic one is logging at the point of failure. At least, if there is
a crash, it is nice to be able to locate the bug/problem in order to fix
it (or to get out of a lawsuit) without waiting for the next crash.
 
Ian Collins

IIRC, there was another example, I forget offhand from where, where a
life critical system was made with 4 separate "computers". 3 of the
"computers" implemented most of the functionality, redundantly. Each
was using a different algorithm implemented by different teams. The
last of the "computers" was a simple thing that took a vote of the 3
main guys. If they all agreed, the vote taker acted on it. If there
was a 2-1 split, the vote taker would reset the "wrong" guy.

That is common practice in flight control systems. I worked alongside
a UK-based team who were working on one of the 3 black boxes (in Ada).
The other two were based elsewhere in Europe, possibly using different
programming languages. So the teams were not only isolated, but had
different cultural backgrounds!
 
Ian Collins

Where I worked, they were not handled because they were disabled.
Embedded compilers commonly provide a subset of C++: (extended) EC++.

There has been a TR performance report to kill the EC++ initiative but
it is still popular.

Is it? I thought that pointless abomination had been abandoned.
 
Michael Doubez

Is it?  I thought that pointless abomination had been abandoned.

To tell the truth, I am no longer in the field, so I cannot say, but 3
years ago IAR only supported EC++ (as did the ARM tools). It may have
changed.

It depends on whether customers asked for it :)
 
Marc

(I am not picking on you, but I am using your post as an example of what
I did/didn't ask for).

Joshua said:
I've never worked on them,

OK. So you'll be doing more hypothesizing then, I gather. I was trying to
curb more of that and get to the crux of the issue.

but the main school of thought for critical
systems is:

[snipped the detail of what is extraneous to the question asked]

I made concise reference to the overall design of critical systems and
noted that my question was not about that at all, but rather about any
specific component program of such a system and how it handles a detected
bug (not an error, but an honest-to-goodness bug).

The processes which fail should fail fast - you're in an unknown
state, so hell if you know what will happen if you execute anything,
such as logging code, error recovery code, and so on. This applies
only to bugs, aka unexpected failures. If the failure is expected
(such as disk is full), and you prepared for it, then you don't need
to instantly die. However, for an unexpected null pointer access, that
is generally a good example of where you should just die immediately,
preferably leaving the equivalent of a core dump so someone can look
at it and fix the problem.

This is the same hypothesizing that led me to create this thread. I
suggested that "fail-fast within a given program" is an oxymoron, for
starters. Then I wondered how a program is coded to react when it does
detect a bug; now that I think about it more, though, maybe there is no
such detection code in the first place. THAT is what I want to know.
Answering that probably requires someone who designs or has designed such
systems, preferably someone who does it often or has done it more than once.

I'm not trying to curb response from those outside of the field, but just
trying to keep the dialog on topic and explain what I was looking for.
 
Marc

Michael said:
Where I worked, they were not handled because they were disabled.
Embedded compilers commonly provide a subset of C++: (extended) EC++.

OK, but this part of my question was a general design one and not
specifically about C++. So, how did they/you handle intra-program errors?

I am not familiar with this expression; I guess it means interface
mismatch.

No, I was just noting that I was using "error" and "exception"
synonymously.

AFAIK this is how it is usually handled. The program aborts and
triggers a hot reboot, or a failover system takes over.

Are you hypothesizing? Or do you know because you have designed or
implemented such a system? Can you provide an actual scenario and how it
worked?

Yes, it is. B. Stroustrup's page gives references, although it doesn't
say in which part of a critical system it is used.

And that page, I presume, is this one:
http://www2.research.att.com/~bs/applications.html, right? Which
entry/entries there did you have in mind?

Among the people I've met in the sector, however, it is often not the
first choice. IMHO this is partly cultural, but there is also some
distrust of things that get too far out of the programmer's control,
and C++ does delegate a lot of mechanisms to the compiler (the
so-called under-the-hood mechanisms).

That would be expected given the mantra: "real(time) programmers use
C". Rather than just a "yes" or "no" answer then, what is really needed
is how often C++ is used and in what systems/areas.

The basic one is logging at the point of failure. At least, if there is
a crash, it is nice to be able to locate the bug/problem in order to fix
it (or to get out of a lawsuit) without waiting for the next crash.

Are you hypothesizing?
 
Geoff

I pulled this out from another thread as it seems to be a good topic. As
there was a bunch of hypothesizing going on about the design of
safety/life-critical systems in regards to how errors ("exceptions",
tomatoe/tomato) are handled. At least one person suggested that abort()
constitutes a fail-fast (see http://en.wikipedia.org/wiki/Fail-fast)
design, which seems completely wrong.

C++ Question:

Is C++ used in life-critical systems? Expound please.


Non-C++-specific Question:

Recognizing that higher-level supervision (by other systems) is surely a
common design in critical systems, thatwithstanding, how does any one
specific program handle detected bugs (the program itself detected a bug)
at runtime in released software?

Is the flight control system of the JSF a life-critical system? If so,
search for a document describing the rules for coding C++ for it. It
has rules for constructs, exceptions, variable names, modules,
testing, etc. and the rationales for the rules.
 
Michael Doubez

OK, but this part of my question was a general design one and not
specifically about C++. So, how did they/you handle intra-program errors?

Using an assert()-like construct to check the logic and
correctness of the program. Upon failure, a hot reboot was triggered
(reset of the processor).
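
A construct of that kind might look roughly like the sketch below. This is not the code from the system described here: RUNTIME_CHECK, FailureHandler, failure_handler, and scale_channel are hypothetical names, and the default handler simply calls std::abort(), standing in for the platform-specific processor reset, which is not shown.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical hook type: what to do when a check fails.
using FailureHandler = void (*)(const char* expr, const char* file, int line);

// Default action on a hosted system: terminate at once. On a bare-metal
// target this is where a processor reset would be requested instead.
inline void default_failure_handler(const char*, const char*, int) {
    std::abort();
}

// Access to the currently installed handler (replaceable, e.g. for tests).
inline FailureHandler& failure_handler() {
    static FailureHandler handler = default_failure_handler;
    return handler;
}

// Unlike assert() from <cassert>, this check is not compiled out when
// NDEBUG is defined, so it stays active in released software.
#define RUNTIME_CHECK(cond)                                            \
    do {                                                               \
        if (!(cond)) failure_handler()(#cond, __FILE__, __LINE__);     \
    } while (0)

// Example use: callers must never pass a channel outside 0..3.
inline int scale_channel(int channel, int raw) {
    RUNTIME_CHECK(channel >= 0 && channel < 4);
    return raw * (channel + 1);
}
```

The handler is replaceable only so that the check itself can be exercised in tests; a handler installed in a real system should not return.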

On the Ariane launcher, for example, it is instead a duplicated system
that takes over.
There is abundant literature on the subject.

[snip]
Are you hypothesizing? Or do you know because you have designed or
implemented such a system?

I have worked on it.

Can you provide an actual scenario and how it
worked?

There was a hardware watchdog for detecting software freeze.
Asserts triggered a reboot of the software.
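
For readers who have not met a watchdog: the idea can be modelled in software as below. This is a toy model for illustration only. A real watchdog is a hardware timer that resets the processor by itself, and the class name WatchdogModel and its tick-based interface are assumptions made for this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Software model of a watchdog timer, for illustration only.
class WatchdogModel {
public:
    explicit WatchdogModel(std::uint32_t timeout_ticks)
        : timeout_(timeout_ticks), remaining_(timeout_ticks) {
        assert(timeout_ticks > 0);
    }

    // Called by healthy application code, e.g. once per main-loop pass.
    void kick() { remaining_ = timeout_; }

    // Called once per timer tick. Returns true once the timeout has run
    // out without a kick, i.e. when the hardware would issue a reset.
    bool tick() {
        if (remaining_ > 0) --remaining_;
        return remaining_ == 0;
    }

private:
    std::uint32_t timeout_;
    std::uint32_t remaining_;
};
```

If the software freezes, kick() is no longer called, tick() eventually returns true, and the reset happens without the frozen code having to cooperate.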

The scenario? I have seen a crash or two :) But more frequently, upon
return from a mission, the logs are analysed, and the reboots are located
and entered in the bug report system.

And that page, I presume, is this one: http://www2.research.att.com/~bs/applications.html, right? Which
entry/entries there did you have in mind?

From memory, there are parts of Airbus planes and the example of dam
control (designed with the Z language for modeling, IIRC).

And there is the JSF rules document, which hints at the use of C++ in
life/mission-critical code.

That would be expected given the mantra: "real(time) programmers use
C". Rather than just a "yes" or "no" answer then, what is really needed
is a how often C++ is used and in what systems/areas.

When everyone knows that real programmers use assembly.

Are you hypothesizing?

No. I implemented it.
And everyone knows about black-boxes in planes.
 
James Kanze

That is common practice in flight control systems. I worked alongside
a UK-based team who were working on one of the 3 black boxes (in Ada).
The other two were based elsewhere in Europe, possibly using different
programming languages. So the teams were not only isolated, but had
different cultural backgrounds!

On the one avionics system I'm familiar with, there was
a requirement that the different implementations use different
programming languages. (Another form of different cultural
backgrounds:).) This was before Ada (or C++, or even C,
I think), though.
 
Man-wai Chang

Is C++ used in life-critical systems? Expound please.

Life-critical systems are mostly real-time systems.

My bet on your question is NO, because of the memory needed and the
latency of any OO language! :)

--
@~@ Might, Courage, Vision, SINCERITY.
/ v \ Simplicity is Beauty! May the Force and Farce be with you!
/( _ )\ (x86_64 Ubuntu 9.10) Linux 2.6.36.2
^ ^ 19:19:01 up 2 days 5:35 3 users load average: 1.00 1.05 0.89
No borrowing! No scams! No compensated dating! No fighting! No robbery! No suicide! Please consider Comprehensive Social Security Assistance (CSSA):
http://www.swd.gov.hk/tc/index/site_pubsvc/page_socsecu/sub_addressesa
 
James Kanze

On 15 Dec, 22:14, "Marc" wrote:

[...]
Yes, it is. B. Stroustrup's page gives references, although it doesn't
say in which part of a critical system it is used.

Among the people I've met in the sector, however, it is often not the
first choice. IMHO this is partly cultural, but there is also some
distrust of things that get too far out of the programmer's control,
and C++ does delegate a lot of mechanisms to the compiler (the
so-called under-the-hood mechanisms).

It's not just cultural. There are two mostly valid arguments
about C++: the language has too much undefined behavior (which
reduces the trust you can place in testing), and the language is
too complicated (which reduces the trust you can place in the
compiler).

Like all things, they have to be weighed against other aspects:
if you are choosing between Ada and C++, for example, the fact
that the C++ compiler has been more intensely used (and thus
more tested) may outweigh the additional complexity; the fact
that you have to write less code may mean increased trust
(supposing that you trust the compiler more than you trust your
programmers:)).

The basic one is logging at the point of failure. At least, if there is
a crash, it is nice to be able to locate the bug/problem in order to fix
it (or to get out of a lawsuit) without waiting for the next crash.

Most embedded systems don't have any support for logging. It's
not unusual, however, to "checkpoint" code, placing information
about recent operations in some sort of non-volatile fixed
length circular buffer. Whether you dare do even this once
you've found a software error, I don't know---most of the time,
I think it would be avoided in favor of terminating more
quickly.
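
As an illustration of such a checkpoint buffer, here is a minimal C++ sketch. The record layout (Checkpoint) and the class name CheckpointLog are invented for the example; placing the storage in non-volatile or reset-surviving memory is platform-specific and is not shown.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical record layout; a real system would choose its own fields.
struct Checkpoint {
    std::uint16_t event_id;
    std::uint32_t detail;
};

// Fixed-length circular buffer: no allocation, and once it is full the
// oldest entry is overwritten by each new one.
template <std::size_t N>
class CheckpointLog {
    static_assert(N > 0, "capacity must be non-zero");

public:
    void record(std::uint16_t event_id, std::uint32_t detail) {
        entries_[next_] = Checkpoint{event_id, detail};
        next_ = (next_ + 1) % N;
        if (count_ < N) ++count_;
    }

    std::size_t size() const { return count_; }

    // age 0 is the most recent checkpoint, age size() - 1 the oldest kept.
    const Checkpoint& recent(std::size_t age) const {
        assert(age < count_);
        return entries_[(next_ + N - 1 - age) % N];
    }

private:
    std::array<Checkpoint, N> entries_{};
    std::size_t next_ = 0;
    std::size_t count_ = 0;
};
```

Recording is a store plus an index update, which keeps the cost of each checkpoint small and bounded; reading the entries back is left to whoever inspects the device after the restart.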
 
Adam Skutt

I pulled this out from another thread as it seems to be a good topic. As
there was a bunch of hypothesizing going on about the design of
safety/life-critical systems in regards to how errors ("exceptions",
tomatoe/tomato) are handled. At least one person suggested that abort()
constitutes a fail-fast (see http://en.wikipedia.org/wiki/Fail-fast)
design, which seems completely wrong.

I don't see how it can be anything else and I'm not entirely sure what
the point of answering any of your other questions is until you
understand it. abort() is a way to fail fast. It may or may not be
the best way to do so, depending on a variety of factors, but it
certainly is a way to fail fast. It would be unusual for it to
constitute the entirety of a fail-fast system, but that's true of all
built-in language faculties: exceptions, assert, etc. Besides, you
don't build life-critical systems just out of software.

Is C++ used in life-critical systems? Expound please.

Yes, but I'm not sure exactly what you want anyone to expound on.
Tons of languages are used in life-critical systems. Why do you
care? It's entirely uninteresting.

Recognizing that higher-level supervision (by other systems) is surely a
common design in critical systems, thatwithstanding, how does any one
specific program handle detected bugs (the program itself detected a bug)
at runtime in released software?

Depends on what you're doing. You're getting a variety of answers
(but very little hypothesizing; in reality, there's a ton of material
on how this was, is, and will be done, available from ACM, IEEE,
freely online, etc.) because there are different solutions to vastly
different life/mission-critical problems.

In general, most solutions for an actual failure are a variant of,
"fail-fast and go to a redundant spare". Some systems will reboot the
failed node to a known state and try to get it to rejoin the
redundancy group, others intentionally leave it failed until it can be
inspected by a human being. Some might do both based on defined
rules. Some might simply shut down the system in question entirely.
The right thing to do depends entirely on what problem you're trying
to solve; it's not a question that can be answered in the detail you
seek without more information. The term "fail-fast" itself may have
different meanings too: in some cases you may get detailed logs, in
some cases you may get nothing (e.g., reboot due to failure to vote on
time, reboot by watchdog timer). Many life-critical systems therefore
log everything all of the time, preserving history to whatever degree
is tractable, to deal with this issue. Again though, what exactly is
done depends on the problem you're trying to solve: a piece of factory
equipment isn't a dam, which isn't an airplane, which isn't a satellite,
which isn't the controller on your home furnace.

Adam
 
Man-wai Chang

This reason seems to be missing in the FAA "Object Oriented Technology

Depending on the processing power of the hardware...

 
Ian Collins

Life-critical systems are mostly real-time systems.

My bet on your question is NO, because of the memory needed and the
latency of any OO language! :)

Well it's a good thing C++ isn't just an OO language, isn't it?
 
