Chris said:
> Of course, this all has to be judged on a cost/benefit basis. It
> is difficult to use the software to test itself if the CPU is not
> even executing the boot instructions stored in the boot ROM, for
> instance.
We had a simple solution to that on one project, which required some HW.
There was a "BIT" (Built In Test) display which was designed to power up
showing a processor failure. The SW tested the processor, then changed
the display to indicate a ROM failure. The SW then did a checksum on the
ROM and, if that passed, changed the display to show a RAM failure. It
then did a RAM test and, if that passed, changed it to show that the
base system was OK. These initial tests were all written in assembler.
This scheme could lead to it showing a processor failure instead of a
ROM failure, but the chances of getting a system OK display if anything
was faulty were minimal.
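
To make the sequence concrete, here's a rough C sketch of the scheme
(the real code was assembler, and BIT_DISPLAY's address, the status
codes and the test routines are all made-up names):

#include <stdint.h>

/* Hypothetical status codes for the BIT display. */
enum bit_status {
    BIT_CPU_FAIL = 0,  /* the display powers up showing this */
    BIT_ROM_FAIL,
    BIT_RAM_FAIL,
    BIT_SYSTEM_OK
};

/* Assumed memory-mapped display register; the address is invented. */
#define BIT_DISPLAY (*(volatile uint8_t *)0xFF00)

extern int cpu_test_ok(void);       /* assumed test routines */
extern int rom_checksum_ok(void);
extern int ram_test_ok(void);

void power_on_self_test(void)
{
    /* The display already shows BIT_CPU_FAIL, so if anything below
       goes badly wrong, that is what the user sees. */
    if (!cpu_test_ok())
        for (;;) ;                  /* halt: keep showing CPU failure */
    BIT_DISPLAY = BIT_ROM_FAIL;     /* advance before testing the ROM */

    if (!rom_checksum_ok())
        for (;;) ;                  /* halt: keep showing ROM failure */
    BIT_DISPLAY = BIT_RAM_FAIL;     /* advance before testing the RAM */

    if (!ram_test_ok())
        for (;;) ;                  /* halt: keep showing RAM failure */
    BIT_DISPLAY = BIT_SYSTEM_OK;    /* base system passed everything */
}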
> I often find it useful to check for "impossible" conditions in
> low-level code, but I have to trade that against the lack (at that
> point) of a strategy for dealing with such conditions, and the
> slowdown effect of performing an "unnecessary" test.
It is, indeed, all a matter of balance. In the case of the test
equipment I had most involvement with (by coincidence, I also think it
was the best done ;-) ) the SW was actually running on an HP
workstation, not an embedded system. We got the SW to do a basic check
of the test rig when started, and then ensured that all exceptions were
handled.
When it came to writing the tests (which were all done at system level)
we even managed to trigger some of these traps for "impossible"
conditions by simulating conditions it was reasonable for the SW to come
across! There was no requirement for it to be fast, so if the exception
handling slowed things down it was not a problem. One other benefit:
when a customer phoned us up saying, "it's not working!", I was able to
identify on the basis of the error message that a specific card was
faulty, had been removed or, most likely, had a switch on it that had
been moved. I was
assured that although they had been using the workstation for other
things there was no reason and no way this switch would have been
changed. Later on I was told that switch *had* been changed. An
"impossible" condition occurred, the HW problems was identified, and the
solution given, all from my desk with the kit several thousand miles
away and me not having to check through the code.
Only optimise out unnecessary tests once you have seen that you have a
performance problem ;-)
I continue to endeavour to trap all the exceptional conditions, even
"impossible" ones, somewhere in the SW that I write (maybe in a higher
or lower level function than the one where the problem occurs), and
sometimes I am surprised when one of those traps is triggered, so I
consider it a good investment in time.
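
In C, the sort of trap I mean can be as simple as this (a sketch only;
the TRAP macro, its message format and the example function are
illustrative, not lifted from real code):

#include <stdio.h>
#include <stdlib.h>

/* Report an "impossible" condition with enough context to find it,
   then stop rather than carry on with bad data. */
#define TRAP(cond, msg)                               \
    do {                                              \
        if (cond) {                                   \
            fprintf(stderr, "FATAL: %s (%s:%d)\n",    \
                    (msg), __FILE__, __LINE__);       \
            exit(EXIT_FAILURE);                       \
        }                                             \
    } while (0)

double mean(const double *values, size_t count)
{
    double sum = 0.0;
    size_t i;

    /* "Impossible" by design: every caller is supposed to pass a
       non-empty array, but the traps stay in for the life of the SW. */
    TRAP(values == NULL, "mean(): NULL values array");
    TRAP(count == 0, "mean(): zero-length array");

    for (i = 0; i < count; i++)
        sum += values[i];
    return sum / count;
}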
I've also found that my customers are always happier if the SW comes up
with an error message that looks like it might mean something than if
the software crashes, continues "working" but produces erroneous
results, or comes up with a generic error such as "Segmentation
violation". Of course, the customer is even happier when everything
works perfectly ;-)
I've just about finished a two-month project developing some significant
functionality for one of our pieces of server SW. This new functionality
makes heavy use of third-party libraries. I came across lots of things
where I thought, oh, that can't possibly go wrong, but the traps are
still in there in the code the customer is testing and will remain in
the SW throughout its life. If there is a bug in the third-party
library that makes one of these "impossible" conditions happen, it
means I will be able to look at the logs and identify accurately where
the failure is. So will a maintenance programmer 10 years down the line,
because each trap will put a unique message in the log.
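
For what it's worth, the pattern boils down to something like this
(again just a sketch; log_error(), the CFG-nnnn tags and load_config()
are made-up illustrations rather than the actual server code):

#include <stdio.h>
#include <time.h>

/* Each trap gets its own tag, so one log line maps straight to one
   place in the source without reading through the code. */
static void log_error(const char *tag, const char *detail)
{
    time_t now = time(NULL);
    fprintf(stderr, "[%ld] %s: %s\n", (long)now, tag, detail);
}

int load_config(const char *path)
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL) {
        /* "Impossible": the installer always creates this file. */
        log_error("CFG-0001", "config file missing or unreadable");
        return -1;
    }
    /* ... parse the file ... */
    if (fclose(fp) == EOF) {
        /* fclose "can't" fail on a file opened for reading, but the
           trap stays in and has a tag of its own. */
        log_error("CFG-0002", "fclose failed on config file");
        return -1;
    }
    return 0;
}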
--
Flash Gordon
Living in interesting times.
Web site -
http://home.flash-gordon.me.uk/
comp.lang.c posting guidelines and intro -
http://clc-wiki.net/wiki/Intro_to_clc