You are telling there is no application ever written that does not
crash from time to time ?
No, I'm not. However, it has been quite a while since I had random
crashes in my own programs (except for some MT errors...): I either had
blatant errors which crashed the program as soon as the code was
executed, or the applications just ran.
Just humour me a minute. Theoretically, how would you go about finding
out what is causing a random crash ?
I actually already said how I would go about finding random crashes:
run the program with Purify. OK, that was before I checked the prices
(something like EUR 3000 and above...): this does not necessarily make
it a viable tool in all cases :-( However, it is definitely worth its
price for professional software development using C or C++: I'm using
this tool to verify the correctness of my programs (and it very rarely
detects anything) and to correct errors in programs, including ones
I haven't written. If you are not used to using Purify, it will almost
certainly highlight questionable program segments. Some of these
are actually false positives but these are easily suppressed (typical
suppression files I have include one or two dozen entries, often
duplicates due to different software versions). In general, what
Purify reports are real problems. You can get an evaluation version
from IBM's site.
Assuming that Purify is no option due to its steep price, there are
a few other things you may be able to do:
- On the off-chance that the random crashes are actually a genuine
error caused at the crash site, you should load a core dump into your
debugger and analyse the code section for problems. If the random
errors are indeed random, this will not have much effect, however.
- Compilers are relatively smart about what a programmer had better
not do. Thus, you should compile your program with the highest warning
level and act on each warning! The action may be a comment in the
code ("compiler warns about ... but this is actually OK") after
verifying that the code indeed does what it is supposed to do. As
with Purify, compilers also give false positives and may force you
into a slightly different programming style to make the warnings go
away, but often they are right about their complaints.
- Repeat the previous step with a different compiler! Actually, merely
compiling and running your code with a different compiler may already
show unexpected behavior. Also, different compilers enforce different
rules, i.e. you should even expect compiler errors when using a
different compiler, and these often highlight problematic sections,
too. Sometimes the errors are instead due to the compiler's poor
conformance to the C++ standard, i.e. you probably want to use a
rather compliant compiler (e.g. Comeau C++ for just $50 is an
option).
This is mostly about static analysis and there are even programs
which specialize in static analysis e.g. from Gimpel (PCLint) and
Programming Research (QAC++). However, these tools also have a
rather non-negligible price.
- Search for uses of "known" dangerous functions, e.g. 'gets()',
'sprintf()', 'operator>>(char*)', etc. Where possible, replace
them with safe alternatives (e.g. 'fgets()', 'snprintf()',
'operator>>(std::string)') and in all other places verify that
they cannot cause any problems, e.g. because the input is already
known and verified.
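To illustrate the last point, here is a minimal sketch of two such
replacements, assuming a C++11 or later compiler. The function names
'format_id()' and 'first_word()' are made up for this example and are
not taken from any library:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <sstream>
#include <string>

// Format into a fixed-size buffer without ever writing past its end,
// which 'sprintf()' cannot guarantee. Returns true if the whole text
// fitted and false if it had to be truncated.
bool format_id(char* buf, std::size_t size, int id)
{
    assert(buf != nullptr && size > 0);
    int needed = std::snprintf(buf, size, "item-%d", id);
    return needed >= 0 && static_cast<std::size_t>(needed) < size;
}

// Extract the first whitespace-delimited word. The 'std::string' grows
// as needed, unlike 'operator>>(char*)' which writes into a fixed array.
std::string first_word(const std::string& text)
{
    std::istringstream in(text);
    std::string word;
    in >> word;
    return word;
}
```

With an 8 byte buffer, 'format_id(buf, sizeof buf, 42)' fits exactly,
while a seven digit id gets truncated and reported instead of running
over the end of the buffer.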
OK, so far the stuff is rather non-intrusive: the application is
not changed in any form specifically for testing (handling warnings
and errors may incur changes but these will stay in the code). If
this did not yield sufficient results or if you want to do further
verification, you would create a special test version of your
application:
- Some libraries, including standard libraries, come with debugging
versions or you can obtain some debugging version. For example,
you can use STLport in debugging mode as a replacement for [most
of] the standard C++ library. In debugging mode this library will
test many things which a release version does not test, e.g.
out-of-bounds access in 'std::vector'. Run your application
after compiling it with the most aggressive debugging options
turned on.
- Some compilers have switches which will result in the memory of
released variables being overwritten by some fixed bit pattern.
Often you can replace the memory management functions ('malloc()',
'free()', 'operator new()', etc.) to do something similar. This
detects uses of stale objects.
- Again replacing the memory management functions, you can arrange
for memory blocks to be located at page boundaries where the
adjacent page is access protected - at least if you are not
using lots of memory. Electric Fence (efence) is a library which
does just that.
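The "fixed bit pattern" idea from the list above can be sketched
roughly as follows. This is only an illustration under assumptions:
the namespace 'dbg', the function names and the fill values 0xCD and
0xDD are my own choice, and it assumes a C++11 or later compiler. A
real test build would call something like this from replaced versions
of 'operator new()' and 'operator delete()':

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

namespace dbg {

const unsigned char fresh_byte = 0xCD;  // fill for newly allocated memory
const unsigned char dead_byte  = 0xDD;  // fill for released memory

// Bookkeeping stored in front of every block. The second member pads
// the header so that the user data stays suitably aligned.
union Header {
    std::size_t size;
    std::max_align_t align;
};

void* allocate(std::size_t size)
{
    void* raw = std::malloc(sizeof(Header) + size);
    if (raw == nullptr) return nullptr;
    Header* header = static_cast<Header*>(raw);
    header->size = size;
    void* user = header + 1;
    std::memset(user, fresh_byte, size);  // uninitialized reads stand out
    return user;
}

// Overwrite the user data with the "dead" pattern. This is split out of
// release() so the effect can be inspected while the block is still valid.
void poison(void* user)
{
    assert(user != nullptr);
    Header* header = static_cast<Header*>(user) - 1;
    std::memset(user, dead_byte, header->size);
}

void release(void* user)
{
    if (user == nullptr) return;
    poison(user);
    std::free(static_cast<Header*>(user) - 1);
}

} // namespace dbg
```

A stale pointer then tends to yield values like 0xDDDDDDDD, which are
easy to spot in a debugger. Note that an optimizing compiler may drop
the fill directly before 'free()', so the test version is better built
without optimization.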
If none of the above helps, it's time to purchase Purify... If
this is not an option, it is time to rework the whole application to
use modern C++ techniques, e.g. containers, smart pointers, etc.
These make some problem detection viable through libraries like
STLport. Of course, this gets rather involved. If you are going down
this route, here are more things you might want to consider:
- Single step through the program and verify that the "right" thing
is happening. It is sometimes quite surprising that the program
does other things than you might expect.
- Do a thorough code review.
The important thing to note is that I'm not really making fun of you
if I state that you should have programmed your application
differently from the ground up: some techniques lend themselves to
better error discovery than others. Following certain styles, using
certain libraries, etc. helps with program correctness. It actually
helps so much that even in C++ "typical" programming errors like the
"random crash" can be prevented most of the time. The areas where
this cannot [yet] be done seem to lack appropriate approaches, e.g.
multi-threading.
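As a small illustration of what I mean by style helping correctness,
here is a sketch using a container and a smart pointer. It assumes a
C++11 or later compiler for 'std::unique_ptr'; the type 'Sample' and
all function names are hypothetical and only serve the example:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <vector>

struct Sample { int value; };  // hypothetical payload type

// Checked element access: 'at()' throws 'std::out_of_range' instead of
// silently reading past the end like an unchecked raw array index.
bool try_get(const std::vector<int>& v, std::size_t index, int& out)
{
    try {
        out = v.at(index);
        return true;
    } catch (const std::out_of_range&) {
        return false;
    }
}

// Ownership is part of the type: the object is deleted exactly once,
// when the returned pointer goes out of scope.
std::unique_ptr<Sample> make_sample(int value)
{
    return std::unique_ptr<Sample>(new Sample{value});
}

// Read through the owner. The assert documents that passing an empty
// owner is a bug in the caller.
int value_of(const std::unique_ptr<Sample>& owner)
{
    assert(owner != nullptr);
    return owner->value;
}
```

An out-of-range index now gives a well-defined failure at the point of
the error rather than a crash somewhere else much later, and there is
no 'delete' left to forget or to call twice.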