Hi Malcolm,
There's no good answer.
The first thing to do is to divide your code into "pure functions" and
"procedures". A pure function shuffles bits; a procedure shuffles bits
and does IO.
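To make the split concrete, a minimal sketch in C (the names here
are mine, purely for illustration):

  #include <stddef.h>
  #include <stdio.h>

  /* A "pure function": the result depends only on the inputs. */
  int checksum(const unsigned char *buf, size_t len)
  {
      int sum = 0;

      for (size_t i = 0; i < len; i++)
          sum = (sum + buf[i]) & 0xFF;
      return sum;
  }

  /* A "procedure": shuffles bits AND does IO, so it inherits
     the IO failure modes as well. */
  int checksum_file(const char *path, int *sum)
  {
      FILE *fp = fopen(path, "rb");  /* can fail: missing file... */
      unsigned char buf[512];
      size_t n;

      if (fp == NULL)
          return -1;
      *sum = 0;
      while ((n = fread(buf, 1, sizeof buf, fp)) > 0)
          *sum = (*sum + checksum(buf, n)) & 0xFF;
      fclose(fp);
      return 0;
  }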
So, your classification treats a parser as a "pure function" --
provided the I/O has been handled outside the scope of the parser.
A pure function can only fail in three circumstances: it's called with
invalid parameters, there's an internal error in its coding, or it
runs out of memory.
In my case, a "process" (being imprecise in my terms) can also
*lose* a resource (e.g., memory) that it previously possessed.
But, its failure to act properly on the notification of that loss
would fall under the "bug" category (though there are some
degenerate cases where a resource may be withdrawn before the
process can act on the notification -- but the OS can deal with
that special case).
A procedure can fail in these circumstances, and also because of a
hardware problem, because the user provides erroneous or even
malicious data, because hardware is functioning but is overwhelmed by
the demands of the procedure, or because of missing resources such as
non-existent files.
How are you differentiating the parser instance from the "malicious
data" instance? What if the data to be parsed originally came from
user I/O? What if it came from a corrupted data store? (is that
a "hardware failure"?)
So let's take the situations one by one. If you have invalid
parameters, that normally indicates a programming error in the calling
code. The rare exception is when caller can't reasonably be expected
to check the parameters for validity - e.g. a statistical procedure
might fail when the numbers come from a pathological distribution
whose sample means never settle toward a bell curve (the Cauchy
distribution is the classic example). Caller can't reasonably be
expected to check for that condition. If caller can't be expected
to check the parameters, an "abnormal" result must be passed back as
part of the normal flow control of the program. If caller can, there's
not much point in shunting the error back up to a buggy caller. You
need to abort the program with an error message if it can be aborted,
or suppress the error if aborting isn't an option.
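In C, that policy might look like this (a sketch; mean() is just
an example):

  #include <assert.h>
  #include <stddef.h>

  double mean(const double *x, int n)
  {
      /* Caller CAN check these, so a violation is a bug in the
         caller -- fail hard rather than hand the error back. */
      assert(x != NULL && n > 0);

      double sum = 0.0;

      for (int i = 0; i < n; i++)
          sum += x[i];
      return sum / n;
  }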
A parser, handed data (as an "input" to your pure function), can
nominally *expect* to find problems in that data through
no fault of the caller or of the function operating on it. Yet,
the intent of moving the parsing into this "pure function"
is so the caller need not be aware of the requirements the
parser imposes on the data.
I.e., this looks like "bad data/parameters" but isn't a failure
of the code from either party. Since the parser didn't do the
I/O that originated the bad data, ...
You may not be able to communicate with the originator of that
data (e.g., if you read it from a file ... or, a const array
embedded somewhere in your image). And, you may not want to abort
the "program" -- just that *aspect* of the processing.
Internal error - difficult. Your own code is buggy. There's no real
answer to this situation.
It depends on the nature of your error (as do hardware errors!).
Your code might still be able to reliably report an error. It
just might not make sense to you -- until you examine your
code and see why your code is "deliberately" making that mistake.
I.e., if the results aren't safety critical, you make a best
effort and hope for the best. (I have some systems where
"/* CAN'T HAPPEN */" really should NEVER happen and, as a
result, you want to lock up the processor *hard* so that
it doesn't do anything potentially dangerous or lossy)
Out of memory - if it can reasonably be expected that the function
will run out of memory, shunt an "out of memory" condition up to
caller. If this basically can't happen (you need to store one filename
dynamically on a machine with 4GB of memory), abort with an out of
memory message.
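For instance (a sketch; E_NOMEM and the function names are
invented):

  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>

  #define E_NOMEM (-1)

  /* A large, plausible-to-fail allocation: shunt the condition
     up and let caller decide what it MEANS. */
  int alloc_image(size_t w, size_t h, unsigned char **out)
  {
      unsigned char *p = malloc(w * h);

      if (p == NULL)
          return E_NOMEM;
      *out = p;
      return 0;
  }

  /* One filename on a 4GB machine: if THIS fails, give up. */
  char *save_filename(const char *name)
  {
      char *copy = malloc(strlen(name) + 1);

      if (copy == NULL) {
          fprintf(stderr, "out of memory\n");
          abort();
      }
      return strcpy(copy, name);
  }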
But the user doesn't want to see "out of memory". The user doesn't
need to *know* what's going on inside the code. The memory error
needs to be reinterpreted in the context of higher levels in the
application and expressed in a way that is appropriate to that
application and user base. This allows lower level libraries to
be reused between applications and *within* an application.
Running out of memory when trying to add a name to an address
book should yield a different message than when trying to print
a banner. The role of the memory allocation -- along with its
persistence and remedies -- varies based on that context.
I.e., "Your address book is full. Delete one or more entries
of you want to add this new contact" vs. "Image too large to
print. Try a lower resolution or size." Putting the "out of
memory" error into context adds value to the user (and thus
the product).
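Continuing that sketch, a hypothetical upper layer that knows
*why* the memory was wanted, and so can put the same E_NOMEM into
the user's terms:

  #include <stdio.h>
  #include <stdlib.h>

  /* alloc_image()/E_NOMEM as above; show_user() stands in for
     whatever UI the application actually has. */
  static void show_user(const char *msg)
  {
      fprintf(stderr, "%s\n", msg);
  }

  void print_banner(size_t w, size_t h)
  {
      unsigned char *img;

      if (alloc_image(w, h, &img) == E_NOMEM) {
          show_user("Image too large to print. "
                    "Try a lower resolution or size.");
          return;
      }
      /* ... render and print ... */
      free(img);
  }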
Hardware problems: is it once in a blue moon, or can it reasonably be
expected? If it's once in a blue moon, simply report an IO error and,
usually, terminate. If it's expected, you will need to know how to
code round the expected hardware failures.
Bad user data - usually you should assume that the user is a hacker
trying to make your program malfunction. This should be part of the
normal control path of the program, and reported up to caller.
[Again, how does this tie in with the parser "pure function"?]
IMO, you want to provide as much information to the user as
possible, to enable them to correct their entry (even if it is a
malicious user). This
is where exposing internal "errors" from a lower level function
(e.g., parser) can benefit the user without burdening the application
or tying low level routines to a particular application.
For example, complaining that an integer value contained a decimal
point can be reported to the user regardless of whether the upper
layer that invoked the parsing routine was using it to fetch an
*age*, IQ, ZIP code, etc. The context associated with the upper
layer gives the "problem report" ("error" is a hard word to
avoid) some meaning; further qualification obtained from the
report of the lower level parser helps to clarify that:
Upper context: "The age that you entered is invalid."
Lower context: "The value must be a whole number -- it cannot
contain a decimal point."
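Wired together (reusing the parse_status sketch from earlier;
the wording is only an example):

  #include <stdio.h>

  const char *parse_explain(parse_status s)
  {
      switch (s) {
      case PARSE_HAS_FRACTION:
          return "The value must be a whole number -- it cannot "
                 "contain a decimal point.";
      case PARSE_OUT_OF_RANGE:
          return "The value is too large (or too small).";
      default:
          return "The value is not a number.";
      }
  }

  void report_bad_age(parse_status s)
  {
      /* Upper layer supplies the context... */
      printf("The age that you entered is invalid.\n");
      /* ...lower layer supplies the detail. */
      printf("%s\n", parse_explain(s));
  }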
Overwhelmed hardware - very difficult. If it will take a whole day to
write results to disk, is this acceptable or must we abort? It's often
not easy to answer these questions, or to anticipate them. The file
might be quite small, but the disk hardware very busy.
Missing resources - happens all the time. Treat as normal flow control
of program.
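E.g. (use_defaults() and load_settings() are placeholders):

  #include <stdio.h>

  static void use_defaults(void)      { /* placeholder */ }
  static void load_settings(FILE *f)  { (void)f; /* placeholder */ }

  /* A missing file is ordinary flow control, not a catastrophe: */
  void init_settings(void)
  {
      FILE *fp = fopen("settings.rc", "r");

      if (fp == NULL) {
          use_defaults();
          return;
      }
      load_settings(fp);
      fclose(fp);
  }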
So generally the strategy is to treat so-called error conditions as
normal flow control, and pass the error condition up to caller. So
they're not really errors at all. The exception is errors caused by
programming mistakes. It's dangerous to pass an error back up to a
buggy caller. Normally you want to assert fail, which gives the
compiler the choice between reporting and aborting, or silently
suppressing the error.
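In C, assert() is the canonical tool for this -- one compile-time
switch flips between the two behaviours:

  #include <assert.h>

  /* With NDEBUG undefined, a failed check reports file/line and
     aborts.  Compile with -DNDEBUG and the same check compiles
     to nothing -- the "error" is silently suppressed. */
  void set_speed(int percent)
  {
      assert(percent >= 0 && percent <= 100);
      /* ... */
  }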
If the assertion is going to just terminate the program, then
the user has learned nothing. He reruns the program and it
dies, again.
Centralised error systems mean that code will break if you try to move
it to a different project.
I don't see that as a consequence of this sort of approach.
My scheme pieces errors (and descriptions) together as they
are encountered in the executable. Out_Of_Memory might resolve
to 0x2345678 today, 0x107 tomorrow and NEVER HAPPEN some time
next week. (i.e., grep the sources and you never FIND that
string in them!)
Furthermore, *your* instance of the sources might have a
different set of errors (and, thus, codes) than *my* image
based on the types of errors that our respective assignments
require us to detect/encounter.
And, when our images are merged, we magically end up with the
union of error messages and "codes" that we have individually
used.
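A rough feel for it in plain C (the real mechanism is more
involved; this only shows the flavor):

  #include <stdint.h>

  typedef struct { const char *text; } error_desc;

  static const error_desc Out_Of_Memory = { "out of memory" };

  /* The "code" is wherever the linker happened to put the
     descriptor -- 0x2345678 today, something else tomorrow,
     and absent entirely if nothing references it. */
  #define ERR_CODE(e)  ((uintptr_t)&(e))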
As I said, I already have a version of this working -- borrowing
algorithms from my IDL compiler (same issues -- automatically
managing "message types"). But, it is hard to tell make(1)
when the error codes might have changed. So, any change to
a source file requires rebuilding the error code catalog.
The only way I can currently get make to cooperate is if I
introduce an intermediate file that I only touch(1) if the
error codes need to be updated. Then, put a separate dependency
on that file that triggers the building of the error codes
for other files. As projects get bigger, this approach doesn't
scale well :< (or, it coerces you into doing things in
less than optimal ways *just* to work around the tools)
[I also have a problem using __FILE__, __func__ and __LINE__ as
components of unique identifiers as I might have 20 files named
get.c. Each with a function called get() within. And the
structure of each get() highly resembling the other 19 get()'s
(i.e., signaling errors on the same __LINE__s though physically
different files). And, I place strong restrictions on what can
be done with these "error codes" -- they aren't "regular numbers"]
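For comparison, the naive identifier:

  #include <stdio.h>

  /* The "obvious" unique identifier... */
  #define ERROR_ID  __FILE__, __func__, __LINE__

  static void log_error(const char *file, const char *func, int line)
  {
      fprintf(stderr, "error at %s:%s():%d\n", file, func, line);
  }

  /* ...but with twenty files named get.c, each with a get()
     signaling on the same physical line, log_error(ERROR_ID)
     emits the identical "get.c:get():NN" from every one of
     them -- not unique after all. */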
If you try to hard-code error "codes", then you ARE stuck with a
system that only works for *you* and *your* application. Not a
very smart move, IMO.
Barring some other clever approach, I will just write a utility to
build the error codes and add it into the build process -- using
the IDL compiler as a rough framework (same sorts of issues). I
don't want to head down that path until I know that's the *best*
option!