Yes, I agree with that in general. Correctness and productivity are more
important, as a rule, and should be given priority.
I'm glad we agree on that, but I wonder why your previous post
emphasised machine efficiency so much, and correctness almost not at
all?
It does imply that incorrect use of exceptions incurs an unnecessary
performance penalty, no more, no less, just as incorrect use of wrappers
incurs an unnecessary performance penalty.
If your whole argument is that we shouldn't write crappy APIs, then I
agree with you completely. The .NET example you gave previously is a good
example of an API that is simply poor: using exceptions isn't a panacea
that magically makes code better. So I can't disagree that using
exceptions badly incurs an unnecessary performance penalty, but it also
incurs an unnecessary penalty against correctness and programmer
productivity.
What this really comes down to is how frequently or infrequently a
particular condition arises before that condition should be considered
an exceptional condition rather than a normal one. It also relates to
how the set of conditions partitions into "normal" conditions and
"abnormal" conditions. The difficulty for the API designer is to make
these choices correctly.
The first case is impossible for the API designer to predict, although
she may be able to make some educated estimates based on experience. For
instance I know that when I search a string for a substring, "on average"
I expect to find the substring present more often than not. I've put "on
average" in scare-quotes because it's not a statistical average at all,
but a human expectation -- a prejudice in fact. I *expect* to have
searching succeed more often than fail, not because I actually know how
many searches succeed and fail, but because I think of searching for an
item to "naturally" find the item. But if I actually profiled my code in
use on real data, who knows what ratio of success/failure I would find?
In the second case, the decision of what counts as "ordinary" and what
counts as "exceptional" should, in general, be rather obvious. (That's
not to discount the possibility of unobvious cases, but those are probably a
sign that the function is too complex and tries to do too much.) Take the
simplest description of what the function is supposed to do (e.g. "find
the offset of a substring in a source string"). That's the ordinary case,
and should be returned. Is there anything else that the function may do?
(E.g. fail to find the substring because it isn't there.) Then that's an
exceptional case.
(There may be other exceptional cases, which is another reason to prefer
exceptions to magic return values. In general, it's much easier to deal
with multiple exception types than it is to test for multiple magic
return values. Consider a function that returns a pointer. You can return
null to indicate an error. What if you want to distinguish between two
different error states? What about ten error states?)
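To make that concrete, here is a minimal sketch of one function with two distinct failure modes. The names load_setting, SettingMissing and SettingMalformed are hypothetical, invented purely for illustration; they are not from any real library.

```python
class SettingMissing(Exception):
    """Hypothetical: the requested key is not present at all."""


class SettingMalformed(Exception):
    """Hypothetical: the key is present but its value is unusable."""


def load_setting(settings, key):
    """Return the integer stored under key, or raise to say why not."""
    if key not in settings:
        raise SettingMissing(key)
    try:
        return int(settings[key])
    except ValueError:
        raise SettingMalformed(key)


def describe(settings, key):
    # Each failure mode gets its own handler; there is no magic
    # return value for the caller to decode.
    try:
        return "value %d" % load_setting(settings, key)
    except SettingMissing:
        return "missing"
    except SettingMalformed:
        return "malformed"
```

Adding a third or a tenth failure mode means adding another exception class and, where the caller cares, another except clause. The ordinary return value never has to share its range with error codes.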
I argue that as designers, we should default to raising an exception and
only choose otherwise if there is a good reason not to. As we agreed
earlier, exceptions (in general) are better for correctness and
productivity, which in turn are (in general) more important than machine
efficiency. The implication of this is that in general, we should prefer
exceptions, and only avoid them when necessary. Your argument seems to be
that we should avoid exceptions by default, and only use them if
unavoidable. I think that is backwards.
I disagree here, to the extent that, whether something is an error or
not can very much depend on the circumstances in which the API is used.
That's certainly true: a missing key (for example) may be an error, or a
present key may be an error, or neither may be an error, just different
branches of an algorithm. That's an application-specific decision. But I
don't see how that relates to my claim that magic return values are less
robust and usable than exceptions. Whether it is an error or not, it
still needs to be handled. If the caller neglects to handle the special
case, an exception-based strategy will almost certainly lead to the
application halting (hopefully leading to a harmless bug report rather
than the crash of a billion-dollar space probe), but a magic return value
will very often lead to the application silently generating invalid
results.
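Python's own str.find and str.index show the difference. The two helper functions below are hypothetical and deliberately contain the same oversight: neither checks for the "not found" case.

```python
def strip_comment_find(line):
    # str.find returns the sentinel -1 when "#" is absent. The missing
    # check goes unnoticed: line[:-1] quietly drops the last character.
    return line[:line.find("#")]


def strip_comment_index(line):
    # str.index raises ValueError when "#" is absent, so the same
    # oversight stops the program instead of corrupting the result.
    return line[:line.index("#")]
```

Given a line with no "#" in it, the first version returns the line minus its final character and carries on; the second raises ValueError at the point of the mistake.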
[...]
Wanting to ignore a return value from a function is perfectly normal and
legitimate in many cases.
I wouldn't say that's normal. If you don't care about the function's
result, why are you calling it? For the side-effects? In languages that
support procedures, such mutator functions should be written as
procedures that don't return anything. For languages that don't, like
Python, they should be written as de-facto procedures, always return
None, and allow the user to pretend that nothing was returned.
That is to say, ignoring the return value is acceptable as a work-around
for the lack of true procedures. But even there, procedures necessarily
operate by side-effect, and side-effects should be avoided as much as
possible. So I would say, ideally, wanting to ignore the return value
should be exceptionally rare.
However, if a function throws instead of returning a value, ignoring
that value becomes more difficult for the caller and can exact a
performance penalty that may be unacceptable to the caller.
There's that premature micro-optimization again.
The problem really is that, at the time the API is designed,
there often is no way to tell whether this will actually be the case; in
turn, no matter whether I choose to throw an exception or return an
error code, it will be wrong for some people some of the time.
I've been wondering when you would reach the conclusion that an API
should offer both forms. For example, Python offers both key-lookup that
raises exceptions (dict[key]) and key-lookup that doesn't (dict.get(key)).
The danger of this is that it complicates the API, leads to a more
complex implementation, and may result in duplicated code (if the two
functions have independent implementations). But if you don't duplicate
the code, then the assumed performance benefit of magic return values
over exceptions might very well be completely negated:
    def get(self, key):
        # This is not the real Python dict.get implementation!
        # This is merely an illustration of how it *could* be.
        try:
            return self[key]
        except KeyError:
            return None
This just emphasises the importance of not optimising code by assumption.
If you haven't *measured* the speed of a function you don't know whether
it will be faster or slower than catching an exception.
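One way to measure rather than assume is the standard timeit module. This is only a sketch: the table size, the keys and the helper names (lookup_get, lookup_try, measure) are arbitrary choices of mine, and the numbers will vary by machine and Python version, which is rather the point.

```python
import timeit

# Arbitrary test data: the keys 0..999 map to themselves.
data = dict((n, n) for n in range(1000))


def lookup_get(key):
    # Sentinel style: None signals a missing key.
    return data.get(key)


def lookup_try(key):
    # Exception style, converted to a sentinel at the call site.
    try:
        return data[key]
    except KeyError:
        return None


def measure(func, key, number=100000):
    """Return the total seconds taken by `number` calls of func(key)."""
    return timeit.timeit(lambda: func(key), number=number)


if __name__ == "__main__":
    for key in (5, -1):  # 5 is present, -1 is missing
        for func in (lookup_get, lookup_try):
            print(func.__name__, key, measure(func, key))
```

Run it against your own hit/miss ratio before drawing any conclusion; the present-key and missing-key timings can rank the two styles differently.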
You will note that the above has nothing to do with the API, but is
entirely an implementation decision. This to me demonstrates that the
question of machine efficiency is irrelevant to API design.
I agree with the concern about premature optimisation. However, I don't
agree with a blanket statement that special return values always and
unconditionally lead to more defects.
I can't say that they *always* lead to more defects, since that also
depends on the competence of the caller, but I will say that as a general
principle, they should be *expected* to lead to more defects.
Returning to the .NET non-blocking I/O example, the fact that the API
throws an exception when it shouldn't very much complicates the code and
introduces a lot of extra control logic that is much more likely to be
wrong than a simple if-then-else statement. As I said, throwing an
exception when none should be thrown can be just as harmful as the
opposite case.
In this case, it's worse than that -- they use a special return value
when there should be an exception, and an exception when there should be
an ordinary, non-special value (an empty string, if I recall correctly).
Exactly. To me, that implies that making something an exception that, to
the caller, shouldn't be is just as inconvenient as the other way
around.
Well, obviously I agree that you should only make things be an exception
if they actually should be an exception. I don't quite see where the
implication is -- I find myself in the curious position of agreeing with
your conclusion while questioning your reasoning, as if you had said
something like:
All cats have four legs, therefore cats are mammals.
Yes, in some cases it is. For example:
    int numBytes;
    int fd = open(...);
    while ((numBytes = read(fd, …)) > 0) {
        // process data...
    }
Would you prefer to see EOF indicated by an exception rather than a zero
return value? I wouldn't.
Why not? Assuming this is a blocking read, once you hit EOF you will
never recover from it. Is this about the micro-optimisation again? Disc
IO is almost certainly a thousand times slower than any exception you
could catch here.
In Python, we *do* use exceptions for some file reads, though not for an
explicit read. An explicit read returns an empty string at EOF, and we
might write:
    f = open(filename)
    while 1:
        block = f.read(buffersize)
        if not block:
            f.close()
            break
        process(block)
This would arguably be easier to write and read, and demonstrates the
intent of the while loop better:
    f = open(filename)
    try:
        while 1:
            process(f.read(buffersize))
    except EOFError:
        f.close()
(But the above doesn't work, because an explicit read doesn't raise an
exception.)
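If you wanted the exception-driven loop anyway, a thin wrapper would do it. The sketch below assumes two hypothetical helpers, read_or_raise and process_all, which are not part of Python's file API.

```python
import io


def read_or_raise(f, size):
    """Hypothetical helper: like f.read(size), but raise EOFError at EOF."""
    block = f.read(size)
    if not block:
        raise EOFError("end of file")
    return block


def process_all(f, process, buffersize=4):
    """Feed f to process() in blocks, closing f once EOF is reached."""
    try:
        while True:
            process(read_or_raise(f, buffersize))
    except EOFError:
        f.close()


if __name__ == "__main__":
    blocks = []
    process_all(io.StringIO(u"abcdefghij"), blocks.append)
    print(blocks)
```

One caveat with this shape: an EOFError raised from inside process() would be caught by the same except clause, so the try block covers more than the read it is meant to guard.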
However, there's another idiom for reading a file which does use an
exception: line-by-line reading.
    f = open(filename)
    for line in f:
        process(line)
    f.close()
Because iterating over the file generates a StopIteration when EOF is
reached, the for loop automatically breaks. If you wanted to handle that
by hand, something like this should work (but is unnecessary, because
Python already does it for you):
    f = open(filename)
    try:
        while 1:
            # next(f) works in Python 2.6+ and 3; f.next() is Python 2 only.
            process(next(f))
    except StopIteration:
        f.close()
[...]
The core problem isn't whether exceptions are good or bad in a
particular case, but that most APIs make this an either-or choice. For
example, if I had an API that allowed me to choose at run time whether
an exception will be thrown for a particular condition, I could adapt
that API to my needs, instead of being stuck with whatever the designer
came up with.
There are many ways this could be done. For example, I could have a
find() operation on a collection that throws if a value isn't found, and
I could have findNoThrow() if I want a sentinel value returned. Or, the
API could offer a callback hook that decides at run time whether to
throw or not. (There are many other possible ways to do this, such as
setting the behaviour at construction time, or by having different
collection types with different behaviours.)
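As a rough sketch of the first of those options, here is how such a pair might look in Python. The class name Collection and the method find_no_throw are hypothetical, chosen to mirror the find()/findNoThrow() names above.

```python
class Collection(object):
    """Hypothetical container offering both lookup styles."""

    def __init__(self, items):
        self._items = list(items)

    def find(self, value):
        """Return the position of value, or raise ValueError if absent."""
        return self._items.index(value)

    def find_no_throw(self, value, default=None):
        """Return the position of value, or default if it is absent."""
        try:
            return self.find(value)
        except ValueError:
            return default
```

Here the sentinel-returning method is layered on top of the raising one, so there is only one search implementation to maintain. The caller picks the sentinel through the default parameter instead of relying on a fixed magic value.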
The point is that a more flexible API is likely to be more useful than
one that sets a single exception policy for everyone.
This has costs of its own. There is the cost of developer education:
learning about, memorising, and deciding between multiple APIs does not
come for free. There is the cost of developing and maintaining the
multiple functions, the risk of duplicated code in the implementation,
and the cost of writing documentation. A bloated API is not free of costs.
In the context of my example, they are not. The range of behaviours
naturally falls into these categories:
* No data ready
* Data ready
* EOF
* Socket error
Right -- that fourth example is one of the NATURAL categories that any
half-way decent developer needs to be aware of. When you say something
isn't natural, and then immediately contradict yourself, that's a sign
you need to think about what you really mean.
The first three cases are the "normal" ones; they operate on the same
program state and they are completely expected: while reading a message
off the wire, the program will almost certainly encounter the first two
conditions and, if there is no error, it will always encounter the EOF
condition.
I would call these the ordinary cases, as opposed to the exceptional
cases.
The fourth case is the unexpected one, in the sense that this
case will often not arise at all.
But it is still expected -- you have to expect that you might get a
socket error, and code accordingly.
That's not to say that lost connections aren't routine; they are.
Right -- we actually agree on this, we just disagree on the terminology.
I believe that talking about "normal" and "errors" is misleading. Better
is to talk about "ordinary" and "exceptional".
But, when a connection is lost,
the program has to do different things and operate on different state
than when the connection stays up. This strongly suggests that the first
three conditions should be dealt with by return values and/or out
parameters, and the fourth condition should be dealt with as an
exception.
Agreed.
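A minimal sketch of the split we have just agreed on: the three ordinary conditions come back as a status value, and the lost connection is raised. Everything here (FakeChannel, poll, read_message, ConnectionLost) is a hypothetical stand-in for illustration, not a real socket API.

```python
class ConnectionLost(Exception):
    """Hypothetical: the exceptional fourth case (socket error)."""


# The three ordinary conditions, reported through the return value.
NO_DATA, DATA, EOF = "no-data", "data", "eof"


class FakeChannel(object):
    """Hypothetical stand-in for a non-blocking socket, driven by a script.

    Script entries: bytes means data is ready, None means no data is
    ready yet, and an exception instance simulates a socket error.
    """

    def __init__(self, script):
        self._script = list(script)

    def poll(self):
        """Return (status, payload) for ordinary cases; raise on error."""
        if not self._script:
            return EOF, b""
        event = self._script.pop(0)
        if event is None:
            return NO_DATA, b""
        if isinstance(event, Exception):
            raise event
        return DATA, event


def read_message(channel):
    """Collect payloads until EOF; ConnectionLost propagates to the caller."""
    parts = []
    while True:
        status, payload = channel.poll()
        if status == EOF:
            return b"".join(parts)
        if status == DATA:
            parts.append(payload)
        # NO_DATA: nothing to do in this sketch; a real program would wait.
```

The loop in read_message only ever deals with state that exists while the connection is up. The code that handles a lost connection lives wherever the caller chooses to catch ConnectionLost.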