Musings, alternatives to multiple return, named breaks?

Les Cargill

Ian said:
Eh? I've never known it not to work.


That is why the frequency is legendary :)

It may be the case that everybody tests for that well now
and it's another one of those things that doesn't happen
any more.

I am thinking that this was more a problem with... bronze-age
toolsets which involved a bonded-out CPU as a component of
the sort of ICE you would plug in where the processor used to be.

Then again, I saw this once with a modern JTAG "emulator". It
was the optimizer, of course; I notched back the -O level
for debugging.
 
glen herrmannsfeldt

Les Cargill said:
James Kuyper wrote:
(snip)
The point of early return/break is to make explicit invariants that if
violated, preclude the main thing the function is for.

Or not.

If you search a linked list or tree, the early return might be when
you found the node you were looking for, the end of the function
return when you didn't.
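A minimal sketch of that search case, assuming a hypothetical singly linked `struct node` with an integer key (neither the type nor `find_node` comes from the thread). The early return is the success path and the return at the end of the function is the "not found" path:

```c
#include <stddef.h>

/* Hypothetical node type, for illustration only. */
struct node {
    int key;
    struct node *next;
};

/* Return the first node whose key matches, or NULL if none does. */
struct node *find_node(struct node *head, int key)
{
    for (struct node *p = head; p != NULL; p = p->next) {
        if (p->key == key)
            return p;       /* early return: found it */
    }
    return NULL;            /* end-of-function return: not found */
}
```

Here the early return is not an error check at all, which is the point being made against treating early returns purely as invariant guards.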

Some years ago (in the days of the Motorola 6809) there was BASIC09
for the OS9 operating system, which has something like BREAK, but
you can specify some statements to be executed on the way out.

-- glen
 
Les Cargill

glen said:
Or not.

If you search a linked list or tree, the early return might be when
you found the node you were looking for, the end of the function
return when you didn't.


Of course.

For me, the glass is half empty...

The idea is to disentangle all the testing for ugliness up to where
you can scroll past it in your editor and see the meat of the thing in
all its glory.

So long as the formatting of the code supports the basic narrative
of that which it is trying to do, then it is all good.
 
Keith Thompson

Eric Sosman said:
I wasn't misled, and I understood where your control transfer
would go (and it's the same place Java's goes). My dislike of the
construct is that `break label;' takes you to a point that may be
distant from where 'label: while(c)' is, to a point that is *not*
labelled or noted or marked or distinguished in any visible way.

Much like a return statement, yes?

The behavior of a return statement is defined relative to a named entity
(a function definition) in whose scope it appears. The same would be
true of a named break.

Just because it's implemented the same way as a goto doesn't mean you
should think of it as a goto.

[...]
 
Kaz Kylheku

int f()
{ int fail = 0;
if( fail = do_something_1() );
else if( fail = do_something_2() );
else { ... }
return fail; }

, and this will also exit immediately when »do_something_1()«
is nonzero.

Ah, you have Greenspunned a blub version of the Lisp operator or:

(or (do-something-1) ;; if this is true (not nil) return it
(do-something-2) ;; otherwise if this true, return it
(do-else)) ;; otherwise return this

If Ritchie had been paying closer attention to Lisp, he'd have made the
obviously Lisp-inspired || operator return the value of the left operand
when it is true, otherwise the value of the right operand, making it possible
to do:

int f(void)
{
return do_something_1() || do_something_2() || do_something_else();
}

Since C is statically typed, the return type for B || C could be worked
out in exactly the same manner as in A ? B : C.
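To make the complaint concrete, here is a small sketch in standard C. The functions `first_code` and `second_code` are hypothetical stand-ins for the do_something_N() calls. In C as it stands, `||` short-circuits but yields only 0 or 1, so the operand's actual value is lost; keeping the first nonzero value takes a temporary and the conditional operator:

```c
/* Hypothetical stand-ins for the do_something_N() calls. */
static int first_code(void)  { return 0; }
static int second_code(void) { return 7; }

/* Standard C: || short-circuits, but the result is always 0 or 1. */
int any_failed(void)
{
    return first_code() || second_code();
}

/* Keeps the first nonzero value, at the cost of a temporary. */
int first_failure(void)
{
    int rc = first_code();
    return rc ? rc : second_code();
}
```

With these stand-ins, `any_failed()` yields 1 while `first_failure()` yields 7, which is the value a Lisp-style `or` would return.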

Speaking of which, in the TXR project, I have some or macros built on the
ternary operator to do this (since || is useless):

val f(void)
{
uses_or2; /* when we use or2, or3 or or4, we must declare this */

return or3(do_something_1(),
do_something_2(),
do_something_else());
}

The constructs need a temporary variable, hence the "uses_or2;"
declaration which abstracts that away a little bit.

#define uses_or2 val or2_temp

#define or2(a, b) ((or2_temp = (a)) ? or2_temp : (b))

#define or3(a, b, c) or2(a, or2(b, c))

#define or4(a, b, c, d) or2(a, or3(b, c, d))
 
BartC

Keith Thompson said:
I don't mind using a goto to jump to error-handling code at the end of a
function, but I really don't think wrapping "goto end;" in a "break_end"
macro serves any good purpose. Just drop the macro definition and write
"goto end;".

You have to imagine there was a new statement such as 'break_end' or
'finish' which is simply that: transfer control to the end of a function (or
at least just before any explicit return).

'goto end' might also work, but you'd have less confidence that that 'end'
is where it should be and not in any of a hundred other places. It also
doesn't look good. The 'break_end', or whatever it might be called, at least
makes an attempt to show that this is a more structured use of 'goto'.
 
BartC

I haven't done much with Java, but I've used languages that have a
similar feature: loops can be labeled, and a break or continue (or
equivalent) can refer to the label.

The idea is that the label isn't just a location to which you can branch
(you might as well use a goto for that); it's the *name of the loop*.

For example, using a C-like syntax:

PROCESS_ROWS:
for (row = 0; row < MAX_ROW; row ++) {
PROCESS_COLUMNS:
for (col = 0; col < MAX_COL; col ++) {
if (done_processing_columns) {
break PROCESS_COLUMNS;
}
if (done_with_all_rows) {
break PROCESS_ROWS;
}
}
}

With the syntax I've presented, a loop name has the same syntax
as a goto label, which could cause confusion (though Perl does the
same thing and I haven't known it to be a problem, perhaps because
gotos are rare in Perl). Ada uses distinct syntax for loop names
(and block names) vs. goto labels.

On the other hand, in Perl, the "break" and "continue" statements
are spelled "last" and "next", and the label name tends to refer
to what's processed by one iteration of the loop. In the example
above, the outer loop would probably be called "ROW", and the "break
PROCESS_ROWS;" would be "last ROW;"; skipping to the next row is
written "next ROW;". Not that changing C's keywords is an option,
of course.

(In the syntax of my languages, I use 'exit' for break, and 'next' for
continue, if the latter does what I think it does (skip to the next
iteration).

I also have 'redo' (repeat this iteration) and 'restart', mainly for use in
'for' loops.

I don't use labels for loops; if I need to refer to an outer loop, I use an
index. But this is rare anyway; in a 100kloc body of code, I think I once
counted six such uses. And I would guess most of those referred to the
outermost loop. So probably it is only necessary to worry about the
innermost loop, and the outer one, in which case labelling is not so
important.

Which brings up another point: a labelled loop is an absolute reference,
while an index would be a relative one. In both cases, inserting an extra,
intermediate loop can require the label or index to be updated. But perhaps
less so with labelled loops.)
If this were to be added to C, you'd have to define exactly how a
name is associated with a loop.

If a label is desired, then a for-loop already has a ready-made one in
the form of the loop variable (e.g. NEXT I in Basic). (I know that C
for-loops are probably too chaotic for this as they may have 0, 1 or N loop
variables. And there are do and while loops as well.)
I suppose it could just be defined
so that if a labeled statement is a loop, then a break or continue
can use the label name. And you could do the same thing for labeled
switch statements.

I think, first fix the problem where a break inside a switch statement
shadows the break statement you might need to exit an outer loop.
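A small sketch of the shadowing problem in current C, using a hypothetical command-string scanner (`count_until_quit` is an illustrative name only). Inside the switch, `break` leaves only the switch, so a flag (or a goto) is needed to end the enclosing loop:

```c
#include <stddef.h>

/* Hypothetical example: count '+' commands until a 'q' is seen. */
int count_until_quit(const char *cmds)
{
    int count = 0;
    int quit = 0;               /* flag tested by the loop condition */

    for (size_t i = 0; cmds[i] != '\0' && !quit; i++) {
        switch (cmds[i]) {
        case '+':
            count++;
            break;              /* leaves the switch only */
        case 'q':
            quit = 1;           /* a bare break here would not end the loop */
            break;
        default:
            break;
        }
    }
    return count;
}
```

A named break, as discussed in this thread, would let the 'q' case leave the loop directly instead of going through the flag.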
 
David Brown

The above could also be written as

int root( int const x ){ return x < 0 ? 0 : root_nonnegative( x ); }

Yes, that is of course true. Or it could be written in an in-between style:

int root(int x) {
if (x < 0) {
return 0;
} else {
return root_nonnegative(x);
}
}

As long as root_nonnegative is static, the resulting object code will be
pretty much identical (sometimes these things are relevant).

There is no "best" answer, it's a question of style and preference, and
whatever works best at the time. My personal preference is that
early-exit returns are often preferable to factoring out a new function,
especially if you happen to need lots of parameters. And two returns at
equal branches is preferable to using the conditional operator. (That's
personal bias - I just don't like the conditional operator.)
 
David Brown

I've got to admit that I'm a big fan of multiple returns when
it comes to error handling. A lot of stuff I write starts with
checking the function arguments, and if one of them isn't kosher
I return immediately with an error value. And even later, when
something doesn't add up and there's nothing within the function
that can be done I have no qualms returning at that point with
a return value to indicate that something went wrong.

Typical example is a function that's supposed to send some command
to a device. If the input to the function is goofy I bail out
immediately. If I find that the state of the device isn't compatible
with what the function is supposed to do with the device
I bail out, immediately. If there's a problem that can't be fixed
with communicating with the device I bail out, immediately. If I
would follow the mantra of "only one return" such a function would
be a complete mess - too many layers of "if" and "else".

I like that style too.
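A minimal sketch of the bail-out style described in the quoted example. The `struct device` model, the error codes and `send_command` are all hypothetical, invented here purely for illustration:

```c
#include <stddef.h>

/* Hypothetical error codes and device model, for illustration only. */
enum { DEV_OK = 0, DEV_EINVAL = -1, DEV_ESTATE = -2, DEV_EIO = -3 };

struct device {
    int ready;      /* nonzero when the device can accept commands */
    int link_up;    /* nonzero when communication works */
    int last_cmd;   /* last command accepted */
};

int send_command(struct device *dev, int cmd)
{
    if (dev == NULL || cmd < 0)
        return DEV_EINVAL;      /* goofy input: bail out */
    if (!dev->ready)
        return DEV_ESTATE;      /* device state is incompatible: bail out */
    if (!dev->link_up)
        return DEV_EIO;         /* unfixable communication problem: bail out */

    dev->last_cmd = cmd;        /* the main work, at one indentation level */
    return DEV_OK;
}
```

Written with a single return, the same checks would nest three levels of if/else around the one line that does the real work.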
Other languages have exceptions. I'd love to have them in C, to
be honest. And throwing an exception is, basically, a return on
steroids. Why have I never seen a similar criticism of exceptions
but lots of complaints about multiple returns as if they would be
something the devil invented to make life even more miserable?

I can criticise exceptions if you like :) They are like multiple
returns, except you often can't see when they happen, they pass through
layers of functions without your knowledge, they hinder optimisation
(because the compiler doesn't know when they can occur), and they make
the generated code much more difficult to follow.
My impression is that this "no multiple returns" was originally
about practices where functions did lots of different things that
shouldn't have been done in a single function and that this has
morphed into a mantra that isn't allowed to be questioned anymore.
Things like what the OP (in principle) proposed, like

int fail = OK;
while ( 1 ) {
if ( fail = do_something_1( ) )
break;
if ( fail = do_something_2( ) )
break;
...
}
return fail;

etc. isn't any different from multiple returns, just harder to
read. It follows the dogma of "no multiple returns" to the letter
but not in spirit.

This is especially true if you have nested loops or switches - "return"
makes it clear that you are breaking out of everything, not just one of
the loops.
 
Ian Zimmerman

It's a bit of "loincloth" syntax.

In almost all nontrivial examples, there won't be a single "end"
location to jump to, because different error paths need to do different
cleanups. So one would have to define multiple loincloths anyway :p

 
Keith Thompson

BartC said:
You have to imagine there was a new statement such as 'break_end' or
'finish' which is simply that: transfer control to the end of a function (or
at least just before any explicit return).

The code after the label could include an arbitrary amount of cleanup
code, probably to deallocate resources -- and there could be multiple
target labels doing different kinds of cleanup. I don't think existing
uses of goto for cleanup could be replaced by your new statement. They
could be replaced by exception handling, which I wouldn't mind seeing,
but I don't think that's going to be added to C.
'goto end' might also work, but you'd have less confidence that that 'end'
is where it should be and not in any of a hundred other places. It also
doesn't look good. The 'break_end', or whatever it might be called, at least
makes an attempt to show that this is a more structured use of 'goto'.

The label name "end" is a *very* strong hint that it's at or near the
end of the function. It's always possible to write bad code, say by
having "goto end;" actually branch near the beginning of the function,
but that's easily solved by firing the author and removing any record
that he ever existed. (Or perhaps something less drastic would
suffice.)

Using goto for error handling is a common existing idiom; I see it all
the time in the code I work on. Hiding it behind a macro would not be
helpful.
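For readers who have not met the idiom, a minimal sketch of goto-based cleanup with more than one target label. The `struct pair` type and `pair_create` are hypothetical, chosen only so that each failure point has a different amount of cleanup to do:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical type: a pair of owned string copies. */
struct pair {
    char *first;
    char *second;
};

/* Allocate a pair holding copies of a and b; NULL on any failure. */
struct pair *pair_create(const char *a, const char *b)
{
    struct pair *p;

    if (a == NULL || b == NULL)
        return NULL;            /* nothing allocated yet */

    p = malloc(sizeof *p);
    if (p == NULL)
        goto fail;
    p->first = malloc(strlen(a) + 1);
    if (p->first == NULL)
        goto free_pair;         /* only the struct exists */
    p->second = malloc(strlen(b) + 1);
    if (p->second == NULL)
        goto free_first;        /* struct and first string exist */

    strcpy(p->first, a);
    strcpy(p->second, b);
    return p;                   /* success: the caller owns everything */

free_first:
    free(p->first);
free_pair:
    free(p);
fail:
    return NULL;
}
```

The labels fall through in reverse order of acquisition, so each goto releases exactly what had been acquired at that point. A single "jump to the end" statement would not distinguish these cases.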
 
BartC

Keith Thompson said:
The code after the label could include an arbitrary amount of cleanup
code, probably to deallocate resources -- and there could be multiple
target labels doing different kinds of cleanup.

I don't think it's going to be a tidy-looking function anyway in that case.
And there could be four kinds of cleanup code, and dozens of returns; which
ever way you do it, it's going to be untidy (multiple returns, gotos to some
of those returns to avoid duplicating code, breaks to named blocks...). And
suppose in those four kinds of cleanup code, half the code is common between
them?
I don't think existing
uses of goto for cleanup could be replaced by your new statement.

It might encourage a more structured approach, such as having the four
cleanup kinds (in my example) all in roughly the same place; dealing with
them with a switch or if statement, perhaps making it easier to share common
elements.
They
could be replaced by exception handling, which I wouldn't mind seeing,
but I don't think that's going to be added to C.

That's part of it, when the returns are triggered by errors. But they could
also just be different kinds of return values, nothing to do with errors.
Using goto for error handling is a common existing idiom; I see it all
the time in the code I work on. Hiding it behind a macro would not be
helpful.

It looks poor. The OP is right to discuss alternatives (even if there is
little probability of the language actually changing in his lifetime).

I use goto reluctantly, mainly for sharing common lines of code in things
such as switch statements. I can implement some of this stuff (not directly
in C, but via source->source translators) given good ideas, but I've
struggled to find something better.

The multiple return problem is a little simpler though, since there will be
one goto target in a function, instead of N.

(And since I put it forward, I've now implemented the 'break_end' idea as an
actual statement in another language (it took 15 minutes for a first
version). I will let you know how useful it turns out to be. I suspect it
will need tweaking. Maybe, as you say, there will be a frequent need for
more than one common return point. Or maybe there will be little advantage,
after all, over just using 'goto'.)
 
Les Cargill

Richard said:
Anyone resorting to Kernighan in 2014 needs to re-evaluate.

Oh how I wish that were true. I've expected someone to come along and
save us from ourselves for decades now. Hasn't happened.
It's up
there with that nonsense about debugging being 10x harder than getting
the code right in the first place and therefore you shouldn't use a
debugger.

That is not what this means.

It depends on what you use a debugger *for*. As a diagnostic, great.
But there exists a class of defects that run fine in a debugger but
break in the field.
Really? In this day of million line libraries which can and do
have bugs and all sorts of side effects?

*Shudder*. Yep.
When Heisenbugs can and do kill
development cycles?

Now that is true - but I think it's why a certain humility is required.


I see Kernighan's ... koan as an exhortation to humility.
If you can't easily understand this then I would be surprised.

int func(char *p)
{
    if(!p)
        return ERROR;

    /* rest of code */

    return OTHER_RESULT;
}

Sure - that's an "order one" example. I have no problem with
*multiple* early returns.

It is just that error checking is the hard part, and
you need to have it be clear from reading the code - whatever
that means.


Obviously it needs some common sense and NO ONE is suggesting huge
monolithic functions.

Of course.
 
James Kuyper

Surely it's not "you shouldn't use a debugger".

No, Richard claims to believe that this is precisely what other people
have said. They didn't, but that hasn't stopped him from claiming it.
People have said that sometimes it's feasible (and even occasionally,
easy) to figure out what's wrong with a program without using a
debugger. A few people have even indicated that they, personally, have
seldom, if ever, used a debugger. Those both seem to me like they should
be uncontroversial claims, but some people have actually argued against
them, even without having misinterpreted them the way that Richard does
above.
 
mathog

Richard said:


In long complex functions there may be return statements buried here and
there in the middle, off the screen when either the top or bottom of the
function is examined. These I really don't like because it is just too
easy to not find all the buried return statements, and that leads to
wasted effort when debugging. The person who wrote the code often put
those buried returns in because it was seen as just too much work to
clean things up to get nicely to the bottom of the function.

Another reason concerns tracing code using print statements. (Sometimes
it comes down to that, especially with parallel code, or event driven
GUI code). I use this technique a lot when I need to do two runs under
different conditions and compare what the program did. (The log of such
a run might be 10K lines long.) Two key pieces of information are
"entering FunctionA" and "leaving FunctionA". Marking up the code to
allow this is trivial when there is only one exit point, but it can be a
pain when there are multiple returns.

In most situations there is nothing very wrong with the structure of a
function like this:

int function(void){
if(test1)return(1);
if(test2)return(2);
/* etc */
return(0);
}

For instance, it is easy to see how it works. However in this specific
debugging scenario, it is the worst possible case, since every return
needs to be individually instrumented with its own print statement.
Variants like this:

int function(void){
int status=0;
if(test1){ status = 1; goto end;}
if(test2) {status = 2; goto end;}
/* etc */
end:
return(status);
}

do exactly the same thing, but are no more difficult to instrument than
any other function.
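A sketch of what that instrumentation might look like once the function has a single exit. The name `classify` and its parameters are hypothetical, with `a < 0` and `b < 0` standing in for test1 and test2:

```c
#include <stdio.h>

/* Hypothetical function; a < 0 and b < 0 stand in for test1 and test2. */
int classify(int a, int b)
{
    int status = 0;

    fprintf(stderr, "entering classify\n");     /* one entry trace */
    if (a < 0) { status = 1; goto end; }
    if (b < 0) { status = 2; goto end; }
    /* etc */
end:
    /* one exit trace covers every path out of the function */
    fprintf(stderr, "leaving classify, status=%d\n", status);
    return status;
}
```

With the multiple-return version, the "leaving" line would have to be repeated before each return statement.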

Regards,

David Mathog
 
BartC

Two key pieces of information are "entering FunctionA" and "leaving
FunctionA". Marking up the code to allow this is trivial when there is
only one exit point, but it can be a pain when there are multiple returns.

In most situations there is nothing very wrong with the structure of a
function like this:

int function(void){
if(test1)return(1);
if(test2)return(2);
/* etc */
return(0);
}

For instance, it is easy to see how it works. However in this specific
debugging scenario, it is the worst possible case, since every return
needs to be individually instrumented with its own print statement.
Variants like this:

int function(void){
int status=0;
if(test1){ status = 1; goto end;}
if(test2) {status = 2; goto end;}
/* etc */
end:
return(status);
}

do exactly the same thing, but are no more difficult to instrument than
any other function.

Sometimes I might temporarily wrap such a function like this:

(1) Rename function() as function2 (or any such temporary name)

(2) Create the wrapper function, which must have the same name as the
original:

int function(void){
int temp;
puts("Entering function");
temp=function2();
puts("Leaving function");
return temp;
}

However, you were responding to someone who believes strongly in using only
debuggers for this sort of thing.
 
Ian Collins

mathog said:
In long complex functions there may be return statements buried here and
there in the middle, off the screen when either the top or bottom of the
function is examined. These I really don't like because it is just too
easy to not find all the buried return statements, and that leads to
wasted effort when debugging. The person who wrote the code often put
those buried returns in because it was seen as just too much work to
clean things up to get nicely to the bottom of the function.

The problem here is not early returns, it is long complex functions.
While still an issue in legacy code, there is no need for these in
modern code.
Another reason concerns tracing code using print statements. (Sometimes
it comes down to that, especially with parallel code, or event driven
GUI code). I use this technique a lot when I need to do two runs under
different conditions and compare what the program did. (The log of such
a run might be 10K lines long.) Two key pieces of information are
"entering FunctionA" and "leaving FunctionA". Marking up the code to
allow this is trivial when there is only one exit point, but it can be a
pain when there are multiple returns.

Migrate to a system with dtrace and you'll never have to suffer function
tracing print statements!
 
glen herrmannsfeldt

(snip on multiple returns)
In long complex functions there may be return statements buried here and
there in the middle, off the screen when either the top or bottom of the
function is examined. These I really don't like because it is just too
easy to not find all the buried return statements, and that leads to
wasted effort when debugging. The person who wrote the code often put
those buried returns in because it was seen as just too much work to
clean things up to get nicely to the bottom of the function.

OK, one case is when there are some tests at the beginning.
In that case, you can put in some if statements, where one branch
or the other covers the whole rest (maybe 99%) of the function.
Not so bad if it is only one, but gets ugly if it is nested two or
three levels. If commented right, ones at the beginning shouldn't
be hard to see.
Another reason concerns tracing code using print statements.
(Sometimes it comes down to that, especially with parallel code,
or event driven GUI code). I use this technique a lot when I need
to do two runs under different conditions and compare what the
program did. (The log of such a run might be 10K lines long.)

In the case of nested loops, well, often enough you don't want
to return, but do still need to get out of nested loops.

My usual one, at least in the case of a for() loop, is to break
and then just after the loop test the (hopefully simple) loop
condition.

for(i=0;i<10;i++) {
(various statements) break;
}
if(i<10) break;

(and inside at least one other loop)

First, unlike just about everyone else, I indent the loop closing
brace along with the loop statement. Note, then, as you look down
the page the next non-indented statement is the if. I might do this
extra if() test for two or three loops deep, and, again much of the
time return doesn't help much.
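A runnable sketch of the break-then-retest pattern described above, assuming a hypothetical search over a small fixed-size grid (ROWS, COLS and `find_cell` are illustrative names, not from the post). It uses conventional brace placement rather than the indented closing brace mentioned above:

```c
#define ROWS 3
#define COLS 4

/* Hypothetical search: return 1 and set *r, *c if target is in grid. */
int find_cell(int grid[ROWS][COLS], int target, int *r, int *c)
{
    int i, j;

    for (i = 0; i < ROWS; i++) {
        for (j = 0; j < COLS; j++) {
            if (grid[i][j] == target)
                break;          /* leaves the inner loop only */
        }
        if (j < COLS)           /* inner loop ended early: leave the outer too */
            break;
    }
    if (i == ROWS)
        return 0;               /* both loops ran to completion */
    *r = i;
    *c = j;
    return 1;
}
```

The retest works only because the loop variable is still in scope after the loop and the loop condition is cheap to repeat, which matches the "hopefully simple" caveat in the post.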
Two key pieces of information are "entering FunctionA" and
"leaving FunctionA". Marking up the code to allow this is
trivial when there is only one exit point, but it can be a
pain when there are multiple returns.

Reasonably often, though, you want to print out something different
in the two cases.
In most situations there is nothing very wrong with the structure of a
function like this:
int function(void){
if(test1)return(1);
if(test2)return(2);
/* etc */
return(0);
}
For instance, it is easy to see how it works. However in this specific
debugging scenario, it is the worst possible case, since every return
needs to be individually instrumented with its own print statement.
Variants like this:
int function(void){
int status=0;
if(test1){ status = 1; goto end;}
if(test2) {status = 2; goto end;}
/* etc */
end:
return(status);
}

Usually I would do:

status=1;
if(test1) goto end;
status=2;
if(test2) goto end;

You can also do:

if(test1) {
status=1;
}
else if(test2) {
status=2;
}
else {
(most of the rest of the function)
status=0;
}
return;

or even, with a little luck:

if(test1) {
status=1;
}
else if(test2) {
status=2;
}
else for(i=0;i<10;i++) {
(most of the rest of the function)
status=0;
}
return;

Such that there is no additional nesting level for the rest
of the function, as it is already inside a loop.

(I assume that there is at least one more statement in the
status=1 and status=2 cases.)
do exactly the same thing, but are no more difficult to
instrument than any other function.

I have done timing tests, where I need some statements at the
beginning and end, which have the same problem.

-- glen
 
glen herrmannsfeldt

Ian Collins said:
mathog wrote:
(snip)
(snip)

The problem here is not early returns, it is long complex functions.
While still an issue in legacy code, there is no need for these in
modern code.

Somehow this implies that modern code is getting simpler.

There are still plenty of complicated algorithms that lead to large
complex functions. Yes, one should work toward more and smaller
functions where possible, but I know that there are still some in
computational physics, as one example, that are necessarily large.

In many cases, though, the only reason for a return in the middle
is because something went very wrong. At that point, it might not
matter what else happens.

-- glen
 
