Loops, switches, embedded returns

E

Ed Davis

The following is a code fragment from a simple parser.

The return is buried inside the switch statement (the actual
switch has several more cases). I can't come up with a good
alternative I like better. Any ideas? I've tried the 'while
(!done)' approach, but keeping track of the flag doesn't seem any
better than the embedded 'return'.

Symbol get_sym(void);
Symbol sym;

void stmt_seq(void) {
for (;;) {
switch (sym) {
case ident_sym : ident_stmt(); break;
case while_sym : while_stmt(); break;
case if_sym: if_stmt(); break;

case endwhile_sym:
case else_sym:
case endif_sym:
case eof_sym:
return;

default:
error("expecting stmt");
get_sym();
}
}
}
 
P

Peter Pichler

Ed Davis said:
The following is a code fragment from a simple parser.

The return is buried inside the switch statement (the actual
switch has several more cases). I can't come up with a good
alternative I like better. Any ideas? I've tried the 'while
(!done)' approach, but keeping track of the flag doesn't seem any
better than the embedded 'return'.

Symbol get_sym(void);
Symbol sym;

void stmt_seq(void) {
for (;;) {
switch (sym) {
case ident_sym : ident_stmt(); break;
case while_sym : while_stmt(); break;
case if_sym: if_stmt(); break;

case endwhile_sym:
case else_sym:
case endif_sym:
case eof_sym:
return;

default:
error("expecting stmt");
get_sym();
}
}
}

As usual, I looked at the code first and tried to figure out what's wrong
with it, and only when I still could not find anything, I read the question.
My answer would be that this is perfectly readable with obvious semantics.
Cluttering it with unnecessary flags would only obfuscate it. But, as they
say, beauty is in the eyes of beer holder, so I am sure you will get
hundreds of opposing replies. As always with style issues.

Peter
 
C

CBFalconer

Ed said:
The following is a code fragment from a simple parser.

The return is buried inside the switch statement (the actual
switch has several more cases). I can't come up with a good
alternative I like better. Any ideas? I've tried the 'while
(!done)' approach, but keeping track of the flag doesn't seem any
better than the embedded 'return'.

Symbol get_sym(void);
Symbol sym;

void stmt_seq(void) {
for (;;) {
switch (sym) {
case ident_sym : ident_stmt(); break;
case while_sym : while_stmt(); break;
case if_sym: if_stmt(); break;

case endwhile_sym:
case else_sym:
case endif_sym:
case eof_sym:
return;

default:
error("expecting stmt");
get_sym();
}
}
}

My version, insisting that each parser return its termination
symbol (one symbol lookahead):

Symbol get_sym(void);

Symbol stmt_seq(Symbol sym)
{
while (eof_sym != sym) {
switch (sym) {
case ident_sym :
sym = ident_stmt(sym); break;
case while_sym :
sym = while_stmt(sym); break;
case if_sym:
sym = if_stmt(sym); break;

case endwhile_sym:
case else_sym:
case endif_sym:
case eof_sym:
break;

default:
error("expecting stmt");
sym = get_sym();
} /* switch */
if (sym == eof_sym) break;
} /* while */
return sym;
} /* stmt_seq */

For your general query about error handling, google for the source
to PascalS, which is very clear. Pascal has the set primitive,
which you can simulate more or less in C. Now each construct is
qualified by the set of possible initial symbols I, and the set of
possible termination symbols T. You augment the set T by those
symbols that must not be ignored in scanning past an error. Then
the basic idea is something like:

do {
while (sym IN I) {
parse(sym); sym = nextsym();
}
} while (!(sym in T));

Most compilers today use parser generators, such as YACC or BISON,
rather than the simpler recursive descent. Recursive descent
usually has better error handling.

An enum of the possible syms will be helpful. If the enum is
arranged to describe them by single bits the set implementation is
trivial.

enum syms {EOsrc, WHILE, FOR, IF = 4, ELSE = 8, DO = 16 ....};

and you can keep the sets in a single unsigned long if you have no
more than 31 of them. Some can be grouped together, such as MULOP
(*, /), ADDOP (+. -), RELOP (<, >, <=, >=, ==, !=). This is part
of language design.
 
R

Rob Thorpe

The following is a code fragment from a simple parser.

The return is buried inside the switch statement (the actual
switch has several more cases). I can't come up with a good
alternative I like better. Any ideas? I've tried the 'while
(!done)' approach, but keeping track of the flag doesn't seem any
better than the embedded 'return'.

Symbol get_sym(void);
Symbol sym;

void stmt_seq(void) {
for (;;) {
switch (sym) {
case ident_sym : ident_stmt(); break;
case while_sym : while_stmt(); break;
case if_sym: if_stmt(); break;

case endwhile_sym:
case else_sym:
case endif_sym:
case eof_sym:
return;

default:
error("expecting stmt");
get_sym();
}
}
}


It is normally good practice to write functions with one unique exit
point. But for a function as short as this one it doesn't really
matter so much. The programmer reading it can probably see both exit
points on their screen at the same time, so it's not difficult to
understand.
 
O

Old Wolf

It is normally good practice to write functions with one unique exit

Why?

Can you suggest a better structure for a function like this:

/* returns 0 for success, or some other code for failure */
int baz(void)
{
if (bar1()) return ERR1;
foo1();
foo2();
if (bar2()) return ERR2;
foo3();
foo4();
if (bar3()) return ERR3;
return 0;
}

where, of course, my foo function calls are meant to represent
any amount of code, and bar function calls, some test condition.

If you use if..else then you end up with one more level of
indent each time and the code quickly moves off the right-hand
side of the screen.
 
P

Peter Pichler

Old Wolf said:
Why?

Can you suggest a better structure for a function like this:

/* returns 0 for success, or some other code for failure */
int baz(void)
{
if (bar1()) return ERR1;
foo1();
foo2();
if (bar2()) return ERR2;
foo3();
foo4();
if (bar3()) return ERR3;
return 0;
}

where, of course, my foo function calls are meant to represent
any amount of code, and bar function calls, some test condition.

If you use if..else then you end up with one more level of
indent each time and the code quickly moves off the right-hand
side of the screen.

Plus, of course, you need to introduce an extra variable that obfuscates the
code even more. "No multiple returns" is just another coding dogma, IMHO.
 
R

Rob Thorpe


Because a return is not structured, it's a goto by another name.
Can you suggest a better structure for a function like this:

/* returns 0 for success, or some other code for failure */
int baz(void)
{
if (bar1()) return ERR1;
foo1();
foo2();
if (bar2()) return ERR2;
foo3();
foo4();
if (bar3()) return ERR3;
return 0;
}

where, of course, my foo function calls are meant to represent
any amount of code, and bar function calls, some test condition.

Actually no I can't. This is how I would do it in this case, though
it's not really very good.

But, have you ever seen code where the returns are embedded in
multiple if and else pair? ie.

char *foo()
{
if (fred == jim)
{
... stuff ...
if (sheila < 9)
{
... stuff ...
}
else
{
... stuff ...
return ("turquiose");
}
}
.. tons of stuff ..
return("blue");
}

It's horrible, especially if there are >4 returns and if the code at
the bottom has lots of side effects. The problem is, when you're
reading code it's not normal to have to understand the upshot of every
nested if..else in or to be able to understand the function.
If you use if..else then you end up with one more level of
indent each time and the code quickly moves off the right-hand
side of the screen.

The problem with "only use one exit point" is that it's far too
general, there are many case where multiple exit points are useful,
but they can become horrendous. It's like many rules of structure
people give :"avoid global variables" or "don't use gotos", it tends
to be overstated.
 
R

Richard Heathfield

Old said:

Because every arbitrary change in the control flow makes a program harder
for a maintainer to understand. And that maintainer could be you, in three
years from now.

Consider the position of a maintainer who is trying to understand a program
(for the excellent reason that it has a bug) and comes across a "premature"
return statement. He now has to assure himself that the premature return
has not contributed to the problem (e.g. by failing to perform some
necessary cleanup).
Can you suggest a better structure for a function like this:

/* returns 0 for success, or some other code for failure */
int baz(void)
{
if (bar1()) return ERR1;
foo1();
foo2();
if (bar2()) return ERR2;
foo3();
foo4();
if (bar3()) return ERR3;
return 0;
}

I would actually use the following code (which relies on bar1(), bar2(), and
bar3() to return the appropriate values (note that ERRn is not an
appropriate value, since it invades implementation namespace):

int baz(void)
{
int rc = bar1();
if(0 == rc)
{
foo1();
foo2();
rc = bar2();
}
if(0 == rc)
{
foo3();
foo4();
rc = bar3();
}
return rc;
}
where, of course, my foo function calls are meant to represent
any amount of code, and bar function calls, some test condition.

If you use if..else then you end up with one more level of
indent each time and the code quickly moves off the right-hand
side of the screen.

Oh, I'm sorry. I didn't realise that was supposed to happen.
 
C

Chris Torek

[on avoiding early "return"s for error cases]

I would actually use the following code (which relies on bar1(), bar2(), and
bar3() to return the appropriate values (note that ERRn is not an
appropriate value, since it invades implementation namespace):

int baz(void)
{
int rc = bar1();
if(0 == rc)
{
foo1();
foo2();
rc = bar2();
}
if(0 == rc)
{
foo3();
foo4();
rc = bar3();
}
return rc;
}

It is worth pointing out two things here:

a) this tests for "rc == 0" more than once in some cases where we
already know that rc != 0; and
b) those are "error" cases and presumably errors are rare. :)

In other words, the repeated tests -- which some may find objectionable
-- occur only for failure paths, where speed is rarely all that
important.

I still tend, perhaps due to writing too much assembly (recently)
and unoptimized interpreted languages on slow processors (in the
past), to "want to" use early return or goto to avoid the redundant
tests; but in many cases the above structure turns out to be useful
anyway. For instance, it often turns out that on multiprocessing
systems, one needs some kind of data-structure locking mechanism,
so that this code becomes:

int error;
struct S *p = ...

lock(p);
error = op1(p);
if (error == 0)
error = op2(p);
if (error == 0)
error = op3(p);
unlock(p);
return error;

Here the "unlock" step is required even in the error cases, so
using a goto or early-return often requires duplicating the unlock.
(In this particular case, we can get away with a label right before
the unlock-and-return.) Unless it is important that the error
cases run as fast as possible, the redundant tests for "no error"
are not likely to be a problem.
 
E

Ed Davis

Hello!

I remember enjoying your messages many (20+) years ago on the
fido networks. Glad to see you're still around, and thanks for
your reply.
My version, insisting that each parser return its termination
symbol (one symbol lookahead):

Interesting. I've been looking at the sources to Wirth's little
compilers - PL/0, Pascal-S, and Oberon-0, and also van de
Snepscheut's very nice Pascal-S implementation. All of these
examples just make sym global, with get_sym being the only
function that explicitly assigns something to sym.

The snippet that I posted was similar to Wirth's Oberon-0
compiler.

Do you make sym a local in each function, or do you keep it
global?
For your general query about error handling, google for the source
to PascalS, which is very clear.

It is interesting to see how Wirth's solution to error checking
has changed over the years. In his later compilers, he uses the
order of the Symbol values rather than a set - perhaps partly
because Oberon doesn't have sets.
Most compilers today use parser generators, such as YACC or BISON,
rather than the simpler recursive descent. Recursive descent
usually has better error handling.

I always get hung up on the error handling using YACC. I tend to
like Coco/R, which generates recursive descent parsers.

But with the mentioned tools, I usually get stuck somewhere, and
end up wishing I had done it manually.

Thanks for the additional information. Now to make sure I have
less than 32 symbols (thanks for the operator compression trick).
 
C

CBFalconer

Ed said:
.... snip ...

I always get hung up on the error handling using YACC. I tend to
like Coco/R, which generates recursive descent parsers.

But with the mentioned tools, I usually get stuck somewhere, and
end up wishing I had done it manually.

Thanks for the additional information. Now to make sure I have
less than 32 symbols (thanks for the operator compression trick).

Thanks for the kind words. This is actually OT here on c.l.c,
apart from the means of faking a set, and comp.programming or
comp.compilers would probably be a better choice.
 
R

Richard Heathfield

Chris said:
[on avoiding early "return"s for error cases]

int baz(void)
{
int rc = bar1();
if(0 == rc)
{
foo1();
foo2();
rc = bar2();
}
if(0 == rc)
{
foo3();
foo4();
rc = bar3();
}
return rc;
}

It is worth pointing out two things here:

a) this tests for "rc == 0" more than once in some cases where we
already know that rc != 0;

This is very true, and very obvious (i.e. just about everybody who reads my
code raises the point eventually!)...
and
b) those are "error" cases and presumably errors are rare. :)

....but this, whilst equally true, seems to be less obvious! At least, (a) is
probably the most common criticism that has been levelled at my coding
style. I guess it takes a special kind of person to understand (b). :)

In other words, the repeated tests -- which some may find objectionable
-- occur only for failure paths, where speed is rarely all that
important.

Precisely so.


<snip>
 
O

Old Wolf

It is normally good practice to write functions with one unique exit
Because every arbitrary change in the control flow makes a program harder
for a maintainer to understand. And that maintainer could be you, in three
years from now.

Consider the position of a maintainer who is trying to understand a program
(for the excellent reason that it has a bug) and comes across a "premature"
return statement. He now has to assure himself that the premature return
has not contributed to the problem (e.g. by failing to perform some
necessary cleanup).

I do not think your alternative code below, alleviates this problem.
It would be just as easy for some code change far down the function
(assuming the function is very large, or this would not be an issue),
to upset the case of bar1() failing and control progressing
uninterrupted down to the bottom.
I would actually use the following code (which relies on bar1(), bar2(), and
bar3() to return the appropriate values (note that ERRn is not an
appropriate value, since it invades implementation namespace):

int baz(void)
{
int rc = bar1();
if(0 == rc)
{
foo1();
foo2();
rc = bar2();
}
if(0 == rc)
{
foo3();
foo4();
rc = bar3();
}
return rc;
}

AFAIC this is not stylistically better than:
int rc = bar1();
if (rc) goto cleanup;
...
cleanup:
return rc;

assuming that the function must return rc, and requires cleanup,
and it was not practical to put the cleanup in a parent function.

In fact I even prefer:
#define RETURN(r) { /* do cleanup */; return r; }
if (bar1())
RETURN(ERR1)

or similar (even a crappy compiler will correctly
optimise this so that the cleanup code only occurs in one place).
Oh, I'm sorry. I didn't realise that was supposed to happen.

Depends on your editor, I suppose (and the editors of anyone else
who will ever maintain your code). I like to keep line length
under 80 chars.
 
R

Richard Heathfield

Old said:
I do not think your alternative code below, alleviates this problem.

Glad to hear it. I happen to know that it /does/ alleviate the problem, but
I'm always willing to hear opposing views.

It would be just as easy for some code change far down the function
(assuming the function is very large, or this would not be an issue),

Good structure can make larger functions easier to understand than would be
the case without good structure, but it's not a panacea, and there comes a
point where a function is simply /too/ large.
to upset the case of bar1() failing and control progressing
uninterrupted down to the bottom.

Resource cleanup goes like this:

int foo(void)
{
int rc = 0;
T0 *p0 = AcquireResource(0);
if(p0 != NULL)
{
T1 *p1 = AcquireResource(1);
if(p1 != NULL)
{
/* we can keep doing this if
* we like, recursively ----
* but there comes a point
* where it makes sense to
* call a function to do
* the work instead.
*/
rc = Use(p0, p1);
ReleaseResource(p1);
}
else
{
rc = SOME_ERROR_CODE;
}

ReleaseResource(p0);
}
else
{
rc = SOME_OTHER_ERROR_CODE;
}
return rc;
}

AFAIC this is not stylistically better than:
int rc = bar1();
if (rc) goto cleanup;
...
cleanup:
return rc;

See above.
assuming that the function must return rc, and requires cleanup,
and it was not practical to put the cleanup in a parent function.

I clean up as I go.
In fact I even prefer:
#define RETURN(r) { /* do cleanup */; return r; }
if (bar1())
RETURN(ERR1)

or similar (even a crappy compiler will correctly
optimise this so that the cleanup code only occurs in one place).

Why not just clean up as /you/ go?
Depends on your editor, I suppose (and the editors of anyone else
who will ever maintain your code).

No, it doesn't. The style I showed you /did not/ quickly move off the
right-hand side of the screen for the excellent reason that the nesting was
kept very shallow.
I like to keep line length
under 80 chars.

I prefer 72. Mainframe heritage, I guess. And yet my preferred style does
not cause me indentation problems. Odd, that.
 
D

Dave Thompson

[on avoiding early "return"s for error cases]

I would actually use the following [structure]
int baz(void)
{
int rc = bar1();
if(0 == rc)
{
foo1();
foo2();
rc = bar2();
}
if(0 == rc)
{
[and so on]

It is worth pointing out two things here:

a) this tests for "rc == 0" more than once in some cases where we
already know that rc != 0; and
b) those are "error" cases and presumably errors are rare. :)

In other words, the repeated tests -- which some may find objectionable
-- occur only for failure paths, where speed is rarely all that
important.
Plus the compiler is perfectly entitled to specialize the basic block
entry depending on the path to it, in which case this becomes the
converse of the classic old "chained short branch" technique.
I still tend, perhaps due to writing too much assembly (recently)
and unoptimized interpreted languages on slow processors (in the
past), to "want to" use early return or goto to avoid the redundant
tests; but in many cases the above structure turns out to be useful
anyway. For instance, it often turns out that on multiprocessing
systems, one needs some kind of data-structure locking mechanism,
Here the "unlock" step is required even in the error cases, <snip>

Unless, as I thought someone else pointed out but now can't find in
this thread, it can be punted to the caller -- or a wrapper/proxy.
<OT> In C++ you can make it a RAII class and return -- or even throw
-- to get it destructed. :-} </>

Similarly all kinds of resource deallocation; my most common case is
open a file, read and process with many error checks/cases, but always
close. IIRC LISP has a special form just for this, so apparently some
other (influential!) people had the same experience. Memory (malloc
etc) is also obvious and common, probably even more so for some
(many?) people -- although it tends to go "benignly" unnoticed for
longer, sometimes even forever, if you miss deallocating it.

Personally, I'm happy with the occasional goto cleanup or early
return, and a quick /search if I need to find them, but DGND.

- David.Thompson1 at worldnet.att.net
 
J

Jeremy Yallop

Dave said:
Unless, as I thought someone else pointed out but now can't find in
this thread, it can be punted to the caller -- or a wrapper/proxy.
<OT> In C++ you can make it a RAII class and return -- or even throw
-- to get it destructed. :-} </>

Yes, that's one of the nicer features of C++, in my opinion.
Similarly all kinds of resource deallocation; my most common case is
open a file, read and process with many error checks/cases, but always
close. IIRC LISP has a special form just for this, so apparently some
other (influential!) people had the same experience.

More or less: Common Lisp has the special form `unwind-protect' to
handle the general problem of resource cleanup in the presence of
non-local exits, and the (standard) `with-open-file' macro which makes
use of it to close open files when control flow leaves the block by
whatever means. Scheme has the `dynamic-wind' operator to handle the
same problem.

Getting back to C, the time when something like this is /really/
needed, (in my opinion, again), is to handle the case where a function
not under your control (e.g. code called via a client-supplied
function pointer) longjmps straight past your cleanup code.

typedef void func_t(void *);

void fred(func_t *f, size_t size) {
void *p = get_resource(size);
f(p);
release_resource(p);
}

If f(p) invokes longjmp then the resource is leaked; there's nothing
fred() can do to stop f() calling longjmp or to ensure
release_resource() gets called regardless.
Personally, I'm happy with the occasional goto cleanup or early
return, and a quick /search if I need to find them, but DGND.

"DGND"? I've nothing against early returns or the occasional goto
either, FWIW.

Jeremy.
 
C

CBFalconer

Jeremy said:
.... snip ...

typedef void func_t(void *);

void fred(func_t *f, size_t size) {
void *p = get_resource(size);
f(p);
release_resource(p);
}

If f(p) invokes longjmp then the resource is leaked; there's
nothing fred() can do to stop f() calling longjmp or to ensure
release_resource() gets called regardless.

But the caller of fred can avoid passing such an uncooperative
value for f in the first place. If that caller doesn't know what
the passee can do, it shouldn't be passing it.
 
N

Nick Landsberg

CBFalconer said:
Jeremy Yallop wrote:

... snip ...



But the caller of fred can avoid passing such an uncooperative
value for f in the first place. If that caller doesn't know what
the passee can do, it shouldn't be passing it.

Question:
How can f(p) invoke longjmp(jmp_buf env, value) without
knowing the specific "jmp_buf env" argument which
was passed to setjmp()?

It would seem that the calling program/routine would
also have some culpability in making this a global
variable available to f()?
 
J

Jeremy Yallop

Nick said:
Jeremy Yallop wrote:

... snip ...
[...]
Question:
How can f(p) invoke longjmp(jmp_buf env, value) without
knowing the specific "jmp_buf env" argument which
was passed to setjmp()?

You do have a point, although there may be a single "global" jmp_buf
used throughout the program. The usual thing is to pass an extra
"context" argument to the callback function, like this:

void fred(func_t *f, void *context, size_t size) {
void *p = get_resource(size);
f(context, p);
release_resource(p);
}

And then, inside the function:

struct state {
token token;
jmp_buf env;
const char *error_msg;
lexer lexer;
};

void callback(void *p) {
struct state *state = p;
token = next_token(state);
if (invalid(token)) {
longjmp(state->env, 1);
}
// etc.
}

Jeremy.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
474,141
Messages
2,570,815
Members
47,361
Latest member
RogerDuabe

Latest Threads

Top