Malcolm McLean wrote:
So you would argue against splitting the original function into something
like:
FILE *f = openFile( filename );
if( f )
{
    const char *data = parseFile( f );
    if( data )
    {
        processData( data );
        free( (void *) data );
    }
    fclose( f );
}
As well as making it easier to see exactly what it does, it is also much
easier to unit test. Each sub-function can be tested in isolation, without
having to generate and use actual test files.
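For example, parseFile() could be driven from an in-memory stream, with no
disk file anywhere. A minimal sketch, assuming a POSIX system (fmemopen() is
POSIX, not ISO C, and the test harness here is hypothetical):

#define _POSIX_C_SOURCE 200809L /* for fmemopen() */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

const char *parseFile(FILE *f); /* the function under test */

void test_parseFile(void)
{
    char input[] = "record one\nrecord two\n";
    FILE *f = fmemopen(input, strlen(input), "r"); /* FILE * over memory */
    assert(f != NULL);
    const char *data = parseFile(f);
    assert(data != NULL);
    free((void *) data);
    fclose(f);
}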
There are other issues here.
You should separate "bit-shuffling" functions from those which do IO.
processData is probably a bit-shuffling function, except that toy examples
don't really show the issues. Where is it sending its output? If to globals,
then those probably need taking out and replacing with a passed-in structure;
if it's writing a file, then I'd say it's badly designed, because all the
file access should be at the same level.
Then files have the specific quirk that you should usually have two functions,
one which takes a filename and one which takes an open stream. Then you can
easily switch from disk files to stdin, or from one file per record to
multiple records in a file. So I'd write a function like this:
const char *readdata(const char *filename, int *err)
{
    const char *answer;
    FILE *fp = fopen(filename, "r");

    if(!fp)
    {
        *err = -2; /* file open error */
        return 0;
    }
    answer = freaddata(fp, err);
    fclose(fp);
    return answer;
}
So that's boilerplate code. Every single function that opens a file will
have essentially the same body, and -2 will always be "can't open file",
-1 will always be "out of memory", -3 will always be "IO error", and so on.
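If you'd rather have that convention written down than just remembered, the
codes can live in one shared header. A minimal sketch (the names are my
invention):

/* hypothetical shared header of error codes, used by every loader */
enum
{
    ERR_OUT_OF_MEMORY  = -1, /* malloc failure */
    ERR_CANT_OPEN_FILE = -2, /* fopen failure */
    ERR_IO             = -3, /* read failure */
    ERR_PARSE          = -4  /* file read OK, but contents invalid */
};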
So what does freaddata() look like? Essentially it's going to allocate memory,
read in the file, check it for integrity, and only return a pointer if it's
actually valid. Now, we've said it returns a const char *, and we can't pass
a const char * to free(), which takes a non-const-qualified void *. There
are probably reasons for making the return pointer const, but we'd better wrap
the complication up in a matching kill function.
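Here's a sketch of freaddata() itself. The growth strategy and the integrity
check (non-empty, no embedded NULs) are assumptions for the sake of the
example; a real format would check whatever "valid" means for it:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

const char *freaddata(FILE *fp, int *err)
{
    char *buff = 0;
    char *temp;
    size_t size = 0;
    size_t capacity = 0;
    int ch;

    while((ch = fgetc(fp)) != EOF)
    {
        if(size + 2 > capacity) /* keep a spare byte for the terminator */
        {
            capacity = capacity ? capacity * 2 : 128;
            temp = realloc(buff, capacity);
            if(!temp)
            {
                free(buff);
                *err = -1; /* out of memory */
                return 0;
            }
            buff = temp;
        }
        buff[size++] = (char) ch;
    }
    if(ferror(fp))
    {
        free(buff);
        *err = -3; /* IO error */
        return 0;
    }
    /* integrity check, an assumption for this sketch */
    if(size == 0 || memchr(buff, 0, size))
    {
        free(buff);
        *err = -4; /* parse error */
        return 0;
    }
    buff[size] = 0;
    *err = 0;
    return buff;
}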
So the matching kill function is:
void killdata(const char *data)
{
    if(data)
        free((void *) data);
}
Now we always match the constructor and the kill function. So, at the top
level:
const char *data;
int err;

data = readdata("Hardcodedpath.txt", &err);
if(!data)
{
    switch(err)
    {
    case -1: /* out of memory, print a message or whatever */
        break;
    case -2: /* can't open the file */
        break;
    case -3: /* IO error */
        break;
    case -4: /* parse error; it starts getting difficult here because we
                might want details of how and where the file was invalid */
        break;
    }
}
if(data)
    processdata(stdout, data);
/* for the sake of argument, processdata writes results to stdout. So make it
   call fprintf rather than printf, and make this obvious at this level */
killdata(data);
So there's actually quite a lot at top level, though all it's really doing is
calling the constructor, destructor, and processing code, and reporting errors.
Even in the simplest case, every one of those four errors could potentially
occur.
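To make the fprintf point concrete, the shape of processdata() that this
implies; the body here is just a placeholder of my own, the point is that the
output stream arrives as a parameter:

#include <stdio.h>
#include <string.h>

void processdata(FILE *out, const char *data)
{
    /* bit-shuffling on data goes here; everything written goes through
       the stream the caller handed us, never straight to printf() */
    fprintf(out, "%lu bytes of data\n", (unsigned long) strlen(data));
}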
Where would you draw the line?
I wouldn't focus on trying to make functions any particular length. That's
not a good procedural design principle.
What we need to ask is: does the function depend on IO, or on an external
library? If so, separate it out from the code which is doing the logic. Then,
is it reusable? readdata() is probably a reusable function if we pass it an
error return and handle the errors at a higher level, but not if we put
calls to stderr in the function itself, because those messages might not
be wanted, or might not make sense to another caller. Separate out the
reusable functions. Then, are the functions a logical whole? Can we write a
single line, or two at most, in a comment above the function and give a
maintainer a very good idea of what it is doing? If we can, it's a good
function and it should be a unit.
Then finally, if a function's only dependency is something trivial like
min2(), you might want to fold that in to make the function a leaf. Or if
it's really long you might want to break it up, essentially arbitrarily,
purely to get the line count down.
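On the min2() point, for instance (the names here are invented for
illustration):

/* before: clampindex() is not a leaf, because it calls min2() */
int min2(int a, int b)
{
    return a < b ? a : b;
}

int clampindex(int i, int len)
{
    return min2(i, len - 1);
}

/* after: with min2() folded in, the function is a leaf */
int clampindex_leaf(int i, int len)
{
    return i < len - 1 ? i : len - 1;
}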