Malcolm McLean wrote:
So you would argue against splitting the original function into something
like:
FILE *f = openFile( filename );
if( f )
{
    const char *data = parseFile( f );
    if( data )
    {
        processData( data );
        free( (void *) data );
    }
    fclose( f );
}
As well as making it easier to see exactly what it does, it is also much
easier to unit test. Each sub-function can be tested in isolation, without
having to generate and use actual test files.
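For example, parseFile() could be driven from an in-memory stream, with no
disk file anywhere. A minimal sketch, assuming a POSIX system (fmemopen() is
POSIX, not ISO C, and the test harness here is hypothetical):

#define _POSIX_C_SOURCE 200809L /* for fmemopen() */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

const char *parseFile(FILE *f); /* the function under test */

void test_parseFile(void)
{
    char input[] = "record one\nrecord two\n";
    FILE *f = fmemopen(input, strlen(input), "r"); /* FILE * over memory */
    assert(f != NULL);
    const char *data = parseFile(f);
    assert(data != NULL);
    free((void *) data);
    fclose(f);
}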
There are other issues here.
You should separate "bit-shuffling" functions from those which do IO.
processData is probably a bit-shuffling function, except that toy examples
don't really show the issues. Where is it sending its output? If to globals,
then those probably need taking out and replacing with a passed-in structure;
if it's writing a file, then I'd say it's badly designed, because all the
file access should be at the same level.
Then files have the specific quirk that you should usually have two functions,
one which takes a filename and one which takes an open stream. Then you can
easily switch from disk files to stdin, or from one file per record to
multiple records in a file. So I'd write a function like this:
const char *readdata(const char *filename, int *err)
{
    const char *answer;
    FILE *fp = fopen(filename, "r");

    if(!fp)
    {
        *err = -2; /* file open error */
        return 0;
    }
    answer = freaddata(fp, err);
    fclose(fp);
    return answer;
}
So that's boilerplate code. Every single function that opens a file will
have essentially the same body, and -2 will always be "can't open file",
-1 will always be "out of memory", -3 will always be "IO error", and so on.
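If you'd rather have that convention written down than just remembered, the
codes can live in one shared header. A minimal sketch (the names are my
invention):

/* hypothetical shared header of error codes, used by every loader */
enum
{
    ERR_OUT_OF_MEMORY  = -1, /* malloc failure */
    ERR_CANT_OPEN_FILE = -2, /* fopen failure */
    ERR_IO             = -3, /* read failure */
    ERR_PARSE          = -4  /* file read OK, but contents invalid */
};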
So what does freaddata() look like? Essentially it's going to allocate memory,
read in the file, check it for integrity, and only return a pointer if it's
actually valid. Now, we've said it returns a const char *, and we can't pass
a const char * to free(), which takes a non-const-qualified void *. There
are probably reasons for making the return pointer const, but we'd better wrap
the complication up in a matching kill function.
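Here's a sketch of freaddata() itself. The growth strategy and the integrity
check (non-empty, no embedded NULs) are assumptions for the sake of the
example; a real format would check whatever "valid" means for it:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

const char *freaddata(FILE *fp, int *err)
{
    char *buff = 0;
    char *temp;
    size_t size = 0;
    size_t capacity = 0;
    int ch;

    while((ch = fgetc(fp)) != EOF)
    {
        if(size + 2 > capacity) /* keep a spare byte for the terminator */
        {
            capacity = capacity ? capacity * 2 : 128;
            temp = realloc(buff, capacity);
            if(!temp)
            {
                free(buff);
                *err = -1; /* out of memory */
                return 0;
            }
            buff = temp;
        }
        buff[size++] = (char) ch;
    }
    if(ferror(fp))
    {
        free(buff);
        *err = -3; /* IO error */
        return 0;
    }
    /* integrity check, an assumption for this sketch */
    if(size == 0 || memchr(buff, 0, size))
    {
        free(buff);
        *err = -4; /* parse error */
        return 0;
    }
    buff[size] = 0;
    *err = 0;
    return buff;
}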
So the matching kill function is:
void killdata(const char *data)
{
    if(data)
        free((void *) data);
}
Now we always match the constructor and the kill function. So, at the top
level:
const char *data;
int err;

data = readdata("Hardcodedpath.txt", &err);
if(!data)
{
    switch(err)
    {
    case -1: /* out of memory, print a message or whatever */
        break;
    case -2: /* can't open the file */
        break;
    case -3: /* IO error */
        break;
    case -4: /* parse error; it starts getting difficult here because we
                might want details of how and where the file was invalid */
        break;
    }
}
if(data)
    processdata(stdout, data);
/* for the sake of argument, processdata writes results to stdout. So make it
   call fprintf rather than printf, and make this obvious at this level */
killdata(data);
So there's actually quite a lot at top level, though all it's really doing is
calling the constructor, destructor, and processing code, and reporting errors.
Even in the simplest case, every one of those four errors could potentially
occur.
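To make the fprintf point concrete, the shape of processdata() that this
implies; the body here is just a placeholder of my own, the point is that the
output stream arrives as a parameter:

#include <stdio.h>
#include <string.h>

void processdata(FILE *out, const char *data)
{
    /* bit-shuffling on data goes here; everything written goes through
       the stream the caller handed us, never straight to printf() */
    fprintf(out, "%lu bytes of data\n", (unsigned long) strlen(data));
}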
Where would you draw the line?
I wouldn't focus on trying to make functions any particular length. That's
not a good procedural design principle.
What we need to ask is: does the function depend on IO, or on an external
library? If so, separate it out from the code which is doing the logic. Then,
is it reusable? readdata() is probably a reusable function if we pass it an
error return and handle the errors at a higher level, but not if we put
calls to stderr in the function itself, because those messages might not
be wanted, or might not make sense to another caller. Separate out the
reusable functions. Then, are the functions a logical whole? Can we write a
single line, or two at most, in a comment above the function and give a
maintainer a very good idea of what it is doing? If we can, it's a good
function and it should be a unit.
Then finally, if a function's only dependency is something trivial like
min2(), you might want to fold that in to make the function a leaf. Or if
it's really long you might want to break it up, essentially arbitrarily,
purely to get the line count down.
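On the min2() point, for instance (the names here are invented for
illustration):

/* before: clampindex() is not a leaf, because it calls min2() */
int min2(int a, int b)
{
    return a < b ? a : b;
}

int clampindex(int i, int len)
{
    return min2(i, len - 1);
}

/* after: with min2() folded in, the function is a leaf */
int clampindex_leaf(int i, int len)
{
    return i < len - 1 ? i : len - 1;
}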