Reading a line of text from a stream

N

Nelu

I would appreciate some comments on the code below for reading a line of
text from a stream.


#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

/**
* Reads a line of text from a stream and returns the string on
* success and NULL on failure.
* fstream - the stream to read from
* maxRead - a variable that specifies the maximum number of
* characters to read. If maxRead is NULL then SIZE_MAX-1 is
* considered as the maximum value. If maxRead is different from NULL
* then the function will set the value to the length of non-NULL
* string returned by the function.
* fullLine - a variable that specifies whether or not the end of line
* has been reached (a complete line was read). It also returns true if
EOF
* has been reached. It can be set to NULL.
*/
char *readStringWithLimit(FILE *fstream, size_t *maxRead,
int *fullLine) {
size_t capacityIncrement=256;
size_t capacity=capacityIncrement;
size_t readLimit;
size_t stringLength=0;
char *myStr, *tmpStr;
int localFullLine;
unsigned char c;
int ic;

if(maxRead==NULL) {
readLimit=SIZE_MAX-1;
} else {
if((*maxRead)>SIZE_MAX-1) {
readLimit=SIZE_MAX-1;
} else {
readLimit=*maxRead-1;
}
}

myStr=malloc(capacity);
if(!myStr) {
return NULL;
}
while(1) {
ic=fgetc(fstream);
if(ic==EOF) {
if(ferror(fstream)) {
free(myStr);
return NULL;
} else {
localFullLine=1;
break;
}
} else {
c=(unsigned char)ic;
if(c=='\n') {
localFullLine=1;
break;
} else {
if(readLimit>=stringLength) {
if(stringLength==capacity) {
//calculate the necessary size
if(SIZE_MAX-capacityIncrement>capacity) {
capacity+=capacityIncrement;
} else {
capacity=SIZE_MAX;
}
tmpStr=realloc(myStr,capacity);
if(tmpStr) {
myStr=tmpStr;
} else {
free(myStr);
return NULL;
}
}
myStr[stringLength++]=c;
} else {
ungetc(ic,fstream);
localFullLine=0;
break;
}
}
}
}
if(stringLength==capacity) {
tmpStr=realloc(myStr,capacity+1);
if(tmpStr) {
myStr=tmpStr;
} else {
free(myStr);
return NULL;
}
}
myStr[stringLength]='\0';
if(maxRead) {
*maxRead=stringLength;
}
if(fullLine) {
*fullLine=localFullLine;
}
return myStr;
}


Thank you.
 
M

Micah Cowan

Nelu said:
I would appreciate some comments on the code below for reading a line of
text from a stream.


#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

/**
* Reads a line of text from a stream and returns the string on
* success and NULL on failure.
* fstream - the stream to read from
* maxRead - a variable that specifies the maximum number of
* characters to read. If maxRead is NULL then SIZE_MAX-1 is
* considered as the maximum value. If maxRead is different from NULL
* then the function will set the value to the length of non-NULL
* string returned by the function.

You've accidentally left out documentation that if maxRead is not
NULL, it will be used to set an upper limit on the line read.

The documentation above is also a bit hard to read; it'd be nice if it
were easier to find the start of each parameter description. Perhaps
by indenting continuation lines:

* maxRead - a variable that specifies the maximum number of
* characters to read. If maxRead is NULL then SIZE_MAX-1
...

You've also neglected to explicitly specify that the terminating "\n"
will not be included in the string.

IME documentation tends to get out-of-sync with the code pretty
quickly, even when it doesn't already start out that way. My
preference is to document it in the form of readable test cases that
demonstrate the supported use cases (if the function itself can't be
made so readable as to make its usage obvious). Of course, I realize
that various shops have various documentation requirements, let alone
individual preferences.

It's a personal preference thing, but I dislike that you're
overloading maxRead's meaning to both specify an optional maximum
_and_ receive the size of the string. It makes things cumbersome for
callers that may desire one function or the other, but not both. The
one who merely wishes to specify an upper length may have to copy
their value into a "receiving" object, passing its pointer; the one
who wants the length but not an upper length either has to pass a var
preset to SIZE_MAX-1, or to call strlen() after.
* fullLine - a variable that specifies whether or not the end of line
* has been reached (a complete line was read). It also returns true if
EOF
* has been reached. It can be set to NULL.
*/
char *readStringWithLimit(FILE *fstream, size_t *maxRead,
int *fullLine) {
size_t capacityIncrement=256;
size_t capacity=capacityIncrement;
size_t readLimit;
size_t stringLength=0;
char *myStr, *tmpStr;
int localFullLine;
unsigned char c;
int ic;

if(maxRead==NULL) {
readLimit=SIZE_MAX-1;
} else {
if((*maxRead)>SIZE_MAX-1) {
readLimit=SIZE_MAX-1;
} else {
readLimit=*maxRead-1;
}
}

myStr=malloc(capacity);
if(!myStr) {
return NULL;
}
while(1) {
ic=fgetc(fstream);

Using fgets() is another option, and _might_ be faster; though it would
require a call to strlen() to find the newline.
if(ic==EOF) {
if(ferror(fstream)) {
free(myStr);
return NULL;
} else {
localFullLine=1;
break;
}
} else {
c=(unsigned char)ic;
if(c=='\n') {
localFullLine=1;
break;
} else {
if(readLimit>=stringLength) {
if(stringLength==capacity) {
^^ see below
//calculate the necessary size
if(SIZE_MAX-capacityIncrement>capacity) {
capacity+=capacityIncrement;
} else {
capacity=SIZE_MAX;
}
tmpStr=realloc(myStr,capacity);
if(tmpStr) {
myStr=tmpStr;
} else {
free(myStr);
return NULL;
}
}
myStr[stringLength++]=c;
} else {
ungetc(ic,fstream);
localFullLine=0;
break;
}
}
}
}
if(stringLength==capacity) {
tmpStr=realloc(myStr,capacity+1);
if(tmpStr) {
myStr=tmpStr;
} else {
free(myStr);
return NULL;
}
}

The above check and potential realloc() could have been avoided, if
you instead checked for stringLength+1==capacity at the previous
if statement that I marked.
myStr[stringLength]='\0';
if(maxRead) {
*maxRead=stringLength;
}
if(fullLine) {
*fullLine=localFullLine;
}
return myStr;
}

This design suffers from a couple of difficiencies (IMO):
- It is not possible to distinguish between an empty line and EOF.
- It is not possible to distinguish between I/O failures and
allocation failures (both return NULL).

The first one seems like the larger problem. The caller can call
feof(), to be sure, but it seems like poor design to require it.
 
N

Nelu

You've accidentally left out documentation that if maxRead is not NULL,
it will be used to set an upper limit on the line read.

The documentation above is also a bit hard to read; it'd be nice if it
were easier to find the start of each parameter description. Perhaps by
indenting continuation lines:

* maxRead - a variable that specifies the maximum number of *
characters to read. If maxRead is NULL then SIZE_MAX-1
...

You've also neglected to explicitly specify that the terminating "\n"
will not be included in the string.

IME documentation tends to get out-of-sync with the code pretty quickly,
even when it doesn't already start out that way. My preference is to
document it in the form of readable test cases that demonstrate the
supported use cases (if the function itself can't be made so readable as
to make its usage obvious). Of course, I realize that various shops have
various documentation requirements, let alone individual preferences.

The documentation was just a late addition and I didn't spend time on it.
I was more interested in the correctness of the code.
It's a personal preference thing, but I dislike that you're overloading
maxRead's meaning to both specify an optional maximum _and_ receive the
size of the string. It makes things cumbersome for callers that may
desire one function or the other, but not both. The one who merely
wishes to specify an upper length may have to copy their value into a
"receiving" object, passing its pointer; the one who wants the length
but not an upper length either has to pass a var preset to SIZE_MAX-1,
or to call strlen() after.

Fair enough.

The above check and potential realloc() could have been avoided, if you
instead checked for stringLength+1==capacity at the previous if
statement that I marked.

Good point.
This design suffers from a couple of difficiencies (IMO):
- It is not possible to distinguish between an empty line and EOF. -
It is not possible to distinguish between I/O failures and allocation
failures (both return NULL).

I was thinking about adding these then I thought that it would get to be
uglier than it already is.




Thank you.
 
M

Micah Cowan

Nelu said:
I was thinking about adding these then I thought that it would get to be
uglier than it already is.

Writing a readable implementation is surely important, but enabling
users that are using your function to write readable code is, IMO,
more important (also, I don't believe that the two have to be mutually
exclusive).

If you were even just returning NULL for EOF instead of an empty
string (which would require only very minor alterations), users could
write code like:

while ((str = readStringWithLimit(...)) != NULL) {
/* Do stuff with str */
free(str);
}
/* Possible further checking with ferror() goes here.
Need to use ferror() and feof() to determine which of
I/O error, EOF, and out-of-memory have occured. */

With code like the above, they'd be able to assume that the loop won't
break unless either they've reached EOF or else an error condition has
occurred.

However, as it is, they must do something like:

while ((str = readStringWithLimit(...)) != NULL) {
if (*str == '\0') {
if (feof(fstream))
break;
/* else it was just an empty line. */
}
/* Do stuff with str */
free(str);
}
/* Possible further checking with ferror() goes here. */

That seems much, much more cumbersome to me.

Note that if they use my first code snippet with your current
implementation, then if there are no errors the loop will go on
forever, because it will just keep returning the empty string over and
over again after EOF is reached.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,994
Messages
2,570,223
Members
46,813
Latest member
lawrwtwinkle111

Latest Threads

Top