fgets - design deficiency: no efficient way of finding last character read


Keith Thompson

James Kuyper said:
For many functions with return values, the set of values that can be
represented in the return type can be partitioned into three subsets:
A) Values indicating successful operation
B) Values indicating some kind of a problem
C) Values that should not be returned.

If you assume that nothing can go wrong, checking whether the return
value is in set A is the same as checking whether the return value is
not in set B. People routinely write code based upon that assumption,
testing whichever condition is easier to express or evaluate.
I don't like making such assumptions. Among other possibilities:

1. I typed the wrong function name.
2. I remembered incorrectly which return values are in each of the three
categories.
3. The code got linked to a different library than it should have been,
containing a different function with the same name.
4. Some other part of the code contains a defect rendering the behavior
of the code undefined, and the first testable symptom of that fact is
that this particular function returns a value it's not supposed to be
able to return.
5. Other.

Therefore, I prefer to write my code so that values in set C get treated
the same way as values in set B. The problem I'm trying to deal with is
very rare, so I don't bother doing this if doing so would make the code
significantly more complicated. However, adding "== buf" falls below my
threshold for "significantly more complicated".
[...]
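
A minimal sketch of the "== buf" test inside a read loop. The helper name count_reads and the 256-byte buffer are illustrative assumptions, not from the post. The loop continues only while fgets() returns exactly buf, so both NULL (set B) and any impossible value (set C) end it.

```c
#include <stdio.h>

/* Hypothetical helper: counts how many fgets() calls succeed on fp.
   A line longer than the buffer is counted once per chunk read. */
static long count_reads(FILE *fp)
{
    char buf[256];
    long n = 0;

    /* Testing "== buf" instead of "!= NULL" treats an unexpected
       return value (set C) the same way as NULL (set B). */
    while (fgets(buf, sizeof buf, fp) == buf)
        n++;
    return n;
}
```

Either form costs one pointer comparison per call; only the value compared against changes.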

Interesting approach. It seems (to me) slightly obscure because I tend
not to think about fgets() returning its first argument, since it's not
a particularly useful value to return.

A more paranoid approach would be:

    if ((result = fgets(buf, sizeof buf, fp)) == buf) {
        /* ok */
    }
    else if (result == NULL) {
        /* end-of-file or error */
    }
    else {
        /* THIS SHOULD NEVER HAPPEN, print a stern warning and abort */
    }

But unless you're writing a test suite for the standard library,
checking for illegal results from standard library functions probably
isn't worth the extra effort.
 

James Kuyper

Keith Thompson said:
[...]
Interesting approach. It seems (to me) slightly obscure because I tend
not to think about fgets() returning its first argument, since it's not
a particularly useful value to return.

Yes, a pointer to (or perhaps, just after?) the last character written
to the buffer might be more useful, in some circumstances. The same
issue of relatively useless return values is ubiquitous in the
string-handling functions.

Keith Thompson said:
A more paranoid approach would be:

    if ((result = fgets(buf, sizeof buf, fp)) == buf) {
        /* ok */
    }
    else if (result == NULL) {
        /* end-of-file or error */
    }
    else {
        /* THIS SHOULD NEVER HAPPEN, print a stern warning and abort */
    }

But unless you're writing a test suite for the standard library,
checking for illegal results from standard library functions probably
isn't worth the extra effort.

Yes, going that far would exceed my threshold for "significantly more
complicated". For a third party library that was notorious for being
poorly implemented, such an approach might be more reasonable (assuming
that you had to use it, despite that notoriety).
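
With the standard fgets(), the position of the last character read has to be recovered by rescanning the buffer, typically with strlen(). A minimal sketch, assuming a hypothetical helper named read_line_len():

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line (or a buffer-sized chunk of it)
   and returns its length, or -1 on end-of-file or error. Because
   fgets() returns buf rather than the end of the data, strlen() has
   to rescan buf. Caveat: an embedded '\0' in the input makes the
   result too short. */
static long read_line_len(char *buf, int size, FILE *fp)
{
    if (fgets(buf, size, fp) != buf)    /* set B or set C */
        return -1;
    return (long)strlen(buf);
}
```

Given a result len > 0, the last character read is buf[len - 1]; if it is not '\n', either the line was longer than the buffer or the input ended without a newline.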
 

jononanon

Keith Thompson said:
[...]
Interesting approach. It seems (to me) slightly obscure because I tend
not to think about fgets() returning its first argument, since it's not
a particularly useful value to return.

James Kuyper said:
Yes, a pointer to (or perhaps, just after?) the last character written
to the buffer might be more useful, in some circumstances. The same
issue of relatively useless return values is ubiquitous in the
string-handling functions.

How about this:
fgets2 should return a pointer to the final '\0' if it was written, or else return NULL if feof() or ferror() is set.

(Oh, and fgets2(buf, n, fp) should definitely not write a '\0' if n = 1, but instead write nothing and then always return NULL.)

But in any case, it's too late to change the standardized fgets().
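
Although fgets() itself can't change, the proposed fgets2() can be written as a separate function. A minimal sketch built on getc(); fgets2 is the hypothetical name from the proposal, and reading "return NULL if feof() or ferror() is set" as "NULL if nothing was stored, or if the error indicator is set after reading" is an assumption.

```c
#include <stdio.h>

/* Hypothetical fgets2():
   - n <= 1: writes nothing and returns NULL;
   - otherwise stores up to n-1 characters, stopping after '\n';
   - returns a pointer to the terminating '\0', or NULL if no
     character was stored or ferror(fp) is set after reading. */
static char *fgets2(char *buf, int n, FILE *fp)
{
    char *p = buf;
    int c;

    if (n <= 1)
        return NULL;
    while (--n > 0 && (c = getc(fp)) != EOF) {
        *p++ = (char)c;
        if (c == '\n')
            break;
    }
    if (p == buf || ferror(fp))
        return NULL;
    *p = '\0';
    return p;   /* length is p - buf; last character is p[-1] */
}
```

The caller gets the length as end - buf and the last character as end[-1] without a second pass. A character-at-a-time loop like this is a readability sketch, not a claim about speed.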

GNU recommends using getline instead.
http://www.gnu.org/software/libc/manual/html_node/Line-Input.html
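
getline() (provided by glibc and standardized in POSIX.1-2008) returns the number of characters read, so the last character is found without rescanning and embedded null bytes are counted. A minimal sketch, assuming a POSIX system; the helper name last_char_is_newline is hypothetical:

```c
#define _POSIX_C_SOURCE 200809L   /* request the POSIX.1-2008 getline() */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper: reads one line from fp. Returns 1 if it ends
   in '\n', 0 if it doesn't (e.g. a last line with no newline), and
   -1 on end-of-file or error. On success the length goes in *len. */
static int last_char_is_newline(FILE *fp, size_t *len)
{
    char *line = NULL;     /* getline() allocates and grows this */
    size_t cap = 0;
    ssize_t n = getline(&line, &cap, fp);
    int result = -1;

    if (n > 0) {
        *len = (size_t)n;
        result = line[n - 1] == '\n';   /* no rescan needed */
    }
    free(line);            /* must be freed even if getline() failed */
    return result;
}
```

In a real read loop, line and cap would normally be reused across calls instead of allocating a buffer per line.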


Comparing GNU's implementation of fgets to Dinkumware's is interesting! Dinkumware's fgets actually uses memchr(pt, '\n', len) to locate a '\n' and follows this by memcpy(s, pt, m), meaning that it iterates over the same buffer two times: first searching, then copying. Slow.
But it is very nicely readable I must say!
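
The two patterns can be compared on a plain in-memory array standing in for a stream's internal buffer (an assumption; real implementations also refill that buffer from the file). Both hypothetical functions copy up to and including the first '\n', limited by the destination capacity, and return the number of bytes copied.

```c
#include <stddef.h>
#include <string.h>

/* Neither function appends a '\0'. */

/* Two passes: memchr() finds the newline, then memcpy() copies. */
static size_t copy_line_two_pass(char *dst, size_t cap,
                                 const char *src, size_t len)
{
    size_t m = len < cap ? len : cap;
    const char *nl = memchr(src, '\n', m);

    if (nl != NULL)
        m = (size_t)(nl - src) + 1;   /* include the newline */
    memcpy(dst, src, m);
    return m;
}

/* One pass: each byte is tested and copied in the same step. */
static size_t copy_line_one_pass(char *dst, size_t cap,
                                 const char *src, size_t len)
{
    size_t m = len < cap ? len : cap;
    size_t i = 0;

    while (i < m) {
        char c = src[i];
        dst[i++] = c;
        if (c == '\n')
            break;
    }
    return i;
}
```

Fewer passes is not automatically faster: library memchr() and memcpy() are usually optimized to work on whole words or vector registers, while the one-pass loop above handles one byte at a time.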

GNU introduced low-level functions. For fgets()
https://sourceware.org/git/?p=glibc.git;a=blob;f=libio/iofgets.c;hb=HEAD
they use _IO_getline(), which looks, ummm... nice (but is less readable, I think)
https://sourceware.org/git/?p=glibc.git;a=blob;f=libio/iogetline.c;hb=HEAD
but to my shock it also uses memchr() followed by memcpy(). Slow.

At least GNU's memcpy copies wordwise on longword boundaries.
https://sourceware.org/git/?p=glibc.git;a=blob;f=string/memchr.c
I'm not sure if Dinkumware does this.

If I were to roll my own library implementation, I'd do one that searches the stream's internal buffer (reading longwords and accessing the bytes), and immediately copies the longword if neither EOF nor found '\n'... etc.
i.e. iterating over stuff only once, and doing nice alignment etc.
Something like that.
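
The step that makes a longword-at-a-time scan pay off is testing a whole word for '\n' at once. A minimal sketch using the well-known "has a zero byte" bit trick on 64-bit words (a swapped-in technique, not the method of either poster; the function names are hypothetical). XOR-ing every byte with '\n' turns a newline into a zero byte, and memcpy() performs the word load so the code doesn't depend on the pointer's alignment.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ONES  UINT64_C(0x0101010101010101)
#define HIGHS UINT64_C(0x8080808080808080)

/* Nonzero iff at least one byte of v is zero. */
static int has_zero_byte(uint64_t v)
{
    return ((v - ONES) & ~v & HIGHS) != 0;
}

/* Hypothetical helper: returns a pointer to the first '\n' in
   p[0..len), or NULL if there is none. */
static const char *find_newline_wordwise(const char *p, size_t len)
{
    const uint64_t nl = ONES * (unsigned char)'\n';  /* '\n' in every byte */

    while (len >= sizeof(uint64_t)) {
        uint64_t w;
        memcpy(&w, p, sizeof w);       /* alignment-independent load */
        if (has_zero_byte(w ^ nl))     /* this word contains a '\n' */
            break;                     /* pin it down bytewise below */
        p += sizeof w;
        len -= sizeof w;
    }
    for (; len > 0; p++, len--)        /* flagged word and/or tail */
        if (*p == '\n')
            return p;
    return NULL;
}
```

The word-level test doesn't depend on byte order, so it works on little- and big-endian machines alike; locating the matching byte is left to the short bytewise loop, and the end of the valid data is handled by the len count.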
 

Malcolm McLean

jononanon said:
If I were to roll my own library implementation, I'd do one that searches
the stream's internal buffer (reading longwords and accessing the bytes),
and immediately copies the longword if neither EOF nor found '\n'... etc.
i.e. iterating over stuff only once, and doing nice alignment etc.
Something like that.

You can certainly read a quad (four bytes) from the buffer at once, then
test it for '\n' using four masks and comparisons. But EOF is harder to
code. Of course, it depends on whether you expect to be in an environment
that makes much use of physical input streams. If you expect most input
to arrive via Unix-like pipes and so on, it makes sense to optimise
fgets(). But not if you're reading from a keyboard.
 
