Qry: Behaviour of fgets?


Keith Thompson

CBFalconer said:
jacob navia wrote: [...]
In the case of fgets this implies:
o Testing for NULL.
o Testing for a bad value of n (<= 0)
o Testing for a NULL value in the receiving buffer.

NULLs are allowed in the receiving buffer, and don't matter since
they will either be beyond the line end or overwritten. It helps
to get the objective right before trying to code for it.
[...]

Let's be clear what we're talking about.

Here's the declaration of fgets:

char *fgets(char * restrict s, int n, FILE * restrict stream);

You're interpreting jacob's third point to mean null characters ('\0')
in the array pointed to by 's'. I don't believe that's what he meant;
the initial contents of the array are irrelevant, and if he were
talking about null characters he probably wouldn't have used the term
NULL, which is a macro that expands to a null pointer constant.

I think he was referring to checking in fgets whether s == NULL.
That's a reasonable check to make, but it's not required, and it
misses a plethora of other possible invalid values.

As a programmer, of course, I can't depend on any such checks, since
not all implementations perform them, and that's going to remain the
case for a long time. It's still entirely *my* responsibility to
avoid passing invalid arguments to fgets. If the implementation helps
me out by catching certain errors, that's great (unless it imposes too
much of a burden on *correct* calls), but I can't count on it. If I
don't trust myself to get this right, I can always write a wrapper
that checks the arguments before calling fgets.
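
For instance, a minimal sketch of such a wrapper (the name and the
choice to report bad arguments as NULL are illustrative, not anything
the standard requires):

#include <stdio.h>

/* Hypothetical checked wrapper: treats bad arguments as an ordinary
   failure instead of undefined behaviour. */
char *fgets_checked(char *s, int n, FILE *stream)
{
    if (s == NULL || stream == NULL || n <= 0)
        return NULL;
    return fgets(s, n, stream);
}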

A side note: Above I referred to "the array pointed to by 's'". Since
's' is a char*, it doesn't point to an array; it points to a char
(which presumably is the first element of an array). The standard
allows a char* to be a "pointer to a string", but there is no
wording that allows it to be called a "pointer to an array", since a
pointer to an array is a valid and distinct concept. However, the
standard itself refers to "the array pointed to by s" (C99
7.19.7.2p2). I suppose it's clear enough from context.
 

Richard Tobin

Yes - undefined behaviour means anything can happen,
This statement is untrue (and obvious nonsense for any conceivable
real-world example). Undefined behaviour is 'behaviour [...] upon
which this International Standard imposes no requirements'. This
means that whatever the behaviour might or might not be is not
relevant to determining whether the implementation conforms to
ISO/IEC 9899:1999, i.e. whether it follows the requirements imposed
by it.

Ok, let me rephrase it. Undefined behaviour means anything can happen
*without violating the standard*. No real implementation will make
DMR's head explode, but if it did it wouldn't be a standard violation.

-- Richard
 

Anand Hariharan

No, not in this case. Your argument seems to be:

If we were to test exhaustively for all errors, that would be
impossible.
Consequence:
We do not test anything at all.

Methinks that is a reasonable stance for an implementor to take.

* Conform to the standard.
* Provide value-adding features like useful and meaningful
diagnostics. Kudos if you can do semantic code analysis at compile
time to detect possible UB at runtime.
* Provide any extensions deemed useful, in a way that doesn't break
anything conforming.

- Anand
 

CBFalconer

Bart said:
.... snip ...

The standard indeed does not require that UB cause my next door
neighbour to turn into a piano, but that appears to be exactly the
behaviour of my DeathStation 9000.
Are you claiming that this behaviour is not correct?

That depends. Is he in tune? :)
 

Scott Lurndal

[snip]
As a programmer, of course, I can't depend on any such checks, since
not all implementations perform them, and that's going to remain the
case for a long time. It's still entirely *my* responsibility to
avoid passing invalid arguments to fgets. If the implementation helps
me out by catching certain errors, that's great (unless it imposes too
much of a burden on *correct* calls), but I can't count on it. If I
don't trust myself to get this right, I can always write a wrapper
that checks the arguments before calling fgets.

Indeed. One may as well expect strcpy to fail with EFAULT when
provided arguments in which the destination pointer refers to an
unmapped portion of the virtual address space (or a portion mapped
read-only, or a portion to which the process has no access).

This was actually proposed to the X/Open group at one point and was
quickly rejected due to performance and correctness constraints.

scott
 

Barry Margolin

Sheth Raxit said:
don't you think that, instead of saying nothing or saying "undefined
behaviour", the standard should itself require additional checks?

This would be redundant and inefficient. Instead of the programmer
checking the return value of fopen() once, all the I/O functions would
have to check the value of the stream every time.

Consider an inner loop calling fgetc(). This is a very simple
operation, probably only about 5-10 instructions most of the time
(because of buffering -- if there's data in the buffer it just has to
copy one byte and increment the buffer pointer). Adding the NULL check
would slow it down by 20-30%, I suspect.
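
A sketch of the check-once approach being described (the file name is
purely illustrative):

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    FILE *f = fopen("input.txt", "r");
    if (f == NULL) {                  /* the programmer checks once */
        perror("input.txt");
        return EXIT_FAILURE;
    }

    int c;
    long count = 0;
    while ((c = fgetc(f)) != EOF)     /* hot loop: no per-call check */
        count++;

    fclose(f);
    printf("%ld bytes\n", count);
    return 0;
}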
 

Joe Wright

CBFalconer said:
NULLs are allowed in the receiving buffer, and don't matter since
they will either be beyond the line end or overwritten. It helps
to get the objective right before trying to code for it.

One is also allowed to snip obvious signatures, even though the
originator neglected to include a proper sig. marker. There are no
prizes for extreme laziness or rudeness.
Canonically,

char buff[80];
size_t size = sizeof buff;
while (fgets(buff, size, stream)) {
    /* ... */
}
fgets will return NULL at EOF or error. The EOF condition is the normal
exit from the while loop.

fgets returns the value buff if it read something. We use strchr to
check for '\n' in buff indicating a complete line.

If the line is incomplete, either the line is longer than buff or it is
the last line of the stream and doesn't end with '\n'. To find out
which, we use strchr to search for '\0', which will always succeed. If
the NUL is at buff[size-1] we assume the line is longer than buff. If
the NUL occurs earlier we assume an unterminated last line.

For well-formed input we have enough information to do 'the right
thing' with the data.
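
A sketch of that logic in one place (the function name is hypothetical;
the empty branches mark where 'the right thing' would go):

#include <stdio.h>
#include <string.h>

void read_lines(FILE *stream)
{
    char buff[80];
    size_t size = sizeof buff;

    while (fgets(buff, (int)size, stream)) {
        if (strchr(buff, '\n') != NULL) {
            /* complete line */
        } else if (strchr(buff, '\0') == buff + size - 1) {
            /* NUL at buff[size-1]: line longer than buff */
        } else {
            /* NUL earlier: assume an unterminated last line */
        }
    }
}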

Perversity.
It is apparently legal, if unusual, to have '\0' characters in text
files. fgets makes no exception for these NUL characters. If a too-long
line has a NUL in it, my logic above gets it wrong by assuming an
unterminated last line. Even if the line is terminated with '\n', if it
also contains NUL then using strcpy to move it from buff will fail
miserably.

All because NUL is legal in text streams. Or is it?
 

CBFalconer

Joe said:
CBFalconer said:
NULLs are allowed in the receiving buffer, and don't matter since
they will either be beyond the line end or overwritten. It helps
to get the objective right before trying to code for it.

One is also allowed to snip obvious signatures, even though the
originator neglected to include a proper sig. marker. There are no
prizes for extreme laziness or rudeness.

Canonically,

char buff[80];
size_t size = sizeof buff;
while (fgets(buff, size, stream)) {
    /* ... */
}
fgets will return NULL at EOF or error. The EOF condition is the
normal exit from the while loop.

fgets returns the value buff if it read something. We use strchr
to check for '\n' in buff indicating a complete line.

If the line is incomplete, either the line is longer than buff or
it is the last line of the stream and doesn't end with '\n'. To
find out which, we use strchr to search for '\0', which will
always succeed. If the NUL is at buff[size-1] we assume the line
is longer than buff. If the NUL occurs earlier we assume an
unterminated last line.

For well-formed input we have enough information to do 'the right
thing' with the data.

Perversity.
It is apparently legal, if unusual, to have '\0' characters in
text files. fgets makes no exception for these NUL characters. If
a too-long line has a NUL in it, my logic above gets it wrong by
assuming an unterminated last line. Even if the line is terminated
with '\n', if it also contains NUL then using strcpy to move it
from buff will fail miserably.

Consistent handling of all line lengths is easily had with ggets.
If the NULs in the line are affecting you, you could easily modify
ggets.zip to ignore them (which may not be the right cure). See:

<http://cbfalconer.home.att.net/download/>

Barring I/O or malloc failures, ggets always returns complete lines,
with the final '\n' removed. The penalty is exercising the malloc
subsystem.

It has been pointed out that ggets is vulnerable to deliberate
interference, since it continues to expand the buffer as long as
input is arriving (without a '\n'). I think this is negligible,
partly because it is hard to provide a continuous input. It
actually doesn't tie up indefinitely, since once the malloc system
refuses to expand it exits with error.
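
For reference, a minimal sketch in the spirit of ggets (my
illustration, not CBFalconer's actual code): grow the buffer until the
whole line fits, and return NULL on EOF or allocation failure.

#include <stdio.h>
#include <stdlib.h>

/* Sketch only: returns a malloc'd line with the '\n' removed,
   or NULL on EOF (with nothing read) or on malloc/realloc failure. */
char *ggets_sketch(FILE *f)
{
    size_t cap = 128, len = 0;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    int c;
    while ((c = getc(f)) != EOF && c != '\n') {
        if (len + 1 >= cap) {                 /* grow on demand */
            char *tmp = realloc(buf, cap *= 2);
            if (tmp == NULL) {
                free(buf);
                return NULL;                  /* allocation refused */
            }
            buf = tmp;
        }
        buf[len++] = (char)c;
    }
    if (c == EOF && len == 0) {               /* nothing before EOF */
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    return buf;
}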
 

Army1987

[...]To find out
which, we use strchr to search for '\0', which will always succeed. If
the NUL is at buff[size-1] we assume the line is longer than buff.

Doesn't strlen(buff) != size - 1 do the same without looking that
weird?
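
As a sketch, the strlen form of that test (buff and size as in the
canonical loop above; the helper name is mine, and it assumes
size >= 2):

#include <string.h>

/* Nonzero if fgets filled buff without finding a newline. */
int longer_than_buff(const char *buff, size_t size)
{
    return strlen(buff) == size - 1 && buff[size - 2] != '\n';
}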
 

Keith Thompson

Joe Wright said:
Canonically,

char buff[80];
size_t size = sizeof buff;
while (fgets(buff, size, stream)) {
    /* ... */
} [...]
If the line is incomplete, either the line is longer than buff or it
is the last line of the stream and doesn't end with '\n'. To find
out which, we use strchr to search for '\0', which will always
succeed. If the NUL is at buff[size-1] we assume the line is longer
than buff. If the NUL occurs earlier we assume an unterminated last
line.
[...]

strchr? Just use strlen.
 

Keith Thompson

CBFalconer said:
It has been pointed out that ggets is vulnerable to deliberate
interference, since it continues to expand the buffer as long as
input is arriving (without a '\n'). I think this is negligible,
partly because it is hard to provide a continuous input.

No, it's not hard to provide a continuous input. In fact, it's
trivial, as I've demonstrated:

./my_program < /dev/zero

It actually doesn't tie up indefinitely, since once the malloc
system refuses to expand it exits with error.

Which is not necessarily benign.
 

CBFalconer

Keith said:
No, it's not hard to provide a continuous input. In fact, it's
trivial, as I've demonstrated:

./my_program < /dev/zero

[...]

Which is not necessarily benign.

However, this assumes the evildoer has access to the machine, and
can access devices, and can redirect input to access such a device,
etc. etc. I don't think this generally applies.

Actually the suggestion I made earlier in this thread, of making
ggets ignore a zero byte, would foil this attack entirely. Since
ggets is intended to input text this is probably a harmless change
and doesn't affect the .h linkage file in the least.
 

Richard Tobin

No, it's not hard to provide a continuous input. In fact, it's
trivial, as I've demonstrated:

./my_program < /dev/zero
However, this assumes the evildoer has access to the machine, and
can access devices, and can redirect input to access such a device,
etc. etc. I don't think this generally applies.

What kind of input would someone have that *wouldn't* allow them that
kind of access? In the old days they might be sitting at a dumb
terminal, but now almost all remote access is from another computer
that can generate arbitrary data.

On the other hand, most remote access these days is probably through
HTTP, and for that the size of the data is usually known before the
remote application gets hold of it. It can be very big, of course.

-- Richard
 

Keith Thompson

CBFalconer said:
However, this assumes the evildoer has access to the machine, and
can access devices, and can redirect input to access such a device,
etc. etc. I don't think this generally applies.

Sure it does. On a typical system, if the "evildoer" is able to run
the program at all, he can redirect its input from anywhere he likes.
Actually the suggestion I made earlier in this thread, of making
ggets ignore a zero byte, would foil this attack entirely. Since
ggets is intended to input text this is probably a harmless change
and doesn't affect the .h linkage file in the least.

That only affects the specific case of reading from /dev/zero (or
something equivalent). You could read from any arbitrary binary file;
depending on the kind of file, it might happen to contain an
arbitrarily long sequence of bytes containing neither '\n' nor '\0'.

And that's just something that can happen entirely by accident. I've
done that kind of thing myself trying to run "less" on a binary file
(it hangs until I kill it from another window). Given a malicious
attacker ("Here, take a look at this file!"), generating arbitrarily
long lines is trivial.
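
For instance, a trivial generator of that kind (mine, purely
illustrative); pipe its output into the program under test:

#include <stdio.h>

int main(void)
{
    for (;;)                 /* endless "line": no '\n', no '\0' */
        putchar('A');
}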

My examples are Unix-specific, but similar problems can occur on other
systems.
 

CBFalconer

Keith said:
CBFalconer said:
Keith said:
[...]
It has been pointed out that ggets is vulnerable to deliberate
interference, since it continues to expand the buffer as long as
input is arriving (without a '\n'). I think this is negligible,
partly because it is hard to provide a continuous input.

No, it's not hard to provide a continuous input. In fact, it's
trivial, as I've demonstrated:

./my_program < /dev/zero

It actually doesn't tie up indefinitely, since once the malloc system
refuses to expand it exits with error.

Which is not necessarily benign.

However, this assumes the evildoer has access to the machine, and
can access devices, and can redirect input to access such a device,
etc. etc. I don't think this generally applies.

Sure it does. On a typical system, if the "evildoer" is able to run
the program at all, he can redirect its input from anywhere he likes.
Actually the suggestion I made earlier in this thread, of making
ggets ignore a zero byte, would foil this attack entirely. Since
ggets is intended to input text this is probably a harmless change
and doesn't affect the .h linkage file in the least.

That only affects the specific case of reading from /dev/zero (or
something equivalent). You could read from any arbitrary binary file;
depending on the kind of file, it might happen to contain an
arbitrarily long sequence of bytes containing neither '\n' nor '\0'.

And that's just something that can happen entirely by accident. I've
done that kind of thing myself trying to run "less" on a binary file
(it hangs until I kill it from another window). Given a malicious
attacker ("Here, take a look at this file!"), generating arbitrarily
long lines is trivial.

My examples are Unix-specific, but similar problems can occur on other
systems.

Generating such a file is a non-trivial exercise, as I discovered
when I created some long lines for ggets testing in the first
place. One has to avoid any byte holding the numeric value 10
anywhere, since that always creates a line ending. I maintain the
routine is no more dangerous than an ordinary program that reads a
file into memory, using malloc to create the storage. Either can
lead to harassment, admittedly, but neither can form a security
hole. Almost anything that does input can be harassed in the same
general way.
 

Army1987

[...]To find out
which, we use strchr to search for '\0', which will always succeed. If
the NUL is at buff[size-1] we assume the line is longer than buff.

Doesn't strlen(buff) != size - 1 do the same without looking that
weird?
Or better, set buff[size - 1] to a nonzero value, call fgets, and
check whether buff[size - 1] is zero. This takes O(1) time.
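
A sketch of that O(1) test (the helper name is mine; it assumes
size >= 2):

#include <stdio.h>

/* Returns 1 if the line didn't fit, 0 if it did, -1 on EOF/error. */
int fgets_truncated(char *buff, size_t size, FILE *stream)
{
    buff[size - 1] = 'x';                   /* any nonzero sentinel */
    if (fgets(buff, (int)size, stream) == NULL)
        return -1;
    return buff[size - 1] == '\0'           /* buffer was filled... */
        && buff[size - 2] != '\n';          /* ...without a newline */
}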
 

Army1987

[Removed comp.unix.programmer and added comp.std.c in the
newsgroups list]

Section 7.1.4 paragraph 1 of the C language standard makes it quite
clear that passing a null pointer to a function that does not document
accepting a null pointer is undefined behavior.

I guess this means that
printf("%p\n", (void *)0);
causes UB... Is this intended, or is it just a glitch in the
Standard?
 

Army1987

/* mystdio.h */
#include <stdio.h> // just for FILE
extern char *myfgets(char *, int, FILE *);
#undef fgets // in case <stdio.h> defined it
#define fgets myfgets

You're not allowed to do that: once <stdio.h> has been included,
fgets is reserved for use as a macro name (C99 7.1.3).

/* ... and others ... */

/* myfgets.c */
#include <stddef.h> // for size_t
#include <stdio.h>

<stdio.h> defines size_t, too.

[snip]
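
A conforming variant of the same idea, sketched under the assumption
that callers spell the wrapper's name directly (so no reserved macro
name is ever defined):

/* mystdio.h */
#ifndef MYSTDIO_H
#define MYSTDIO_H
#include <stdio.h>       /* for FILE (and size_t) */
char *myfgets(char *s, int n, FILE *stream);
#endif

/* myfgets.c */
#include "mystdio.h"

char *myfgets(char *s, int n, FILE *stream)
{
    if (s == NULL || stream == NULL || n <= 0)
        return NULL;     /* report bad arguments as failure */
    return fgets(s, n, stream);
}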
 

Army1987

A failed assert() is generally not a good way of handling an
error, since the application and its supporting libraries are
not given a chance to clean up (flush changes to databases,
etc.). Interrupting the application's procedure in the
middle of some chain of actions could have horrible
consequences. Consider if the error occurs between the
time when the medical equipment starts a pump to inject some
drug into a patient and the time when it stops the pump.

A segmentation violation in this situation would not be any better
than a failed assertion. I consider them both to be bugs. One
should only use assert() to verify conditions which one believes
to be mathematically and logically impossible to ever be false, so
that a failed assertion makes it clear that there is a non-obvious
bug. And for safety-critical applications, one should not use
untested code.
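
To illustrate the distinction (both functions are hypothetical):
assert() guards a contract that should be impossible to violate, while
a condition that can legitimately occur gets ordinary error handling.

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Contract: a is non-null and n > 0; a violation is a caller bug. */
double mean(const double *a, size_t n)
{
    assert(a != NULL && n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += a[i];
    return sum / (double)n;
}

int main(void)
{
    FILE *f = fopen("data.txt", "r");
    if (f == NULL) {             /* expected, recoverable: no assert */
        perror("data.txt");
        return EXIT_FAILURE;
    }
    fclose(f);

    double v[] = { 1.0, 2.0, 3.0 };
    printf("%g\n", mean(v, 3));
    return 0;
}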
 

Bart van Ingen Schenau

CBFalconer said:
That depends. Is he in tune? :)
I really can't tell. His wife immediately moved out, taking him with
her, after it happened. :)
She seemed rather upset.

Bart v Ingen Schenau
 
