Qry : Behaviour of fgets -- ?


Charlie Gordon

Ben Bacarisse said:
<snip>
OK. I have always taken "it does this" to mean "... and nothing else"
but doubt has been raised about fgets and mbstowcs has an explicit
"only n chars written" clause. Do you feel that fgets is similarly
unambiguous?

7.19.7.2 The fgets function

Synopsis

1 #include <stdio.h>
char *fgets(char * restrict s, int n,
FILE * restrict stream);

Description

2 The fgets function reads at most one less than the number of characters
specified by n from the stream pointed to by stream into the array pointed
to by s. No additional characters are read after a new-line character
(which is retained) or after end-of-file. A null character is written
immediately after the last character read into the array.

Returns

3 The fgets function returns s if successful. If end-of-file is encountered
and no characters have been read into the array, the contents of the array
remain unchanged and a null pointer is returned. If a read error occurs
during the operation, the array contents are indeterminate and a null
pointer is returned.

Except in the case of a read error, the contents of the array seem pretty
well described by the above wording. fgets reads characters from the stream
into the array, and then writes a null character after the last character
read. Since characters are "read into the array", you would need additional
read/write operations to move them where they belong, had they not been read
into the proper place. No such operation is described as being performed by
fgets.
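
To make the quoted wording concrete, here is a minimal sketch of a caller that relies only on what 7.19.7.2 promises. The helper name read_line and its return codes are made up for illustration; they are not part of any library.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line with fgets and reports what happened.
 * Returns 1 if a complete line (ending in '\n') was read, 0 if the line was
 * cut short (buffer full, or end-of-file without a new-line), and -1 if
 * fgets returned a null pointer (end-of-file with nothing read, or error). */
int read_line(char *buf, int size, FILE *fp)
{
    assert(buf != NULL && size > 1 && fp != NULL);
    if (fgets(buf, size, fp) == NULL)
        return -1;
    /* fgets wrote a null character right after the last character read,
     * so buf is a valid string at this point. */
    size_t len = strlen(buf);
    return (len > 0 && buf[len - 1] == '\n') ? 1 : 0;
}
```

When the -1 case comes from end-of-file rather than a read error, the array still holds whatever it held before the call, as paragraph 3 states.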

The real problems with the fgets definition in the standard are:

- why is the array size specified as int instead of size_t?
- what is the behaviour of fgets for n == 0, and for n < 0?
- will fgets return NULL or s when called with n == 1 and the stream
contains no more characters but EOF has not yet been encountered? (I don't
think it should attempt to read any byte from the stream, and thus cannot
detect EOF, so it should return s.)
- to a lesser extent, why pollute the prototype with useless restrict
keywords?
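
One way to sidestep the first three questions in calling code is a thin wrapper. The sketch below is only an assumption about what a caller might want: the name fgets_sz is hypothetical, and the choices for n == 0 and n == 1 follow the reading argued above rather than anything the standard guarantees.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical wrapper: takes the buffer size as size_t and never hands
 * fgets a size for which the standard's wording is unclear. */
char *fgets_sz(char *s, size_t n, FILE *stream)
{
    assert(s != NULL && stream != NULL);
    if (n == 0)
        return NULL;          /* no room even for the null character */
    if (n == 1) {
        s[0] = '\0';          /* nothing can be read; do not touch the stream */
        return s;
    }
    if (n > (size_t)INT_MAX)
        n = (size_t)INT_MAX;  /* fgets cannot be told about more than INT_MAX */
    return fgets(s, (int)n, stream);
}
```

Note that a null return for n == 0 cannot be told apart from end-of-file here; a real interface would probably want a separate error channel.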
 

Chris Dollin

Rainer said:
That's a claim which should be easy to prove: Assuming you are not
deliberately misunderstanding me, you appear to claim that 'undefined
behaviour' is actually behaviour defined by the C-standard.

chris.dollin isn't claiming that; you've misread his parentheses,
which should be bound to "undefined behaviour", not "every program
behaviour".
More correctly (as I have written several times now): There is no such
thing as 'acceptable behaviour' for these cases (provided you cannot
magically come up with a positive definition of this behaviour)
because no acceptable behaviour is defined.

Yes, you've written this several times, and been wrong every one
of them. /Because/ no requirements are given by the standard,
/every/ behaviour is acceptable.
 

Chris Dollin

Bart said:
Chris said:
Francis said:
[some examples of UB that actually happened]
What part of the Standard prevents those things from happening when
a conforming program executes -- better, which stop an implementation
that does these things from being a conformant implementation?
How about clause 5.1.2.3/5, which loosely translated states that, when
observing the side-effects of a program, the user may not be able to
tell the difference between execution on the abstract machine and
execution on the actual machine.

I /think/ that can only apply to those aspects of the implementation
observable by the abstract machine.

Any other offers?
 

Chris Dollin

Ben said:
Chris Dollin said:
Ben said:
E[X] = _|_ does not (always) mean X does not terminate. In the lambda
calculus, non-termination is the most common way to get _|_, but
there are others (E["tail []"] = _|_ in most semantics, for example).

I'd have thought that in /most/ semantics, `tail []` is an error value
or invokes a continuation.

[I have honoured your "followup" this time, but I think it often
better, when one is saying essentially "you are wrong", to limit
*that* message rather than suggesting a limited "followup". Last time
I wanted to defend myself to all who saw your "you are mistaken"
reply!]

(I, for one, didn't understand this paragraph.)
OK, maybe not most. Some. The point is that the semantics of
a language may choose to say nothing about some programs. This can be
done with partial functions or with _|_.
Yes.


I don't follow this. Bottom is just the least element of a poset
(formally) and the least defined value in the set of "meanings" (to be
rather vague about it). Why would one implement it?

One has no choice if one runs programs whose meaning is bottom.
None. And of course it should not since if 'i' is an 'int', UB is
possible (on overflow). That is the clc answer, of course! Assuming
you mean 'i += 1;' when i is, say, 0,
Indeed.

then I think it follows from a
kind of Occam's razor. An operational description like that in the
standard is never entirely unambiguous, but there is no need to add
possibilities that essentially rob the document of all meaning!

They don't -- the document /still/ prescribes the behaviour of the
abstract machine.
 

Richard

Charlie Gordon said:
Richard said:
Kelsey Bjarnason said:
[snips]

On Fri, 14 Sep 2007 13:12:44 +0200, Richard wrote:

You need to run this past me again. Why shouldn't people who don't know
what they are doing not use it?

You mean why _should_ they not use it? Because people who don't know
what

I was making a joke and I think, yes, there is one too many
negatives. You see, if people don't know how to use an API then they,
well, shouldn't. It is obvious. But they can learn by doing, and with
judicious test cases and debug cycles they will learn to use it
properly.

The situation is much worse than you make it sound: they don't know the
semantics, but they think they do. They write code that seems to work as
they expect but contains bugs waiting to bite... just like gets and
sprintf.

Who is "they"? Would you let them use malloc too in case they forgot to
free the memory? Which arbitrary rules are you using to decide "over
complexity"? strncpy is as basic as they get if you have a clue about "C
strings".

I agree: show them the API as an explanation for why they should not use
strncpy.

Huh? How? Why? What is so difficult? Shall we stop them using printf in
case they pass a float instead of an int and the resulting output causes
the process on the end of the pipe to crash an aircraft?
to reach millions, you need to count the binaries produced.

No I don't. I said "lines of code". Not binaries.
You cannot decently say that there is *nothing* wrong with it: its precise
semantics make it almost useless, are counter-intuitive, and are vastly
misunderstood and misused...

Not in my, reasonably extensive, experience of rather large code bases
written in C.
Why C99 did not include reentrant versions of strtok and friends
baffles me.

Probably because the pedants spent too much time arguing about when a
char was an int and when it wasn't.
 

Peter J. Holzer

["Followup-To:" header set to comp.lang.c.]
Charlie Gordon said:
Richard said:
[snips]

On Fri, 14 Sep 2007 13:12:44 +0200, Richard wrote:

You need to run this past me again. Why shouldn't people who don't know
what they are doing not use it?

You mean why _should_ they not use it? Because people who don't know
what

I was making a joke and I think, yes, there is one too many
negatives. You see, if people don't know how to use an API then they,
well, shouldn't. It is obvious. But they can learn by doing, and with
judicious test cases and debug cycles they will learn to use it
properly.

The situation is much worse than you make it sound: they don't know the
semantics, but they think they do. They write code that seems to work as
they expect but contains bugs waiting to bite... just like gets and
sprintf.

Who is "they"? Would you let them use malloc too in case they forgot to
free the memory? Which arbitrary rules are you using to decide "over
complexity"? strncpy is as basic as they get if you have a clue about "C
strings".

The problem with strncpy is that it does NOT produce "C strings".

The result of

char *s = "a string of at least 80 characters ...";
char buf[80];
strncpy(buf, s, 80);

is buf not being zero-terminated, which makes it unsafe to use with
almost all other string functions. And no, strncpy(buf, s, 79) doesn't
help either, unless you add an additional buf[79] = '\0'.

strncpy is intended for filling zero-padded (not zero-terminated)
buffers of fixed width. That may be useful for certain file formats
(wtmp/utmp have been mentioned, the original UNIX directory structure
also comes to mind), but it is almost always the wrong tool when you
deal with "C strings".
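
A minimal sketch of the "strncpy plus explicit terminator" pattern described above. The helper name copy_truncate is hypothetical, and silent truncation is assumed to be acceptable to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy src into dst (capacity size), truncating if
 * needed, and always leave dst zero-terminated. */
void copy_truncate(char *dst, size_t size, const char *src)
{
    assert(dst != NULL && src != NULL && size > 0);
    strncpy(dst, src, size - 1);  /* at most size-1 chars; zero-pads if src is shorter */
    dst[size - 1] = '\0';         /* strncpy adds no terminator when src is too long */
}
```

With char buf[80], this amounts to strncpy(buf, s, 79) followed by buf[79] = '\0', the combination mentioned in the post.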

Huh? How? Why? What is so difficult?

It's not difficult; it's just the wrong tool for the job.

hp
 

Richard Bos

Douglas A. Gwyn said:
Well, it isn't meaningless, but it doesn't affect code generation.

There is a proposal before the C standards committee to adopt
something like Microsoft's __declspec facility for annotating
C source code with "attributes" that are outside the scope of
the current C standard. I suspect every experienced C
programmer has occasionally thought that there ought to be
something of the sort. Recall /*NOTREACHED*/ to avoid a
spurious warning from "lint"? There is a __declspec attribute
that has the same meaning.

Don't we already have #pragma for this?

Richard
 

Charlie Gordon

Richard said:
Charlie Gordon said:
Richard said:
[snips]

On Fri, 14 Sep 2007 13:12:44 +0200, Richard wrote:

You need to run this past me again. Why shouldn't people who don't
know
what they are doing not use it?

You mean why _should_ they not use it? Because people who don't know
what

I was making a joke and I think, yes, there is one too many
negatives. You see, if people don't know how to use an API then they,
well, shouldn't. It is obvious. But they can learn by doing, and with
judicious test cases and debug cycles they will learn to use it
properly.

The situation is much worse than you make it sound: they don't know the
semantics, but they think they do. They write code that seems to work as
they expect but contains bugs waiting to bite... just like gets and
sprintf.

Who is "they"? Would you let them use malloc too in case they forgot to
free the memory? Which arbitrary rules are you using to decide "over
complexity"? strncpy is as basic as they get if you have a clue about "C
strings".

"They" is a large proportion of C programmers.
Forgetting to call free causes problems in very few programs. I'm not
advocating sloppiness about it, and it is much easier to understand and
master malloc/free than strncpy. realloc is a different matter of course; I
definitely recommend that newbies stay away from it.
The problem with strncpy is *precisely* that it does not deal with C
strings. Its purpose is to initialize fixed size, 0 padded, not
necessarily 0 terminated char arrays. Such objects are not used anywhere
else in the Standard, and very rarely in real applications.
Huh? How? Why? What is so difficult? Shall we stop them using printf in
case they pass a float instead of an int and the resulting output causes
the process on the end of the pipe to crash an aircraft?

Good compilers will detect such errors at compile time. Of course in your
example, the float is passed as a double, but you knew that... A more subtle
and problematic example is this:

printf("%.*s\n", sizeof(buf), buf);

This printf format is needed to output the contents of a non 0 terminated
buffer (such as the one strncpy could have initialized). The problem here is
that printf expects an int as the value for the '*' place-holder, and
sizeof(buf) is a size_t, a type that can be larger than an int these days
(unsigned long on 64-bit Linux, unsigned long long on 64-bit Windows, where
int is 32 bits on both).
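
A sketch of one way to avoid the mismatch: convert the size to int explicitly before passing it as the '*' precision. The helper name print_field is made up for this example, and the clamp to INT_MAX is an assumption about what a caller would want for oversized buffers.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: print at most len bytes of a buffer that may lack
 * a terminator, followed by a new-line. Returns printf's result. */
int print_field(const char *buf, size_t len)
{
    assert(buf != NULL);
    /* The '*' precision argument must have type int, not size_t. */
    int prec = len > (size_t)INT_MAX ? INT_MAX : (int)len;
    return printf("%.*s\n", prec, buf);
}
```

A call such as print_field(buf, sizeof(buf)) then passes an argument of the type printf expects, whatever the width of size_t.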

Yes, printf has its intricacies, but newbies rarely run into them.
Conversely, wrongly named strncpy is misused almost all the time, especially
by newbies.
No I don't. I said "lines of code". Not binaries.

Millions of lines with strncpy in them?
You may be right, www.google.com/codesearch comes up with 314000 matches for
strncpy!
So many bugs waiting to bite!
Not in my, reasonably extensive, experience of rather large code bases
written in C.

Well, I don't know what code bases you are referring to, and I could not find
any open source code with your copyright (Richard G. Riley). But take a
second look...

Use google codesearch, look for strncpy sizeof... check the first 4: all
wrong one way or another.
Even openssl gets it wrong:

char e_buf[32+1]; /* replace 32 by 8 ? */

/* Copy at most 32 chars of password */
strncpy (e_buf, buf, sizeof(e_buf));

/* Make sure we have a delimiter */
e_buf[sizeof(e_buf)-1] = '\0';

Granted, there is no bug here, but at least the comment is wrong, and the
occasional reader of this code will be misled.
Probably because the pedants spent too much time arguing about when a
char was an int and when it wasn't.

Or figuring out iso646.h.
 

Douglas A. Gwyn

Richard said:
Don't we already have #pragma for this?

Also the _Pragma operator, which might be closer to what is desired.
I'm sure that will come up in the technical discussion next month.

Anyway, the request is to standardize this facility in some form.
 

Kelsey Bjarnason

[snips]

The problem with strncpy is that it does NOT produce "C strings".

It does when told to.
The result of

char *s = "a string of at least 80 characters ...";
char buf[80];
strncpy(buf, s, 80);

is buf not being zero-terminated, which makes it unsafe to use with
almost all other string functions. And no, strncpy(buf, s, 79) doesn't
help either, unless you add an additional buf[79] = '\0'.

Keerect. Which is to say, it is - like any other function - only really
reliable when used properly.
also comes to mind), but it is almost always the wrong tool when you
deal with "C strings".

Actually, I kinda like the idea. "I need up to N chars of this buffer, so
even if it's 3N chars long, only gimme what I need." Makes sense; if
nothing else it lessens the likelihood of a buffer overflow, as you can
specify exactly how much data to copy. Not that there aren't other
methods, but as a library function this one will, presumably, be "more
optimal" than a loop and have sufficiently different semantics from memcpy
as to make it useful on its own - you just need to remember to terminate
the resultant buffer.
 

Keith Thompson

Douglas A. Gwyn said:
Also the _Pragma operator, which might be closer to what is desired.
I'm sure that will come up in the technical discussion next month.

Anyway, the request is to standardize this facility in some form.

Currently a use of the _Pragma operator is described as a "unary
operator expression". I assume that means it can appear only in a
context where a unary-expression could appear. (It's odd, though,
that 6.10.9 calls this a "unary operator expression", while 6.5.3 uses
the term "unary-expression".)

But allowing _Pragma in more contexts shouldn't be a problem.
 

Peter J. Holzer

[snips]

The problem with strncpy is that it does NOT produce "C strings".

It does when told to.

No. There is no way you can tell strncpy to produce a zero-terminated
string. You have to add the terminator yourself.

The result of

char *s = "a string of at least 80 characters ...";
char buf[80];
strncpy(buf, s, 80);

is buf not being zero-terminated, which makes it unsafe to use with
almost all other string functions. And no, strncpy(buf, s, 79) doesn't
help either, unless you add an additional buf[79] = '\0'.

Keerect. Which is to say, it is - like any other function - only really
reliable when used properly.

Oh, strncpy is very reliable. It just does reliably the wrong thing.


Actually, I kinda like the idea. "I need up to N chars of this buffer, so
even if it's 3N chars long, only gimme what I need."

But strncpy doesn't give you "up to N chars". It always gives you
exactly N chars. That's wasteful if your source string is shorter and
unsafe if it is longer. strncat or snprintf is almost always the better
choice. Even strcpy is better - at least then you know you have to be
careful.
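
For comparison, a sketch of the two alternatives named above, each doing a bounded copy that always ends in a terminator. The helper names are hypothetical; snprintf assumes a C99 library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Bounded copy with strncat: start from an empty string and append at most
 * size-1 chars. strncat always writes a terminator and never pads. */
void copy_with_strncat(char *dst, size_t size, const char *src)
{
    assert(dst != NULL && src != NULL && size > 0);
    dst[0] = '\0';
    strncat(dst, src, size - 1);
}

/* Bounded copy with snprintf. Returns nonzero if src did not fit. */
int copy_with_snprintf(char *dst, size_t size, const char *src)
{
    assert(dst != NULL && src != NULL && size > 0);
    int needed = snprintf(dst, size, "%s", src);
    return needed < 0 || (size_t)needed >= size;
}
```

Neither writes more than the string plus one terminator, and the snprintf version also reports truncation, which strncpy cannot do.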

Makes sense;

If you are constructing records for some specific file formats, yes. In
no other case.
if nothing else it lessens the likelihood of a buffer overflow, as you
can specify exactly how much data to copy. Not that there aren't
other methods, but as a library function this one will,

strncat and snprintf are standard library functions, too.
(But both have their warts, too. There must be some law that if there
are several competing libraries, it is always the one with the least
useful and most bizarre behaviour which gets standardized).

hp
 

Kelsey Bjarnason

[snips]

The problem with strncpy is that it does NOT produce "C strings".

It does when told to.

No. There is no way you can tell strncpy to produce a zero-terminated
string. You have to add the terminator yourself.

char dst[128];
char *src = "abc";
strncpy( dst, src, sizeof(dst) );

This *won't* produce a proper null terminated string in dst? I have to
add the terminator myself? News to me.
Oh, strncpy is very reliable. It just does reliably the wrong thing.

Copies n chars, padding as necessary - exactly as it's described to.
Yeah, fine, if it acted a little more like, say, fgets such that the size
parameter was actually size-1 and space was left over for a terminating
null, great, agreed, this wouldn't be a bad design either. Perhaps even a
better design.

On the other hand, if I'm trying to, oh, use it to insert into an existing
string, I probably don't want terminators stuffed in, so the way it works
now is viable, too.
But strncpy doesn't give you "up to N chars". It always gives you
exactly N chars.

Not what I said; I said "up to N chars of this buffer". It *cannot*
(assuming the source buffer is a proper string) give me N chars of it if
the buffer contains fewer than N chars.
That's wasteful if your source string is shorter and
unsafe if it is longer. strncat or snprintf is almost always the better
choice. Even strcpy is better - at least then you know you have to be
careful.

Yeah, strncpy ain't perfect by any means. C++ strings are somewhat saner
overall, but then, this ain't C++.
strncat and snprintf are standard library functions, too.

And this has what, precisely, to do with the "more optimal" comment made?
 

Wojtek Lerch

Keith Thompson said:
Currently a use of the _Pragma operator is described as a "unary
operator expression". I assume that means it can appear only in a
context where a unary-expression could appear.

Maybe not. _Pragma expressions are "executed" and removed in translation
phase 4. Syntax analysis doesn't happen until phase 7.
But allowing _Pragma in more contexts shouldn't be a problem.

During translation phase 4, the program is just a sequence of preprocessing
tokens and white space. My guess would be that any occurrence of the
_Pragma operator followed by a string literal in parentheses constitutes a
_Pragma expression, regardless of the context.
 

Richard Bos

Douglas A. Gwyn said:
Also the _Pragma operator, which might be closer to what is desired.
I'm sure that will come up in the technical discussion next month.

Anyway, the request is to standardize this facility in some form.

Not being a Committee member, I'm not allowed to vote - but I vote
against. It's completely unnecessary.

Richard
 

Douglas A. Gwyn

Richard said:
Not being a Committee member, I'm not allowed to vote - but I vote
against. It's completely unnecessary.

An argument for standardization is that there is a lot of
(possibly) otherwise portable C source code around that
cannot be readily ported to other platforms due to
containing such decorations. The fact that more than one
major compiler implementation added such a feature seems
to indicate that there is a practical need for it.

That said, I'm not happy with the idea.
 

Peter J. Holzer

[snips]
On Sun, 16 Sep 2007 23:19:51 +0200, Peter J. Holzer wrote:

The problem with strncpy is that it does NOT produce "C strings".

It does when told to.

No. There is no way you can tell strncpy to produce a zero-terminated
string. You have to add the terminator yourself.

char dst[128];
char *src = "abc";
strncpy( dst, src, sizeof(dst) );

This *won't* produce a proper null terminated string in dst?

You don't *tell* strncpy to produce a zero-terminated string here. It
just happens to do so because the source string is shorter than the
destination buffer. If src happens to point to a longer string, dst
won't be zero-terminated.

And this is exactly the kind of buggy code I was thinking about: The
code is "safe" in exactly the same situation when strcpy would have been
"safe", too. When strcpy isn't safe, this isn't either. The failure is
just a bit more subtle, and less likely to be noticed in a code audit.

To be safe you either need to check the length of src (but then you can
use strcpy, too) or manually add the terminator.

And of course you are needlessly filling a lot of memory with zeros.
In your example, instead of writing 4 bytes, you write 128 bytes: An
overhead of 3100%!
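
The padding can be observed directly. This is a small sketch with a hypothetical helper name; the buffer is pre-filled with 'X' so that every zero byte counted was written by strncpy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count the zero bytes found in the first size bytes
 * of dst after strncpy(dst, src, size). */
size_t zeros_after_strncpy(char *dst, size_t size, const char *src)
{
    assert(dst != NULL && src != NULL);
    memset(dst, 'X', size);       /* so that leftover bytes are not zero */
    strncpy(dst, src, size);
    size_t zeros = 0;
    for (size_t i = 0; i < size; i++)
        if (dst[i] == '\0')
            zeros++;
    return zeros;
}
```

For a 128-byte dst and src "abc" the count is 125: three characters copied and every remaining byte zeroed. For a source longer than the buffer the count is 0, meaning there is no terminator at all.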

On the other hand, if I'm trying to, oh, use it to insert into an existing
string, I probably don't want terminators stuffed in, so the way it works
now is viable, too.

You just have to determine the length beforehand. If you already know
the length, why don't you use memcpy (or memmove)? Your defense of
strncpy gets more and more far-fetched.
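
A sketch of the memcpy approach for the "insert into an existing string" case. The helper name overwrite_at is hypothetical, and the replacement is assumed to fit inside the existing string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: overwrite part of an existing string in place,
 * starting at offset pos, without writing a terminator or any padding. */
void overwrite_at(char *dst, size_t pos, const char *src)
{
    assert(dst != NULL && src != NULL);
    size_t len = strlen(src);
    assert(pos + len <= strlen(dst));  /* stay inside the existing string */
    memcpy(dst + pos, src, len);       /* exactly len bytes are written */
}
```

Since the length has to be known anyway to keep the write in bounds, memcpy does the job without strncpy's padding semantics.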

And this has what, precisely, to do with the "more optimal" comment made?

You were essentially saying "strncpy is more optimal than strncat
because strncpy is a library function". Please consider this sentence
and think again whether it makes any sense.

hp
 

Kelsey Bjarnason

[snips]

char dst[128];
char *src = "abc";
strncpy( dst, src, sizeof(dst) );

This *won't* produce a proper null terminated string in dst?

You don't *tell* strncpy to produce a zero-terminated string here.

Sure I do - by specifying a destination buffer larger than the source.

You were essentially saying "strncpy is more optimal than strncat
because strncpy is a library function".

I said no such thing. Learn to read. Here, I'll quote it for you:

"...but as a library function this one will, presumably, be "more
optimal" than a loop..."

You do see the word "presumably" in there, correct? As in it is generally
presumed that the library implementors, knowing more about the specifics
of the system(s) the library is implemented on, will often produce code
considerably more efficient than the result of a typical loop construct
will be, and at the worst will presumably be no less efficient.

It's a pretty common presumption, that the library implementers are
neither screaming morons nor isolated from system-specific information
which we, as developers, cannot rely upon in order to gain efficiency.
 
