Qry : Behaviour of fgets -- ?


Flash Gordon

Ben Bacarisse wrote, On 13/09/07 17:57:
Well, count me as in support then.

I'm explicitly NOT saying who is correct and who isn't, because I've not
followed who thinks what carefully enough. Instead I'm going to express
my opinions on some of the points Ben is raising.
I did not say anything before
because RW was arguing perfectly adequately without my muddying the
waters but I, too, dislike any operational description of UB.

Because I have a liking for denotational semantics, given a C program
X which has some UB elements in it, I think E[X] = _|_ [1]. It is
worrying that, for some beginners, this fact (that their program has
no meaning) is not distressing enough, but attaching fanciful stories
to _|_ (especially ones to which experience will soon attribute low
probabilities) makes it seem *less* serious in my book.

<snip>

Personally I *like* the non-serious examples of "possible" results of
undefined behaviour which are obviously things that will not happen on
real-world implementations (the reasons being outside the C standard).
The reasons I like them are as follows:

If it is obviously something that will not happen in the real world
people will not mistake it for a statement of what is actually
guaranteed to happen on some implementation. It also illustrates that,
as far as the definition[1] of C is concerned, it is perfectly acceptable
for things to happen that the questioner would not think of. It can be
fun (at least for me) thinking up new and (possibly) amusing
illustrations. Some of the suggestions that others come up with can be
amusing to me.

[1] The questioner might not know about the C standard but they will
know that the language is defined in some manner, even if they think
that their implementation defines it. Most people here obviously know
that the C standard defines the current language, and has done at least
since C89 became pretty much universally implemented.

Please note that I've cut the cross-post since I'm talking about what
goes on in comp.lang.c and I am pretty sure the person I'm replying to
reads comp.lang.c.
 

Chris Dollin

Keith said:
Behavior can occur without being defined. Stuff happens.

You've managed to make a claim to which there are infinitely many
counterexamples.

I don't know that it's /infinitely/ many. The universe may be
finite. Even if it isn't, we don't have access to more than
a finite amount of it.

Certainly there are a /great many/ counterexamples.
 

Chris Dollin

Ben said:
Because I have a liking for denotational semantics, given a C program
X which has some UB elements in it, I think E[X] = _|_ [1].

UB isn't bottom -- that's a different sense of undefined.

(To see why, consider our old friend `i = i++;`. If UB was bottom, then
this expression would compute bottom; it would not terminate; any
implementation that gave any value to `i` and continued would be
non-conformant. To capture C's notion of UB, where /any/ behaviour
is legal, I think you'd need to do something different again, perhaps
represent answers as sets of values, typically with one element, "the"
value, but possibly several, e.g. capturing different evaluation orders.
Then an actual implementation would be legal if it computed any of the
answers in the set, and UB would allow the set to contain all possible
values /of the allowed types/.

Which, since demons aren't in the sets used to describe C, means that
UB cannot result in nasal demons, just as DB /can/ -- because demons,
nasal or otherwise, are not part of /the C abstract machine/. No?
)
 

CBFalconer

Ben said:
Well, count me as in support then. I did not say anything before
because RW was arguing perfectly adequately without my muddying the
waters but I, too, dislike any operational description of UB.

Well, I wrote the piece that started RWs railing. It did not
define 'undefined behaviour'. It did define what 'undefined
behaviour on that particular machine under those particular
circumstances' did. Note the utter lack of requirement for any
other machine to do the same under the same circumstances.
 

Douglas A. Gwyn

Keith said:
Any 'restrict' qualifiers in the definition of memcpy() are meaningful
(*if* memcpy() is implemented in C), but are not necessarily available
to the user.

Right. By including them in the "Synopsis" in the man page,
or equivalent, that information is also made available to the
programmer, although it is not enforced. It's one of many
matters for which C relies on the programmer to use things
properly.
Allowing 'const' qualifiers on pointer parameters to be inconsistent
between a function declaration and its definition makes some sense, I
suppose, since they affect only the implementation of the function;
they say something about the parameter object, which is local to the
body of the function, and there's no reason for the caller to care
about it.

I'm not sure anybody uses "const" at the outer level for
function parameters in prototypes. It wouldn't convey
anything useful for the programmer to know.

"const" is however very useful at the next level, and has
special meaning -- it does *not* mean that the type of the
argument has to be const-qualified at that level, but rather
that no data shall be modified by dereferencing any pointer
value that is based on that argument.
Allowing 'restrict' qualifiers to be inconsistent seems problematic,
since they're saying something about the pointed-to objects, not just
about the pointers, and that's information that could be useful to the
caller.

It is somewhat analogous to "const" at the next-to-outer
level: no modifications shall be made to any object using
a pointer value based on a different (restrict-qualified)
pointer argument.

The "const" restriction is imposed on the function, while
the "restrict" restriction is imposed on the caller.

One could imagine all sorts of other ways to embed such
restrictions into C source code, but those are the ones
we came up with back when decisions had to be made.
 

Douglas A. Gwyn


Douglas A. Gwyn said:
On a word-addressed machine where more than one char is packed
into a word, there *must* be a byte selector. Some implementations
have used the low-order bits for this, necessitating shifting to
convert between pointer types, while others have used the high-
order bits. In the latter case, if the byte selector is not
all-zero, most likely an invalid-address trap will occur if the
pointer value is used as a word address.

Also, in the former case, which also applies to most byte-
addressable platforms, an invalid-address trap is common if
the low-order bits are not zero when accessing a wider type.
(I just tried it on a SPARC Ultra-1 and got a SIGBUS fault.)
 

Douglas A. Gwyn

Douglas A. Gwyn said:
Also, in the former case, which also applies to most byte-
addressable platforms, an invalid-address trap is common if
the low-order bits are not zero when accessing a wider type.
(I just tried it on a SPARC Ultra-1 and got a SIGBUS fault.)

To further follow up on this: On DEC's VAX architecture,
which was byte addressable, one could access any hardware-
supported width of integer on an arbitrary byte boundary
(I don't recall whether this was true for floating-point);
however, due to the memory caching scheme, it tended to
really slow down the instructions. I recall at least one
early VAX C compiler tried to tightly pack structures,
but was changed to align the data (i.e. use padding) in
order to speed up accesses of structures and structure
members.
 

Flash Gordon

Chris Dollin wrote, On 13/09/07 21:16:
Ben said:
Because I have a liking for denotational semantics, given a C program
X which has some UB elements in it, I think E[X] = _|_ [1].

UB isn't bottom -- that's a different sense of undefined.

(To see why, consider our old friend `i = i++;`. If UB was bottom, then
this expression would compute bottom; it would not terminate; any
implementation that gave any value to `i` and continued would be
non-conformant. To capture C's notion of UB, where /any/ behaviour
is legal, I think you'd need to do something different again, perhaps
represent answers as sets of values, typically with one element, "the"
value, but possibly several, e.g. capturing different evaluation orders.
Then an actual implementation would be legal if it computed any of the
answers in the set, and UB would allow the set to contain all possible
values /of the allowed types/.

That set is not large enough. i=i++ could cause a bus-clash which is not
trapped by the HW (because the HW leaves the behaviour of two
instructions writing to the same location in parallel undefined), and
sometimes the bus-clash could lead to overheating, causing the computer
to emit smoke and cease to operate as a computer. There *have* been
computers where doing the wrong thing really *could* cause part of the
computer to overheat and emit smoke; whilst i=i++ would be unlikely to
cause it, other examples of UB *could* cause it.
Which, since demons aren't in the sets used to describe C, means that
UB cannot result in nasal demons, just as DB /can/ -- because demons,
nasal or otherwise, are not part of /the C abstract machine/. No?
)

A bus clash wrecking the computer is not defined by the C abstract
machine, but it can still happen on some *real* HW that has been sold
commercially to home users. I remember one computer where the manual
even stated that you should be careful writing to the memory location
that controlled memory paging *because* it could cause a bus clash and
wreck the machine! From memory, in C something like "*(unsigned
char *)0xFFFF = 0xFF;" would be most likely to do this. I don't know if
there was a C compiler available for the computer, but it is likely.
 

kuyper

Rainer said:
Correct. But that is not a definition of 'undefined behaviour' (even

It's marked by the standard as a definition.
trying to do so would quite obviously contradict the term) but what

Standardese is a modified form of English. The rules governing the
modification are simple: if the standard explicitly provides a
definition for a word or a phrase, that is the meaning of that word or
phrase within the context of the standard, regardless of what those
words might mean when interpreted as ordinary English. Such a
definition is inherently incapable of contradicting the term it
defines, because that definition overrides anything that you might
think it contradicts.
the phrase 'undefined behaviour' is supposed to mean: Behaviour which
is not defined by the standard. Specifically, it is neither defined as
random nor as arbitrary but NOT defined.

Insofar as you see "random" or "arbitrary" as being in conflict with
"undefined behavior", then you're misinterpreting the intent of the
standard's definition of that term, whether or not that intent was
clearly expressed. The term "undefined behavior" was specifically
invented with the intent of allowing the behavior to be random, or
arbitrary, or convenient for the implementation. It was deliberately
intended to impose NO RESTRICTIONS on the implementation when
translating such code. It was not intended to restrict implementations
from letting the behavior be random. It was not intended to restrict
implementations from arbitrarily choosing what the behavior would be.
It was not even intended for the standard to restrict implementations
from choosing the behavior maliciously - that's more properly a matter
of QoI, or in extreme cases, product liability law; it's not the
domain of the standard to impose such a restriction.

I can't see how you can interpret that definition as prohibiting the
use of random, arbitrary, or even malicious methods for choosing what
the actual behavior is. But if you're right, it needs to be
rewritten.
The text was "There is no such thing as a form of undefined behaviour
because it is undefined." And it was written in response to your claim
that the standard would specify 'forms of the undefined', in other
words, impose requirements on it.

No, I never said that the standard would specify the forms, or impose
requirements on them. What I've said is that it is perfectly
reasonable, and accurate, to explain the complete and total lack of
restrictions on "undefined behavior" by providing extreme examples of
behavior which are permitted by that lack of restrictions.
These are your words and not mine. Consequently, this is a claim you
made and not a claim I made.

You said there was no such thing as a form of undefined behavior. I'm
just pointing out the ridiculous consequence of that claim.
Logically, it is not relevant how many people are convinced of
something.

True, but as a practical matter, the number of domain experts that
express disagreement with you about a matter which is within their
domain of expertise should be taken as evidence that you might have
made a mistake. It's perfectly possible to be one of the few who
correctly understands something. However, it is unlikely, and you
should always be a little bit suspicious of any chain of thought that
requires you to believe that this is the case.
The issue is perfectly clear.

That

i = 3;
i = ++i;

means 'the value of i is now 3' is wrong, because the meaning of this
statement is undefined. 'i has now the value of 4' is wrong for the
same reason. Stating that it means that 'the front door should open'
is still wrong. Same reason again.

Up to here, we are in full agreement.
... And this can be continued for every
possible statement about anything: There is no statement which is
supposed to be true, provided the reference for determining what
exactly is true or false is the C-standard, which, for this particular
instance, does not define that.

And that is where your argument drops off the deep end, leaving me no
hint as to how you reached this conclusion, and therefore no way to
argue against it.

"This code means that the value of i is now 3" is certainly false.

However, the statement "This code sets the value of i to 3" is either
true or false, depending upon the implementation. That's what "no
restrictions" means: it can be either true or false. It doesn't mean
that the statement is meaningless, only that that meaning is not
determined by the C standard. It might be determined by any number of
other things, including a specific definition of the behavior by that
particular implementation.

Finally, the statement "As far as the standard is concerned, this code
might set the value of i to 3" is true, and it's true precisely
because the standard imposes no restrictions on the behavior - in
particular, the standard does not impose a restriction preventing 'i'
from being set to 3. The statement "As far as the standard is
concerned, this code might set the value of i to 3" has essentially
the same meaning as "The standard imposes no restrictions preventing
the value of i from being 3", which is inherently implied by the
statement "The standard imposes no restrictions on the behavior of
this code".
This is completely orthogonal to the question what conceivable
behaviour of C-implementations when processing this piece of non-C
could be. And C is again very simple in this respect: Whatever the
behaviour may be, it does not matter. No reason to discuss it.

There are two reasons to discuss it: to answer a question about
whether a given behavior is allowed (when the behavior is undefined,
the answer is always "yes"), and as a didactic tool for impressing
students with just how big a risk they take every time they execute
code with undefined behavior. Giving humorous and extreme examples is
helpful for making the lesson more memorable, and is technically
correct, even if those examples are less likely than the more boring
examples you might prefer to use.
 

Keith Thompson

Douglas A. Gwyn said:
Right. By including them in the "Synopsis" in the man page,
or equivalent, that information is also made available to the
programmer, although it is not enforced. It's one of many
matters for which C relies on the programmer to use things
properly.
[...]

Ok, but I find this particular instance of that (having a keyword in a
declaration that really has no more meaning than a comment, even though
there *could* be an obvious meaning for it) to be a bit too subtle.
If something is meaningless, it doesn't belong in the language. Just
my opinion. Allowing

void func(int array[42]);

is equally misleading.

Suppressing my natural humility for just a moment (yeah, I know), if
*I* was confused by this, I suspect others are as well.

The "obvious meaning" I'm referring to is that

void *memcpy(void * restrict s1,
             const void * restrict s2,
             size_t n);

could tell the compiler that this call:

memcpy(buf, buf+1, 42);

invokes undefined behavior, and possibly elicit a warning or even
cause the translation unit to be rejected. Or it could *assume*, in
more subtle cases, that the actual parameters point to non-overlapping
memory and perhaps perform some optimization based on that. (The
compiler is allowed to use its knowledge of standard functions, but
consider a similar user-written function.)

I think I would have preferred it if the qualifiers in a function
declaration and in the corresponding function definition were required
to be identical. With such a rule, the compiler would know from
seeing the qualifiers on the declaration that the parameters are
'restrict'ed, and might be able to act on that knowledge. I don't see
the benefit of allowing the qualifiers to differ.

Another alternative, which would keep the current semantics, would be
to forbid top-level qualifiers on parameter declarations that are not
part of a function definition. They're meaningless anyway, so why
allow them? You can always insert comments.

I'm not suggesting that either change should be made; it would break
existing code.
 

Kelsey Bjarnason

Casper H.S. Dik said:
Ben Bacarisse said:
On Sat, 08 Sep 2007 21:48:19 +0200, Army1987 wrote:

To find out
which we use strchr searching for '\0' which will always succeed. If the
NUL is at buff[size-1] we assume the line is longer than buff.

Doesn't strlen(buf) != size - 1 do the same without looking that
weird?
Or better, set buff[size - 1] to a nonzero value, call fgets, and
check whether buff[size - 1] is zero. This takes O(1) time.
This is a neat solution because it also works in the peculiar case of
a line with embedded nulls.


But does the standard restrict writes to the part of the buffer
where no data was read, i.e., is a standard conforming
implementation allowed to start fgets with:

memset(s, '\0', n);
....

I think not, though there is wriggle room. There is no wriggle room
in the case of no data being read: "[if] no characters have been read
into the array, the contents of the array remain unchanged" so fgets
can't start that way.

Reading one character and then filling with nulls might, just, pass
the other wording but the description is mechanical enough to suggest
the very minimal tampering with the buffer is expected:

"The fgets function reads at most one less than the number of
characters specified by n from the stream pointed to by stream into
the array pointed to by s. No additional characters are read after a
new-line character (which is retained) or after end-of-file. A null
character is written immediately after the last character read into
the array."

Expected, but not required. As far as the wording is concerned, fgets
could read n-1 characters, tack on a \0, then fill any remaining space in
the buffer with random values - it is not actually reading additional
characters, and it wrote a null after the last one read, so it is, by this
text, conforming.
 

Keith Thompson

Chris Dollin said:
Ben said:
Because I have a liking for denotational semantics, given a C program
X which has some UB elements in it, I think E[X] = _|_ [1].

UB isn't bottom -- that's a different sense of undefined.

(To see why, consider our old friend `i = i++;`. If UB was bottom, then
this expression would compute bottom; it would not terminate; any
implementation that gave any value to `i` and continued would be
non-conformant. To capture C's notion of UB, where /any/ behaviour
is legal, I think you'd need to do something different again, perhaps
represent answers as sets of values, typically with one element, "the"
value, but possibly several, e.g. capturing different evaluation orders.
Then an actual implementation would be legal if it computed any of the
answers in the set, and UB would allow the set to contain all possible
values /of the allowed types/.

Which, since demons aren't in the sets used to describe C, means that
UB cannot result in nasal demons, just as DB /can/ -- because demons,
nasal or otherwise, are not part of /the C abstract machine/. No?
)

What you're describing is the case where the *result* of the
expression (not sure whether this would be the value of 'i' after the
statement executes, or the result of the expression (which is
discarded)) is undefined. But it's not just the value of 'i' that's
undefined, it's the *behavior*. This could include modifying the
value of some unrelated object (which is well within the C abstract
machine) or doing something physically nasty (which C programs are
certainly able to do if they have the required I/O interface).
 

Kenneth Brody

Chris said:
I don't know that it's /infinitely/ many. The universe may be
finite. Even if it isn't, we don't have access to more than
a finite amount of it.

Certainly there are a /great many/ counterexamples.

So you're saying that there is a /finite/ number of counterexamples?
Are you saying that it is, in theory, possible to enumerate them all?
I doubt that such a "complete" list would really be "complete"; given
such a list, someone somewhere could give an additional example.
And, once that was added to the "well, now it's complete" list, yet
another counterexample could be added.

For proof, I will simply point out that, regardless of the size of
the list, one can always add "both A and B occur", or "you end up at
the midpoint of A and B", where A and B are two items from the
current list.

--
+-------------------------+--------------------+-----------------------+
| Kenneth J. Brody        | www.hvcomputer.com | #include              |
| kenbrody/at\spamcop.net | www.fptech.com     | <std_disclaimer.h>    |
+-------------------------+--------------------+-----------------------+
Don't e-mail me at: <mailto:[email protected]>
 

Ben Bacarisse

Chris Dollin said:
Ben said:
Because I have a liking for denotational semantics, given a C program
X which has some UB elements in it, I think E[X] = _|_ [1].

UB isn't bottom -- that's a different sense of undefined.

Not entirely different. I know it is somewhat different which is why
I only said that this is how I *think* about UB. The standard says
what UB means (not what programs having UB mean, of course, just what
the term means) and I am not suggesting an alternative. I am saying
how I think of it and why I think that is scary enough.
(To see why, consider our old friend `i = i++;`. If UB was bottom, then
this expression would compute bottom; it would not terminate; any
implementation that gave any value to `i` and continued would be
non-conformant.

E[X] = _|_ does not (always) mean X does not terminate. In the lambda
calculus, non-termination is the most common way to get _|_, but
there are others (E["tail []"] = _|_ in most semantics, for example).

To see what it means in the fullest sense, one has to see what you do
without it. Formulations of denotational semantics without _|_ are
possible but they rely heavily on the theory of partial functions.
In other words the "meaning" functions end up having nothing to say at
all about certain programs. _|_ is "thereof we must be silent" in set
theory. It makes all the functions total by having a symbol for
silence.
To capture C's notion of UB, where /any/ behaviour
is legal, I think you'd need to do something different again, perhaps
represent answers as sets of values, typically with one element, "the"
value, but possibly several, e.g. capturing different evaluation orders.
Then an actual implementation would be legal if it computed any of the
answers in the set, and UB would allow the set to contain all possible
values /of the allowed types/.

Which, since demons aren't in the sets used to describe C, means that
UB cannot result in nasal demons, just as DB /can/ -- because demons,
nasal or otherwise, are not part of /the C abstract machine/. No?
)

This is exactly the problem. If you try to specify "anything" you
have to specify the universal set for the domain of discourse. What
is the total set of behaviours?[1]

If you include just those reachable through some C abstract machine
(no matter how non-deterministic) you will end up specifying the UB.
OK, it will be a very lax spec (maybe even "all program objects and
streams will be in an indeterminate state") but you will not satisfy
the nasal fans -- although I'd be quite happy with it.

If you include even demonic effects, then since you are giving an
operational answer to a question about what the program might *do*,
the student is entitled to ask "how?". You are not obliged to answer,
of course, but I think the effect is weaker than "your program has no
meaning".

You can phrase it as "your program has no defined meaning" or "the C
standard does not say what this program means/does" or just "this is
UB" and these answers come up all the time in c.l.c. I am fine with
that. I just have an aversion to shining any light at all into the
set of possibilities, because that seems to me to diminish it. There
is nothing so vast as a vast darkness without even a pin-point of
light. That is what _|_ is to me.[2]

[1] I think the standard does this to some extent already. The output
of 'printf("%ux\n", (unsigned)-1);' is not exactly specified by the
standard but neither is it unconstrained. There is a well-specified
set of strings that can result (assume the required #includes and so
on to avoid UB, of course).

[2] Jokes about vast bottoms are too obvious to entertain.
 

Richard Heathfield

Kenneth Brody said:
So you're saying that there is a /finite/ number of counterexamples?
Are you saying that it is, in theory, possible to enumerate them all?

The two statements are not equivalent. There could be finitely many
counterexamples even though it is not possible even in theory to
enumerate them all. If the number of counterexamples exceeds the volume
of the universe (measured in multiples of the smallest volume in which
one such counterexample can be enumerated) multiplied by the duration
of the universe (measured in multiples of the smallest time it takes to
enumerate one counterexample), then it is indeed not possible even in
theory to enumerate them all, any more than one can in theory squeeze a
quart of water into a pint pot.
I doubt that such a "complete" list would be "complete", and that
given such a list, someone somewhere could give an additional example.
And, once that was added to the "well, now it's complete" list, yet
another counterexample could be added.

Oh, for sure. On the one hand, it could, for example, print 1. On the
other hand, it could print 2. On the other other hand, it could print
3. Or 4, perhaps. Or maybe 5? Etc.
 

Giorgos Keramidas

Douglas A. Gwyn said:
Right. By including them in the "Synopsis" in the man page,
or equivalent, that information is also made available to the
programmer, although it is not enforced. It's one of many
matters for which C relies on the programmer to use things
properly.
[...]

Ok, but I find this particular instance of that (having a keyword in a
declaration that really has no more meaning than a comment, even though
there *could* be an obvious meaning for it) to be a bit too subtle.
Indeed.

Suppressing my natural humility for just a moment (yeah, I know), if
*I* was confused by this, I suspect others are as well.

I was confused right about until the time I read Douglas' explanation.

I guess it's back to the reading room for me :)
I think I would have preferred it if the qualifiers in a function
declaration and in the corresponding function definition were required
to be identical. With such a rule, the compiler would know from
seeing the qualifiers on the declaration that the parameters are
'restrict'ed, and might be able to act on that knowledge. I don't see
the benefit of allowing the qualifiers to differ. [...]
I'm not suggesting that either change should be made; it would break
existing code.

I think I agree with that pattern of thought. At least it could make
qualifiers more useful to the compiler; but that's not going to change
easily and I'm not suggesting such a change either.
 

Ben Bacarisse

Kelsey Bjarnason said:
Casper H.S. Dik said:
On Sat, 08 Sep 2007 21:48:19 +0200, Army1987 wrote:

To find out
which we use strchr searching for '\0' which will always succeed. If the
NUL is at buff[size-1] we assume the line is longer than buff.

Doesn't strlen(buf) != size - 1 do the same without looking that
weird?
Or better, set buff[size - 1] to a nonzero value, call fgets, and
check whether buff[size - 1] is zero. This takes O(1) time.

This is a neat solution because it also works in the peculiar case of
a line with embedded nulls.


But does the standard restrict writes to the part of the buffer
where no data was read, i.e., is a standard conforming
implementation allowed to start fgets with:

memset(s, '\0', n);
....

I think not, though there is wriggle room. There is no wriggle room
in the case of no data being read: "[if] no characters have been read
into the array, the contents of the array remain unchanged" so fgets
can't start that way.

Reading one character and then filling with nulls might, just, pass
the other wording but the description is mechanical enough to suggest
the very minimal tampering with the buffer is expected:

"The fgets function reads at most one less than the number of
characters specified by n from the stream pointed to by stream into
the array pointed to by s. No additional characters are read after a
new-line character (which is retained) or after end-of-file. A null
character is written immediately after the last character read into
the array."

Expected, but not required. As far as the wording is concerned, fgets
could read n-1 characters, tack on a \0, then fill any remaining space in
the buffer with random values - it is not actually reading additional
characters, and it wrote a null after the last one read, so it is, by this
text, conforming.

I said "I think not, though there is wriggle room". I should have
said "There is wriggle room so I think yes". It is silly, but wriggle
room is wriggle room.

The same argument means the following are UB:

char buf[] = "hello";
strncpy(buf, "j", 5);
buf[1] = 'e';
puts(buf);

char buf[2];
strncpy(buf, "x", 20);

I am not too bothered about that (though I admit I had not thought
about it until now!) but I'd advocate a change to fgets to state that
it modifies no more than the bytes it reads (plus the null) so that
one can easily do such tests.
 

Wojtek Lerch

....
The "obvious meaning" I'm referring to is that

void *memcpy(void * restrict s1,
             const void * restrict s2,
             size_t n);

could tell the compiler that this call:

memcpy(buf, buf+1, 42);

invokes undefined behavior [...] (The
compiler is allowed to use its knowledge of standard functions, but
consider a similar user-written function.)

Well, exactly: consider this user-written function:

void *notquitememcpy(void * restrict s1,
                     const void * restrict s2,
                     size_t n) {
    return NULL;
}

and this call:

notquitememcpy(buf, buf+1, 42);

Just because you're passing the same pointer to two restrict-qualified
parameters of a function doesn't necessarily cause undefined behaviour -- it
depends on what the function does with them.
I think I would have preferred it if the qualifiers in a function
declaration and in the corresponding function definition were required
to be identical. With such a rule, the compiler would know from
seeing the qualifiers on the declaration that the parameters are
'restrict'ed, and might be able to act on that knowledge. I don't see
the benefit of allowing the qualifiers to differ.

The benefit is that it lets you define a function with const-qualified
parameters without having to commit to their constness in the public
prototype in the header.
Another alternative, which would keep the current semantics, would be
to forbid top-level qualifiers on parameter declarations that are not
part of a function definition. They're meaningless anyway, so why
allow them? You can always insert comments.

I imagine that might complicate programs that parse C code and extract
function prototypes from it.
 

Golden California Girls

Rainer said:
kuyper said:
Rainer said:
This text would fit into a newsgroup whose topic is production of
mildly creative absurd fiction or one discussing quirks of a
particular implementation of C BUT NOT into a discussion of C, because
it is only insofar related to it as it talks about something the
C-standard does not talk about. And 'C' is the set of things the
C-standard DOES talk about.
[...]

But it is NOT pointless to illustrate what "undefined behavior"
means by giving extreme examples.

It is not possible to illustrate 'nothing' by providing examples of
it.

NOP
 

Richard Harter

Yes; there have been architectures (typically 16-bit word ones)
where type char * or void * required multiple words but other
pointers required only a single word. Also, the conversion
from byte address to word address is likely to force the "must
be zero" bits to actually be zero even when all object pointers
have the same size, although since it is in the realm of
undefined behavior, the implementation may opt to not worry
about them (i.e. will assume that they are already zero).

Another oddball architecture was early Primos C. It had (IIRC) 32
bit words, 32 bit longs and ints, 32 bit int * pointers, and 48
bit char * pointers.
 
