Qry : Behaviour of fgets -- ?

Chris Dollin

Ben said:
Chris Dollin said:
Ben said:
Because I have a liking for denotational semantics, given a C program
X which has some UB elements in it, I think E[X] = _|_ [1].

UB isn't bottom -- that's a different sense of undefined.

Not entirely different. I know it is somewhat different which is why
I only said that this is how I *think* about UB. The standard says
what UB means (not what programs having UB mean, of course, just what
the term means) and I am not suggesting an alternative. I am saying
how I think of it and why I think that is scary enough.
(To see why, consider our old friend `i = i++;`. If UB was bottom, then
this expression would compute bottom; it would not terminate; any
implementation that gave any value to `i` and continued would be
non-conformant.

E[X] = _|_ does not (always) mean X does not terminate. In the lambda
calculus, non-termination is the most common way to get _|_, but
there are others (E["tail []"] = _|_ in most semantics, for example).

I'd have thought that in /most/ semantics, `tail []` is an error value
or invokes a continuation.

Machines implement bottom as non-termination; I'm almost completely
sure of this, but it has been a long time.
To see what it means in the fullest sense, one has to see what you do
without it. Formulations of denotational semantics without _|_ are
possible but they rely heavily on the theory of partial functions.
In other words the "meaning" functions end up having nothing to say at
all about certain programs. _|_ is "thereof we must be silent" in set
theory. It makes all the functions total by having a symbol for
silence.

Yes. (It does more than this, of course, but it does /at least/ that.)
To capture C's notion of UB, where /any/ behaviour
is legal, I think you'd need to do something different again, perhaps
represent answers as sets of values, typically with one element, "the"
value, but possibly several, eg capturing different evaluation orders.
Then an actual implementation would be legal if it computed any of the
answers in the set, and UB would allow the set to contain all possible
values /of the allowed types/.

Which, since demons aren't in the sets used to describe C, means that
UB cannot result in nasal demons, just as DB /can/ -- because demons,
nasal or otherwise, are not part of /the C abstract machine/. No?
)

This is exactly the problem. If you try to specify "anything" you
have to specify the universal set for the domain of discourse. What
is the total set of behaviours?[1]

Whatever the domain of answers says it is: you can choose.

The C standard prescribes the behaviour of the C abstract machine.
Behaviour outside that machine is invisible to it. Invisible behaviour
is not constrained by the standard at all.
If you include just those reachable through some C abstract machine
(no matter how non-deterministic) you will end up specifying the UB.
OK, it will be a very lax spec (maybe even "all program objects and
streams will be in an indeterminate state")

In my formulation, it would be "all states are in the set of allowable
states". I'd have to go and digest the standard to see if "indeterminate"
was like a value or a constraint on a value.
but you will not satisfy the nasal fans -- although I'd be quite happy
with it.

I'm beginning to believe that nasality, being invisible to the standard,
isn't a true (even if good) capturing of UB!

Consider this question: which part of the standard prohibits `i += 1;`
from causing a demon to fly out of your nose?
 
Chris Dollin

Flash said:
Chris Dollin wrote, On 13/09/07 21:16:
Ben said:
Because I have a liking for denotational semantics, given a C program
X which has some UB elements in it, I think E[X] = _|_ [1].

UB isn't bottom -- that's a different sense of undefined.

(To see why, consider our old friend `i = i++;`. If UB was bottom, then
this expression would compute bottom; it would not terminate; any
implementation that gave any value to `i` and continued would be
non-conformant. To capture C's notion of UB, where /any/ behaviour
is legal, I think you'd need to do something different again, perhaps
represent answers as sets of values, typically with one element, "the"
value, but possibly several, eg capturing different evaluation orders.
Then an actual implementation would be legal if it computed any of the
answers in the set, and UB would allow the set to contain all possible
values /of the allowed types/.

That set is not large enough. i=i++ could cause a bus-clash which is not
trapped by the HW (because the HW leaves two instructions writing to the
same location in parallel undefined) and sometimes the bus-clash could
lead to overheating causing the computer to emit smoke and cease to
operate as a computer.

But that behaviour is outside the C virtual machine; the standard
can't constrain it anyway.
 
Chris Dollin

Keith said:
Chris Dollin said:
Ben said:
Because I have a liking for denotational semantics, given a C program
X which has some UB elements in it, I think E[X] = _|_ [1].

UB isn't bottom -- that's a different sense of undefined.

(To see why, consider our old friend `i = i++;`. If UB was bottom, then
this expression would compute bottom; it would not terminate; any
implementation that gave any value to `i` and continued would be
non-conformant. To capture C's notion of UB, where /any/ behaviour
is legal, I think you'd need to do something different again, perhaps
represent answers as sets of values, typically with one element, "the"
value, but possibly several, eg capturing different evaluation orders.
Then an actual implementation would be legal if it computed any of the
answers in the set, and UB would allow the set to contain all possible
values /of the allowed types/.

Which, since demons aren't in the sets used to describe C, means that
UB cannot result in nasal demons, just as DB /can/ -- because demons,
nasal or otherwise, are not part of /the C abstract machine/. No?
)

What you're describing is the case where the *result* of the
expression (not sure whether this would be the value of 'i' after the
statement executes, or the result of the expression (which is
discarded)) is undefined. But it's not just the value of 'i' that's
undefined, it's the *behavior*. This could include modifying the
value of some unrelated object

That's covered: the "answers" above can include the complete state
description, so UB would allow any state whatsoever.
(which is well within the C abstract
machine) or doing something physically nasty (which C programs are
certainly able to do if they have the required I/O interface).

That stuff isn't constrained by the standard anyway. I'm coming
to believe it's merely QoI that prevents `i += 1;` from sending
rude email to one's boss.
 
Chris Dollin

Kenneth said:
Chris said:
Keith said:
[...]
There is no such thing as a form of undefined behaviour because it
is undefined. If there was, it would be defined.

Behavior can occur without being defined. Stuff happens.

You've managed to make a claim to which there are infinitely many
counterexamples.

I don't know that it's /infinitely/ many. The universe may be
finite. Even if it isn't, we don't have access to more than
a finite amount of it.

Certainly there are a /great many/ counterexamples.

So you're saying that there is a /finite/ number of counterexamples?

If the accessible universe is finite, yes.
Are you saying that it is, in theory, possible to enumerate them all?

Yes. Enumerate all possible universe states. The number of counterexamples
is no bigger than this.

Oops: I've an implicit assumption that states are discrete.

I /think/ I could weasel that into the finiteness, but I won't.
I doubt that such a "complete" list would be "complete", and that
given such a list, someone somewhere could give an additional example.
And, once that was added to the "well, now it's complete" list, yet
another counterexample could be added.

For proof, I will simply point out that, regardless of the size of
the list, one can always add "both A and B occur", or "you end up at
the midpoint of A and B", where A and B are two items from the
current list.

Discreteness kills that, I think.
 
Charlie Gordon

Ben Bacarisse said:
The same argument means the following are UB:

char buf[] = "hello";
strncpy(buf, "j", 5);
buf[1] = 'e';
puts(buf);

Why do you think the above code should invoke UB ?
It should just output je and a new-line on stdout.
char buf[2];
strncpy(buf, "x", 20);

This one is definitely UB
I am not too bothered about that (though I admit I had not thought
about it until now!)

You do not seem to know the broken semantics of strncpy.
I wouldn't be surprised, very few programmers do.
That's why strncpy should NEVER be used, even in the unlikely cases where it
does exactly what is needed, because using it propagates dangerous
misunderstandings among the overwhelming majority of programmers.
but I'd advocate a change to fgets to state that
it modifies no more than the bytes it reads (plus the null) so that
one can easily do such tests.

I agree with you, I doubt many C libraries behave differently, except in
case of a read error, which is explicitly addressed by the Standard already.
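
For readers following the strncpy exchange, here is a minimal sketch of the semantics under discussion. The helper name `strncpy_pad_demo` is made up for illustration and is not from the thread. It replays the first snippet and copies the resulting bytes out so they can be inspected: strncpy writes exactly n bytes, padding with null characters when the source is shorter.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (not from the thread): replays the first snippet and
   copies the resulting 6 bytes into out so they can be inspected. */
void strncpy_pad_demo(char out[6])
{
    char buf[] = "hello";        /* 6 bytes: 'h','e','l','l','o','\0' */

    assert(out != NULL);

    strncpy(buf, "j", 5);        /* writes 'j', then pads with four '\0' */
    /* buf is now { 'j', 0, 0, 0, 0, 0 }; buf[5] was never touched */

    buf[1] = 'e';                /* buf now holds the string "je" */
    memcpy(out, buf, sizeof buf);
}

/* The second snippet must NOT be run:
       char buf[2];
       strncpy(buf, "x", 20);
   strncpy always writes n (here 20) bytes, so 18 of them would land
   past the end of the 2-byte array. */
```

Calling `strncpy_pad_demo(out)` with a 6-byte array leaves the string "je" in it, followed only by null bytes.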
 
Casper H.S. Dik

Charlie Gordon said:
That's why strncpy should NEVER be used, even in the unlikely cases where it
does exactly what is needed, because using it propagates dangerous
misunderstandings among the overwhelming majority of programmers.

I beg to differ; while strncpy() is weird it was designed for a
single purpose: putting a possibly non-NUL terminated sequence of
characters in a limited size char[] struct field such as those
used in UNIX utmp/wtmp files. For that purpose it is perfect and
it feels wrong to re-implement strncpy() under a different name.

Casper
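
A rough sketch of the fixed-width field use described above. The `struct record`, its 8-byte `name` field, and both function names are assumptions made for illustration; they are not the real utmp layout. The point is that the field is null-padded but not necessarily null-terminated, so reads have to be bounded.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record with a fixed-width, NUL-padded name field, loosely
   modelled on the utmp-style layout mentioned above. The field is NOT
   NUL-terminated when the name fills it completely. */
struct record {
    char name[8];
};

/* Store name in the field: short names are NUL-padded, long names are
   truncated to 8 bytes with no terminator. */
void record_set_name(struct record *r, const char *name)
{
    assert(r != NULL && name != NULL);
    strncpy(r->name, name, sizeof r->name);
}

/* Format the field into out, which must hold at least sizeof r->name + 1
   bytes. The precision bounds the read, so a missing terminator is safe. */
void record_get_name(const struct record *r, char *out, size_t outsize)
{
    assert(r != NULL && out != NULL && outsize > sizeof r->name);
    snprintf(out, outsize, "%.*s", (int)sizeof r->name, r->name);
}
```

Reading the field back through a bounded precision (`%.*s`) rather than plain `%s` is what keeps a completely filled field from being read past its end.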
 
Richard

Charlie Gordon said:
Ben Bacarisse said:
The same argument means the following are UB:

char buf[] = "hello";
strncpy(buf, "j", 5);
buf[1] = 'e';
puts(buf);

Why do you think the above code should invoke UB ?
It should just output je and a new-line on stdout.
char buf[2];
strncpy(buf, "x", 20);

This one is definitely UB
I am not too bothered about that (though I admit I had not thought
about it until now!)

You do not seem to know the broken semantics of strncpy.
I wouldn't be surprised, very few programmers do.
That's why strncpy should NEVER be used, even in the unlikely cases where it
does exactly what is needed, because using it propagates dangerous
misunderstandings among the overwhelming majority of programmers.

You need to run this past me again. Why shouldn't people who don't know
what they are doing not use it?

It seems easy enough to use correctly to me.
 
Francis Glassborow

Ben said:
You can phrase it "your program has no defined meaning" or "the C
standard does not say what this program means/does" or just "this is
UB" and these answers come up all the time in c.l.c.

The meaning of the program is unimportant in this context, its behaviour
is. IOWs what it does matters. The point of UB is that executing a
program with UB can result in anything. The Abstract machine is
irrelevant because the program is no longer C and so is outside the
requirements of the Abstract Machine.

Yes we do tend to be a bit jokey when giving examples of 'anything' but
UB behaviour can be pretty drastic (three examples I know of: 1) setting
a monitor on fire -- seriously, though this would not happen to modern
monitors, there once was one that would overheat and eventually ignite if
its scan rate was set to zero. 2) reprogramming a graphics card
randomly, took me many hours to restore the graphics card. 3) Randomly
reprogramming a programmable keyboard -- that one was really nasty as
the keyboard had condenser backed storage of its settings that ensured
that the current state was secure for almost a year :-(
 
Chris Dollin

Francis said:
The meaning of the program is unimportant in this context, its behaviour
is.

The meaning of a[n imperative] program /is/ its behaviour -- the
changes to the state it produces.
IOWs what it does matters. The point of UB is that executing a
program with UB can result in anything.

Executing a program /without/ UB can result in anything, too. The
standard is only talking about the behaviour of the C abstract
machine. The bits of an implementation that aren't about the C
abstract machine are /already/ unconstrained by the standard.

This isn't what I used to think, but absent a good counter-argument,
it's what I'm thinking now.
The Abstract machine is irrelevant because the program is no longer
C and so is outside the requirements of the Abstract Machine.

I don't think the standard is in a position to say that.
Yes we do tend to be a bit jokey when giving examples of 'anything' but
UB behaviour can be pretty drastic (three examples I know of: 1) setting
a monitor on fire -- seriously, though this would not happen to modern
monitors, there once was one that would overheat and eventually ignite if
its scan rate was set to zero. 2) reprogramming a graphics card
randomly, took me many hours to restore the graphics card. 3) Randomly
reprogramming a programmable keyboard -- that one was really nasty as
the keyboard had condenser backed storage of its settings that ensured
that the current state was secure for almost a year :-(

What part of the Standard prevents those things from happening when
a conforming program executes -- better, what stops an implementation
that does these things from being a conformant implementation?
 
Kelsey Bjarnason

[snips]

You need to run this past me again. Why shouldn't people who don't know
what they are doing not use it?

You mean why _should_ they not use it? Because people who don't know what
they're doing write bad and often dangerous code. Give 'em a spoon
instead. :)

As to why he's suggesting not using it, if I had to guess I'd say
something along the lines of unlike pretty much every other string
function, this one has a nasty tendency to produce non-null-terminated
character arrays; if coders aren't aware of this or fail to pay attention
to it, bad things can happen.
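
The hazard described above, and one common way to guard against it, might look like the sketch below. `copy_terminated` is a hypothetical helper name, not a standard function; it assumes truncation is acceptable to the caller.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: copy src into dst (capacity dstsize > 0),
   truncating if necessary, and always leave dst NUL-terminated.
   Returns 1 if the whole string fit, 0 if it was truncated. */
int copy_terminated(char *dst, size_t dstsize, const char *src)
{
    assert(dst != NULL && src != NULL && dstsize > 0);

    /* If strlen(src) >= dstsize - 1, strncpy writes no terminator ... */
    strncpy(dst, src, dstsize - 1);
    /* ... so add one explicitly in the last slot. */
    dst[dstsize - 1] = '\0';

    return strlen(src) < dstsize;
}
```

With a 4-byte destination, copying "hello" leaves "hel" and returns 0, so the caller can tell that truncation happened.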
 
Richard

Kelsey Bjarnason said:
[snips]

You need to run this past me again. Why shouldn't people who don't know
what they are doing not use it?

You mean why _should_ they not use it? Because people who don't know
what

I was making a joke and I think, yes, there is one too many
negatives. You see, if people don't know how to use an API then they,
well, shouldn't. It is obvious. But they can learn by doing, and with
judicious test cases and debug cycles they will learn to use it
properly.

they're doing write bad and often dangerous code. Give 'em a spoon
instead. :)

or, tell them to read the API.
As to why he's suggesting not using it, if I had to guess I'd say
something along the lines of unlike pretty much every other string
function, this one has a nasty tendency to produce non-null-terminated
character arrays; if coders aren't aware of this or fail to pay attention
to it, bad things can happen.

As can i++ if you don't keep an eye on the index limit for an array.

It is bogus advice.

There is *nothing* wrong with strncpy and it is used in millions of
lines of code. In many ways you could say it's safer to use since it won't
overwrite memory if the source string is bad.

strtok, on the other hand .....
 
Charlie Gordon

Casper H.S. Dik said:
Charlie Gordon said:
That's why strncpy should NEVER be used, even in the unlikely cases where it
does exactly what is needed, because using it propagates dangerous
misunderstandings among the overwhelming majority of programmers.

I beg to differ; while strncpy() is weird it was designed for a
single purpose: putting a possibly non-NUL terminated sequence of
characters in a limited size char[] struct field such as those
used in UNIX utmp/wtmp files. For that purpose it is perfect and
it feels wrong to re-implement strncpy() under a different name.

I gave the rationale for this: if the savvy 1% programmers who know well
keep using it even just for this purpose, the 99% remaining will continue
thinking they know what it does and use it improperly all the time and
produce flawed code everywhere, including life support equipment and
aircraft navigation systems.

The argument is very much the same against gets and sprintf.

Even when dealing with Unix utmp and wtmp structures, it would be better to
write explicit code to deal with these error-prone structures; such code is
quite easy to rewrite anyway.

How can we defend the need for a function with so little value in the C
library when we do not even have strdup?
Having strncpy causes thousands of bugs.
Not having strdup has the same effect (people will call malloc(strlen(str)))

Fixing these 2 would make the language safer.
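
To make the malloc(strlen(str)) slip concrete, here is a minimal sketch of a string duplicator. It is named `dup_string` (a made-up name) so that it does not clash with a strdup that the platform's library may already provide.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for strdup. Returns a malloc'd copy of s, or NULL
   if allocation fails; the caller is responsible for free(). */
char *dup_string(const char *s)
{
    assert(s != NULL);

    /* + 1 for the terminating '\0'; malloc(strlen(s)) is one byte short. */
    size_t size = strlen(s) + 1;
    char *copy = malloc(size);

    if (copy != NULL)
        memcpy(copy, s, size);   /* copies the terminator as well */
    return copy;
}
```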
 
Douglas A. Gwyn

Ben said:
E[X] = _|_ does not (always) mean X does not terminate. ...

It is worth noting that in the context of the C standard,
a program that causes an instance of "undefined behavior"
actually might not proceed any further; one possibility
is that the code enters an infinite loop. I can imagine
this happening for architectures like some early
minicomputers I programmed, which would automatically
"chain" an indirect address when the target address's
high-order bit was set. In a nearby thread there was
discussion of how word-addressed machines would use byte
selector fields in their char-pointer representations,
and if the high bit were a byte selector (as it would
probably have been for C on one of those minicomputers)
and an odd address were converted to point to a word-
sized object without masking off the byte selector (as
would be probable for those systems also), then
dereferencing via that pointer would indirect an
additional time due to the high bit being set, and if
the accidental target had its high bit set this would
continue recursively, perhaps entering a cycle.
 
Douglas A. Gwyn

Chris said:
What part of the Standard prevents those things from happening when
a conforming program executes -- better, what stops an implementation
that does these things from being a conformant implementation?

"Conforming program" is a term of little value, and was defined
in the C standard for political reasons. A "strictly conforming
program", however, does not trigger undefined behavior.

"Conforming programs" on "conforming implementations" can do
all sorts of things beyond what the C standard specifies,
including operations that make monitors catch on fire, etc.

Francis's point was that such consequences actually do occur
sometimes when "undefined behavior" is triggered.
 
Douglas A. Gwyn

Keith said:
Ok, but I find this particular instance of that (having a keyword in a
declaration that really has no more meaning than a comment, even though
there *could* be an obvious meaning for it) to be a bit too subtle.
If something is meaningless, it doesn't belong in the language. ...

Well, it isn't meaningless, but it doesn't affect code generation.

There is a proposal before the C standards committee to adopt
something like Microsoft's __declspec facility for annotating
C source code with "attributes" that are outside the scope of
the current C standard. I suspect every experienced C
programmer has occasionally thought that there ought to be
something of the sort. Recall /*NOTREACHED*/ to avoid a
spurious warning from "lint"? There is a __declspec attribute
that has the same meaning.

GCC also has similar extensions.
I think I would have preferred it if the qualifiers in a function
declaration and in the corresponding function definition were required
to be identical.

Yes, that might have been better.
 
Charlie Gordon

Richard said:
Kelsey Bjarnason said:
[snips]

You need to run this past me again. Why shouldn't people who don't know
what they are doing not use it?

You mean why _should_ they not use it? Because people who don't know
what

I was making a joke and I think, yes, there is one too many
negatives. You see, if people don't know how to use an API then they,
well, shouldn't. It is obvious. But they can learn by doing, and with
judicious test cases and debug cycles they will learn to use it
properly.

The situation is much worse than you make it sound: they don't know the
semantics, but they think they do. They write code that seems to work as
they expect but contains bugs waiting to bite... just like gets and sprintf.
or, tell them to read the API.

I agree: show them the API as an explanation for why they should not use
strncpy.
As can i++ if you don't keep an eye on the index limit for an array.

It is bogus advice.

There is *nothing* wrong with strncpy and it is used in millions of
lines of code. In many ways you could say it's safer to use since it won't
overwrite memory if the source string is bad.

to reach millions, you need to count the binaries produced.

You cannot decently say that there is *nothing* wrong with it: its precise
semantics make it almost useless, are counter-intuitive, and are vastly
misunderstood and misused...
strtok, on the other hand .....

Why C99 did not include reentrant versions of strtok and friends baffles me.
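
As a sketch of what a reentrant tokenizer could look like using only standard C functions, in the spirit of POSIX strtok_r: the name `next_token` and its interface are assumptions made for this example, not an existing library function. All state lives in a caller-supplied pointer, so two scans can be interleaved.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical reentrant tokenizer. *state must point at the string to
   scan before the first call; the string is modified in place. Returns
   the next token, or NULL when none remain. */
char *next_token(char **state, const char *delims)
{
    assert(state != NULL && delims != NULL);

    char *s = *state;
    if (s == NULL)
        return NULL;

    s += strspn(s, delims);              /* skip leading delimiters */
    if (*s == '\0') {
        *state = NULL;                   /* nothing but delimiters left */
        return NULL;
    }

    char *end = s + strcspn(s, delims);  /* first delimiter after token */
    if (*end != '\0') {
        *end = '\0';                     /* terminate the token */
        *state = end + 1;                /* resume after it next time */
    } else {
        *state = NULL;                   /* token ran to end of string */
    }
    return s;
}
```

Typical use is `char *st = line;` followed by repeated calls to `next_token(&st, ",;")` until it returns NULL. Unlike strtok, no hidden static state is shared between callers.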
 
Rainer Weikusat

Chris Dollin said:
Yes, there is. /Every/ program behaviour is an example of undefined
behaviour (as it is defined by the C standard).

That's a claim which should be easy to prove: Assuming you are not
deliberately misunderstanding me, you appear to claim that 'undefined
behaviour' is actually behaviour defined by the C-standard. Since 'no
requirements' isn't a definition of any specific behaviour, this
definition or set of definitions must be in some other location, so
please cite where the standard provides a definition of the allowed
semantics for undefined behaviour.

NB: A positive definition is required.
The standard puts constraints on the behaviour of a program: it
says certain things are not permitted (and consequently that
others are required). In the specific case of "undefined behaviour",
there are /no constraints applied/, so all behaviours are
acceptable.

More correctly (as I have written several times now): There is no such
thing as 'acceptable behaviour' for these cases (provided you cannot
magically come up with a positive definition of this behaviour)
because no acceptable behaviour is defined. Presumably, a real-world
implementation of C will have 'some behaviour' in such situation and
whatever this might be is not relevant for determining the standard
conformance of this implementation.
You're confusing "undefined behaviour" (behaviour which isn't defined by
anything anywhere) and "undefined behaviour" (behaviour on which the
standard places no requirements).

Assuming that the C-standard is relevant for defining the C language,
it is the only location where 'behaviour' can be defined and you are
basically presenting a false dichotomy: There is no such 'other
location'.
 
