scanf()

  • Thread starter Edward Rutherford

Malcolm McLean

On Sunday, 12 August 2012 01:30:46 UTC+1, Keith Thompson wrote:
That's absurd. Undefined behavior means that whatever the *worst*
thing is, it can happen, whether that's crashing the program, or
continuing to execute quietly with bad data, or reformatting your
hard drive.
It means the implementation decides what the result of the operation will be.
Of course the implementation is free to choose the worst option, like
reformatting the hard drive. But normally it's a choice between terminating
with an error message, continuing with a corrupt value, or continuing with
a NaN or infinity. Not all platforms support option 3, which is probably the
best choice.
 

James Kuyper

On Sunday, 12 August 2012 01:30:46 UTC+1, Keith Thompson wrote:
It means the implementation decides what the result of the operation will be.

Keep in mind that the simplest way for the implementor to deal with that
issue is to ignore it: the implementor may simply write the code to be as
fast as possible while meeting its requirements, of which there are
none that apply when the input is 1.0e9999999999. What happens when that
code deals with such inputs need not be anything that the implementor
bothered thinking about; it's just whatever happens when the assumptions
that went into the design of the code are violated. Plausible
possibilities (not having given any detailed thought to how the code is
likely to be implemented) include silently corrupting the contents of
memory locations that should not be affected by a call to sscanf() with
valid inputs, or getting stuck in an infinite loop.
Of course the implementation is free to choose the worst option, like
reformatting the hard drive. But normally it's a choice between terminating
with an error message, continuing with a corrupt value, or continuing with
a NaN or infinity. Not all platforms support option 3, which is probably the
best choice.

I would favor terminating with an error message over either of the other
two options you mention. I've heard people claim in similar discussions
in the past that there exist programs where it's more important that a
program keep running, regardless of how badly it is malfunctioning, than
that it halt. They say that in certain safety critical contexts, if the
program halts, someone will die. However, it seems to me that in
precisely those same circumstances, if the program malfunctions badly
enough, someone will die anyway, even if it never halts.
 

Malcolm McLean

On Sunday, 12 August 2012 13:03:23 UTC+1, James Kuyper wrote:
On 08/12/2012 07:07 AM, Malcolm McLean wrote:

I would favor terminating with an error message over either of the other
two options you mention. I've heard people claim in similar discussions
in the past that there exist programs where it's more important that a
program keep running, regardless of how badly it is malfunctioning, than
that it halt. They say that in certain safety critical contexts, if the
program halts, someone will die. However, it seems to me that in
precisely those same circumstances, if the program malfunctions badly
enough, someone will die anyway, even if it never halts.
A safety-critical system in a hospital, such as a life-support system, should
crash out on error. That's no different from the electricity failing, and the
hospital should have procedures to deal with that eventuality.
However, a safety-critical system on an aircraft may need to keep going. There
might be no plausible fallback mechanism. So you've just got to process the
wrong results and hope they are not sufficiently wrong to bring the plane
down. The same's true of video games. Stopping the game is usually the worst
thing you can do. Making a baddie appear in an unusual place might be accepted
as part of the game.
 

Keith Thompson

Malcolm McLean said:
On Sunday, 12 August 2012 01:30:46 UTC+1, Keith Thompson wrote:
It means the implementation decides what the result of the operation will be.

No, the implementation needn't decide *anything*.

[...]
 

Malcolm McLean

On Sunday, 12 August 2012 21:39:54 UTC+1, Keith Thompson wrote:
No, the implementation needn't decide *anything*.
Someone writes scanf(). It's possible, though unlikely for someone given
such a job, that he's never considered that exponents might be out of
range. If he has considered that possibility, he must decide what to do
with such input. He might write an error message, quietly return a NaN, or
return a corrupted result which retains backwards compatibility with a previous
version of the function. There's a case for all of these. But he's got to
opt for something.
 

Keith Thompson

Malcolm McLean said:
On Sunday, 12 August 2012 21:39:54 UTC+1, Keith Thompson wrote:
Someone writes scanf(). It's possible, though unlikely for someone given
such a job, that he's never considered that exponents might be out of
range. If he has considered that possibility, he must decide what to do
with such input. He might write an error message, quietly return a NaN, or
return a corrupted result which retains backwards compatibility with a previous
version of the function. There's a case for all of these. But he's got to
opt for something.

As you acknowledge, the developer needn't even consider the
possibility that exponents might be out of range -- or that there
might be a sufficiently long sequence of digits to produce an
out-of-range value without using an exponent.

Someone implementing scanf() might simply perform a series of
arithmetic operations that will work correctly with in-range
values, and *assume* that it will fail in some clean fashion with
out-of-range values. He might fail to consider what could happen
with all the compilers under which the code might run (run-time
libraries aren't necessarily tied to specific compilers), or how
optimization might affect the results. Adding code to allow for
overflow might significantly slow down cases that *don't* overflow;
he might deliberately choose performance over clean behavior for
undefined cases. Or he might simply take the standard's statement
that the behavior is undefined as a license not to worry about it.

Treating undefined behavior as if it were merely unspecified or
implementation-defined is foolish.
 

Seungbeom Kim

On Saturday, 11 August 2012 18:40:56 UTC+1, Keith Thompson wrote:
But what is the program meant to do with such an input value? UB is probably the
best thing that can happen to it.

The best that can happen is for the library function (sscanf here)
to report an error to its caller in a well-defined way and let the
program stay in a well-defined state.

Look what the strto* functions do, for example: they return HUGE_VAL*
or *LONG_{MIN,MAX} and set errno to ERANGE. The caller can examine
these conditions and decide what to do in case of an error.

The problem with this kind of UB, caused by bad user input, as opposed to
the other kind of UB, caused by bad programmer input, is that there's no
defense for the program. Calling strlen with a null pointer is UB, but
it's a bug the programmer can fix/prevent/get around. It's not the case
with calling sscanf with a bad string or calling gets with a long string,
and that's why these functions are inherently unsafe.
 

Heinrich Wolf

On Sunday, 12 August 2012 21:39:54 UTC+1, Keith Thompson wrote:
No, the implementation needn't decide *anything*.
Someone writes scanf(). It's possible, though unlikely for someone given
such a job, that he's never considered that exponents might be out of
range. If he has considered that possibility, he must decide what to do
with such input. He might write an error message, quietly return a NaN, or
return a corrupted result which retains backwards compatibility with a
previous version of the function. There's a case for all of these. But he's
got to opt for something.

--------------------------------------------------
I tried to read "3.1415e999999" with scanf and with strtod on 3 different
compilers. All of them say they have processed 13 characters, and scanf
always returns the same as strtod. My very old Turbo C 2.0, however, does a
bad job and returns 1.798e+308. My Borland C++Builder 5 and my gcc on
Fedora 14 return +INF. Still, strtod is better than scanf, because I have the
chance to go back in the stream as far as I have loaded it into my buffer and
interpret it differently.
 

Phil Carmody

Heinrich Wolf said:
On Sunday, 12 August 2012 21:39:54 UTC+1, Keith Thompson wrote:
Someone writes scanf(). It's possible, though unlikely for someone given
such a job, that he's never considered that exponents might be out of
range. If he has considered that possibility, he must decide what to do
^^

with such input. He might write an error message, quietly return a NaN, or
return a corrupted result which retains backwards compatibility with a
previous
version of the function. There's a case for all of these. But he's got to
opt for something.

Unless he hasn't. See the "if" above, and presume falsity of the predicate.

Phil
--
I'd argue that there is much evidence for the existence of a God.
Pics or it didn't happen.
-- Tom (/. uid 822)
 

Tim Rentsch

Keith Thompson said:
Malcolm McLean said:
Keith Thompson:
But what is the program meant to do with such an input value? UB is probably the
best thing that can happen to it.

That's absurd. Undefined behavior means that whatever the *worst*
thing is, it can happen, whether that's crashing the program, or
continuing to execute quietly with bad data, or reformatting your
hard drive. [snip elaboration]

It doesn't. I admit that's often a useful way of thinking about
undefined behavior, but that isn't what the term actually means
(and I'm sure I'm not saying anything you don't already know).
When the Standard says "undefined behavior", that means it
imposes no requirements: as far as the Standard is concerned,
anything goes and everything is acceptable.

But, the Standard isn't the only force operating here. There is
also the compiler that was used; the operating system within
which the program is being run; possibly multiple levels of
those if the program is being run in a virtual environment; the
physical hardware that the program is being run on; and, going
all the way to the bottom, the physical universe we all live in.
Any one of those levels could serve to limit or define the set of
potential behaviors available.

So, at the same time that I agree that thinking of undefined
behavior as meaning "anything can happen" can be useful, I would argue that it's
also important to keep in mind the actual implications of undefined
behavior, which can be a positive in some circumstances. I might
not have chosen his phrasing, but the basic point Malcolm is making
is essentially correct - "undefined behavior" is never really
"undefined", but rather defined by elements and forces outside the
sphere of what the Standard chooses to address. Not knowing what
defines the behavior or what the definition is doesn't mean they
don't exist.
 

Keith Thompson

Tim Rentsch said:
Keith Thompson said:
Malcolm McLean said:
Keith Thompson:

There are worse possibilities than wrong results. If the input includes
something like "1.0e999999999", the behavior is undefined.

But what is the program meant to do with such an input value? UB is probably the
best thing that can happen to it.

That's absurd. Undefined behavior means that whatever the *worst*
thing is, it can happen, whether that's crashing the program, or
continuing to execute quietly with bad data, or reformatting your
hard drive. [snip elaboration]

It doesn't. I admit that's often a useful way of thinking about
undefined behavior, but that isn't what the term actually means
(and I'm sure I'm not saying anything you don't already know).
When the Standard says "undefined behavior", that means it
imposes no requirements: as far as the Standard is concerned,
anything goes and everything is acceptable.

But, the Standard isn't the only force operating here. There is
also the compiler that was used; the operating system within
which the program is being run; possibly multiple levels of
those if the program is being run in a virtual environment; the
physical hardware that the program is being run on; and, going
all the way to the bottom, the physical universe we all live in.
Any one of those levels could serve to limit or define the set of
potential behaviors available.

So, at the same time that I agree that thinking of undefined
behavior as meaning "anything can happen" can be useful, I would argue that it's
also important to keep in mind the actual implications of undefined
behavior, which can be a positive in some circumstances. I might
not have chosen his phrasing, but the basic point Malcolm is making
is essentially correct - "undefined behavior" is never really
"undefined", but rather defined by elements and forces outside the
sphere of what the Standard chooses to address. Not knowing what
defines the behavior or what the definition is doesn't mean they
don't exist.

Yes, but the consequences can still be *really really bad*.

For any behavior that a computer system can possibly exhibit,
within the obvious limitations imposed by physics, logic, and the
design and construction of that system, it's possible in principle
for a C program whose behavior is undefined to exhibit that behavior.

Examples: If your operating system contains code that can reformat
a hard drive, and the system's protections are imperfect, it's
entirely possible (though admittedly unlikely) that undefined
behavior could cause your hard drive to be reformatted. Consider a
function pointer whose value is corrupted, so it points to the code
in the `reformat_hard_drive()` routine just *after* the "Are you
sure?" prompt.

Compiler optimizations that *assume* defined behavior are a rich
source of unexpectedly bad consequences of undefined behavior.

And deliberate exploitation of UB by malware can also have nasty
consequences.
 

Tim Rentsch

Keith Thompson said:
Tim Rentsch said:
Keith Thompson said:
[snipped to the bone]
Undefined behavior means that whatever the *worst*
thing is, it can happen, whether that's crashing the program, or
continuing to execute quietly with bad data, or reformatting your
hard drive. [snip elaboration]

It doesn't. I admit that's often a useful way of thinking about
undefined behavior, but that isn't what the term actually means
(and I'm sure I'm not saying anything you don't already know).
When the Standard says "undefined behavior", that means it
imposes no requirements: as far as the Standard is concerned,
anything goes and everything is acceptable. [snip]

Yes, but the consequences can still be *really really bad*.
[snip elaboration]

I agree, they can be. I still think it's better to label undefined
behavior for what it is, and not for something else that's
convenient as an explanatory device but ultimately misleading about
what the term actually means. In the context of educating someone
about undefined behavior, adopting the convenient shorthand can be
very helpful, and I'm all for that; even then, however, I think the
explanatory device should be followed by an asterisk, with a more
faithful explanation of the true definition. Make sense?
 

Keith Thompson

Tim Rentsch said:
Yes, but the consequences can still be *really really bad*.
[snip elaboration]

I agree, they can be. I still think it's better to label undefined
behavior for what it is, and not for something else that's
convenient as an explanatory device but ultimately misleading about
what the term actually means. In the context of educating someone
about undefined behavior, adopting the convenient shorthand can be
very helpful, and I'm all for that; even then, however, I think the
explanatory device should be followed by an asterisk, with a more
faithful explanation of the true definition. Make sense?

Yes.
 
