Error in scanf implementation or error in example in standard?


Simon Biber

The following Example 3 is given in the 1999 C standard for the function
fscanf:
EXAMPLE 3 To accept repeatedly from stdin a quantity, a unit of
measure, and an item name:

#include <stdio.h>
/* ... */
int count; float quant; char units[21], item[21];
do {
    count = fscanf(stdin, "%f%20s of %20s", &quant, units, item);
    fscanf(stdin,"%*[^\n]");
} while (!feof(stdin) && !ferror(stdin));

If the stdin stream contains the following lines:

2 quarts of oil
-12.8degrees Celsius
lots of luck
10.0LBS of
dirt
100ergs of energy

the execution of the above example will be analogous to the following
assignments:

quant = 2; strcpy(units, "quarts"); strcpy(item, "oil");
count = 3;
quant = -12.8; strcpy(units, "degrees");
count = 2; // "C" fails to match "o"
count = 0; // "l" fails to match "%f"
quant = 10.0; strcpy(units, "LBS"); strcpy(item, "dirt");
count = 3;
count = 0; // "100e" fails to match "%f"
count = EOF;

I have tested several implementations and none of them get the last case
right. In no case does fscanf return 0 indicating failure to match
"100ergs of energy" with "%f".

The actual behaviour varies. Some will match '100', leaving the 'e' unread:

quant = 100; strcpy(units, "ergs"); strcpy(item, "energy");
count = 3;

While others will match '100e', leaving the 'r' unread:

quant = 100; strcpy(units, "rgs"); strcpy(item, "energy");
count = 3;

But I am yet to come across an implementation that does what the example
in the Standard specifies. Is this a failure in the implementations or
in the Standard itself?
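To reproduce this, here is a minimal sketch that runs the Example 3 loop over the same six input lines. It assumes a temporary file from tmpfile() can stand in for stdin, and run_example3 and example3_input are hypothetical names. The count for the "100ergs" line depends on the library, as described above, so only the other iterations should be expected to match the Standard's list.

```c
#include <assert.h>
#include <stdio.h>

/* The input from EXAMPLE 3, ending with a newline as the example implies. */
static const char example3_input[] =
    "2 quarts of oil\n"
    "-12.8degrees Celsius\n"
    "lots of luck\n"
    "10.0LBS of\n"
    "dirt\n"
    "100ergs of energy\n";

/* Runs the EXAMPLE 3 loop over example3_input, with a temporary file in
   place of stdin. Prints what each call assigned, stores up to max values
   of count in counts[], and returns the number of iterations (-1 if
   tmpfile() fails). */
static int run_example3(int counts[], int max)
{
    assert(counts != NULL && max > 0);
    FILE *in = tmpfile();
    if (in == NULL)
        return -1;
    fputs(example3_input, in);
    rewind(in);

    int n = 0, count;
    float quant;
    char units[21], item[21];
    do {
        count = fscanf(in, "%f%20s of %20s", &quant, units, item);
        if (n < max)
            counts[n] = count;
        n++;
        printf("count = %d", count);
        if (count >= 1)
            printf("  quant = %g", quant);
        if (count >= 2)
            printf("  units = \"%s\"", units);
        if (count >= 3)
            printf("  item = \"%s\"", item);
        putchar('\n');
        fscanf(in, "%*[^\n]");   /* discard the rest of the line */
    } while (!feof(in) && !ferror(in));

    fclose(in);
    return n;
}
```

The fifth printed line is the one that distinguishes the implementations: count = 0 is what the Standard's example lists, while count = 3 with units "ergs" or "rgs" corresponds to the two behaviors reported above.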
 

Robert Gamble

Simon said:
But I am yet to come across an implementation that does what the example
in the Standard specifies. Is this a failure in the implementations or
in the Standard itself?

Footnote 245 in n1124 states:
"fscanf pushes back at most one input character onto the input stream.
Therefore, some sequences that are acceptable to strtod, strtol, etc.,
are unacceptable to fscanf."

This was added in response to Defect Report #22:
http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_022.html.

In the case of "100ergs", fscanf reads up to the "r" before realizing
that the "e" is not part of the number. At that point, given the
one-character pushback limit, it cannot push back both the "r" and the
"e", so it has to return with a failure, since "100e" is not a valid
number. Many implementations allow more than one character of pushback
and take advantage of this in the fscanf function, hence the behavior
you have seen. Technically, such implementations are in violation of the
Standard, but the sentiment among many implementors is that the
requirement is unjustified, and they simply live with the non-conformance.
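To make the mechanism concrete, here is a minimal sketch of a %f-style scanner that obeys the one-character pushback rule. It is an illustration, not any library's actual code: scan_float_one_pushback and next_state are hypothetical names, and it recognizes only plain decimal constants (no INF, NAN or hexadecimal forms). It keeps reading while the characters seen so far are a prefix of a valid number, pushes back the single character that breaks the prefix, and reports a matching failure if what it consumed is only a prefix.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/* States of a recognizer for plain decimal floating constants such as
   "2", "-12.8", ".5" or "1e10" (INF, NAN and hex forms are omitted). */
enum {
    ST_START, ST_SIGN, ST_INT, ST_DOT, ST_FRAC, ST_EXP, ST_EXPSIGN, ST_EXPDIG
};

/* Returns the state reached by appending c, or -1 if c cannot extend the
   characters read so far into a prefix of a valid number. */
static int next_state(int state, int c)
{
    int digit = (c != EOF && isdigit(c));
    int is_sign = (c == '+' || c == '-');
    int is_e = (c == 'e' || c == 'E');

    switch (state) {
    case ST_START:
        if (is_sign)
            return ST_SIGN;
        return digit ? ST_INT : (c == '.' ? ST_DOT : -1);
    case ST_SIGN:
        return digit ? ST_INT : (c == '.' ? ST_DOT : -1);
    case ST_INT:
        if (digit)
            return ST_INT;
        if (c == '.')
            return ST_FRAC;
        return is_e ? ST_EXP : -1;
    case ST_DOT:
        return digit ? ST_FRAC : -1;
    case ST_FRAC:
        if (digit)
            return ST_FRAC;
        return is_e ? ST_EXP : -1;
    case ST_EXP:
        if (is_sign)
            return ST_EXPSIGN;
        return digit ? ST_EXPDIG : -1;
    default:            /* ST_EXPSIGN and ST_EXPDIG */
        return digit ? ST_EXPDIG : -1;
    }
}

/* Hypothetical %f-style scanner. Returns 1 and stores the value on a
   match, 0 on a matching failure (the characters already read stay
   consumed), or EOF if input ends before any character of the item.
   At most size - 1 characters are read, like a field width. */
static int scan_float_one_pushback(FILE *fp, char *buf, size_t size,
                                   double *out)
{
    assert(fp != NULL && buf != NULL && size >= 2 && out != NULL);
    int state = ST_START;
    size_t len = 0;
    int c;

    do {                    /* skip leading white space, as %f does */
        c = getc(fp);
    } while (c != EOF && isspace(c));

    while (len + 1 < size) {
        int next = next_state(state, c);
        if (next < 0)
            break;          /* c cannot extend the input item */
        buf[len++] = (char)c;
        state = next;
        c = getc(fp);
    }
    buf[len] = '\0';

    if (c != EOF)
        ungetc(c, fp);      /* the one and only character pushed back */
    else if (len == 0)
        return EOF;

    /* The input item must be a complete number, not just a prefix. */
    if (state != ST_INT && state != ST_FRAC && state != ST_EXPDIG)
        return 0;           /* e.g. "100e": consumed, but no match */
    *out = strtod(buf, NULL);
    return 1;
}
```

With "100ergs" this sketch consumes "100e", pushes back only the "r" and returns 0, which is the count = 0 case in the example. With "-12.8degrees" it returns 1 and leaves the "d" unread.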

Robert Gamble
 

Richard Heathfield

Robert Gamble said:

Many implementations allow more than one character pushback and take
advantage of this fact in the fscanf function, hence the behavior you
have seen. Technically such implementations are in violation of the
Standard

Why?
 

Richard Bos

Robert Gamble said:
Footnote 245 in n1124 states:
"fscanf pushes back at most one input character onto the input stream.
Therefore, some sequences that are acceptable to strtod, strtol, etc.,
are unacceptable to fscanf."

True, but feetneet are not normative. Strictly speaking, there's a
conflict between two parts of the Standard; the footnote makes it clear
that in this case, the intent was that the part about a single character
pushback buffer for input streams overrides the part about parsing
numbers, but it would be better if that were made explicit in the
_normative_ text in the next TC.

Richard
 

Robert Gamble

Richard said:
Robert Gamble said:



Why?

Why what? Why such implementations aren't technically conforming?
Because implementations that push back more than one character in the
fscanf family of functions do not behave as mandated by the Standard.
I am not sure I understand your point, perhaps you could clarify with a
multi-word response.

Robert Gamble
 

Ben Pfaff

Robert Gamble said:
Many implementations allow more than one character pushback and take
advantage of this fact in the fscanf function, hence the behavior you
have seen. Technically such implementations are in violation of the
Standard but the sentiment among many implementors is that the
requirement is unjustified and they just live with non-conformance.

C99 says this in the description of the ungetc function:

One character of pushback is guaranteed. If the ungetc
function is called too many times on the same stream without
an intervening read or file positioning operation on that
stream, the operation may fail.

I don't see a requirement that *only* one character of pushback
be supported, only that *at least* one character of pushback be
supported.
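One way to see the gap between what ungetc guarantees and what a given implementation offers is to count how many consecutive ungetc calls succeed. This is only a sketch: pushback_depth is a hypothetical helper, and any depth above 1 is an implementation extension that a portable program cannot rely on.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: counts how many consecutive ungetc() calls succeed
   on fp, up to max, with no read in between. Only 1 is guaranteed by the
   Standard; more is an implementation extension. The pushed-back
   characters are read again afterwards, so later reads see the original
   data. Read at least max characters first: the file position indicator
   of a binary stream is indeterminate after ungetc() if it was zero. */
static int pushback_depth(FILE *fp, int max)
{
    assert(fp != NULL && max >= 1);
    int depth = 0;

    while (depth < max && ungetc('x', fp) != EOF)
        depth++;
    for (int i = 0; i < depth; i++)
        (void)getc(fp);     /* consume the pushed-back 'x' characters */
    return depth;
}
```

A result of 1 on one library and 4 on another is consistent with the wording quoted above; the text only says further calls "may fail".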

On the other hand, perhaps you are talking about the following
text and footnote for the fscanf function; your article seems
ambiguous to me:

An input item is read from the stream, unless the specification
includes an n specifier. An input item is defined as the
longest sequence of input characters which does not exceed
any specified field width and which is, or is a prefix of, a
matching input sequence.242)

242) fscanf pushes back at most one input character onto the
input stream. Therefore, some sequences that are
acceptable to strtod, strtol, etc., are unacceptable
to fscanf.
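The footnote's contrast with strtod comes down to strtod working on a string that is already in memory, so it can look past the "e" and then back up at no cost. A minimal sketch follows; strtod_prefix is a hypothetical wrapper, while strtod itself is the standard function.

```c
#include <assert.h>
#include <stdlib.h>

/* strtod converts the longest initial subsequence of s that has the
   expected form and reports where it stopped, however far it had to
   look ahead to decide. */
static double strtod_prefix(const char *s, const char **rest)
{
    assert(s != NULL && rest != NULL);
    char *end;
    double v = strtod(s, &end);
    *rest = end;
    return v;
}
```

For "100ergs" this returns 100.0 with rest pointing at "ergs": "100e" does not have the expected form, but "100" does. fscanf could only do the same by pushing back two characters.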
 

Robert Gamble

Richard said:
True, but feetneet are not normative.

And neither are the examples for that matter.
Strictly speaking, there's a
conflict between two parts of the Standard; the footnote makes it clear
that in this case, the intent was that the part about a single character
pushback buffer for input streams overrides the part about parsing
numbers, but it would be better if that were made explicit in the
_normative_ text in the next TC.

I certainly agree that it would have been nice if this footnote were
part of the normative text; I don't know why it isn't. The only
conflict I see is the one in the C90 Standard, which was addressed by DR
022. Although the footnote is non-normative, it, together with the
example and the fact that it resulted from a DR, makes it abundantly
clear what the intent was. If intent isn't enough, though, a careful
reading of the normative changes made in the DR (which were carried
through to C99) yields the same result, even if it is not spelled out as
clearly.

Robert Gamble
 

Richard Heathfield

Robert Gamble said:
Why what? Why such implementations aren't technically conforming?
Yes.

Because implementations that push back more than one character in the
fscanf family of functions do not behave as mandated by the Standard.

Why not?
I am not sure I understand your point, perhaps you could clarify with a
multi-word response.

<grin> Okay, let me see if I can make it clearer. Maybe you're right that
providing more than the minimum level of pushback is against the rules, and
maybe you're not. I can see why an implementation *must* provide at least
one character of pushback, but where is it *forbidden* from providing more?
 

Robert Gamble

Ben said:
C99 says this in the description of the ungetc function:

One character of pushback is guaranteed. If the ungetc
function is called too many times on the same stream without
an intervening read or file positioning operation on that
stream, the operation may fail.

I don't see a requirement that *only* one character of pushback
be supported, only that *at least* one character of pushback be
supported.

I was speaking specifically of the pushback used by the fscanf function,
which I thought was clear from the footnote that I cited. I certainly
did not mean to imply that multi-character pushback was itself
incorrect, just its use in the fscanf function.
On the other hand, perhaps you are talking about the following
text and footnote for the fscanf function; your article seems
ambiguous to me:

An input item is read from the stream, unless the specification
includes an n specifier. An input item is defined as the
longest sequence of input characters which does not exceed
any specified field width and which is, or is a prefix of, a
matching input sequence.242)

242) fscanf pushes back at most one input character onto the
input stream. Therefore, some sequences that are
acceptable to strtod, strtol, etc., are unacceptable
to fscanf.

Right, I cited this exact footnote at the beginning of my original
article; perhaps you missed it.

Robert Gamble
 

Robert Gamble

Richard said:
Robert Gamble said:


Why not?


<grin> Okay, let me see if I can make it clearer. Maybe you're right that
providing more than the minimum level of pushback is against the rules, and
maybe you're not. I can see why an implementation *must* provide at least
one character of pushback, but where is it *forbidden* from providing more?

First, let me make clear that I am speaking only of the pushback
functionality used within the fscanf function itself, not the pushback
capability of a stream in general (which can provide pushback for as
many characters as it desires); at least one person seems to have been
confused by my original statement. The Standard makes it clear, through
the discussed footnote and example, that the behavior shall be as if a
maximum of one character of pushback were used within the fscanf
function ("fscanf pushes back at most one input character onto the
input stream"). Although footnotes and examples are non-normative, the
same meaning is supported by the normative changes that were prompted
by DR 022:

In subclause 7.9.6.2, page 135, lines 31-33, change:

"An input item is defined as the longest matching sequence of input
characters, unless that exceeds a specified field width, in which case
it is the initial subsequence of that length in the sequence."

to:

"An input item is defined as the longest sequence of input characters
which does not exceed any specified field width and which is, or is a
prefix of, a matching input sequence."

Robert Gamble
 

Richard Heathfield

Robert Gamble said:
The Standard makes it clear through
the discussed footnote and example that the behavior shall be as if a
maximum of one character of pushback was used within the fscanf
function ("fscanf pushes back at most one input character onto the
input stream").

Thank you for clarifying. I know you know that footn...
Although footnotes and examples are non-normative,

....er, quite so.
the
same meaning is supported by the normative changes that were provoked
by DR 022:

I've found DRs 200 through 294. I can't find DR 022.
 

Ben Pfaff

Robert Gamble said:
On the other hand, perhaps you are talking about the following
text and footnote for the fscanf function; your article seems
ambiguous to me:
[...]

Right, I cited this exact footnote at the beginning of my original
article; perhaps you missed it.

I did miss it, sorry.
 

Richard Heathfield

Robert Gamble said:
Richard Heathfield wrote:

The link was in my original response:
http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_022.html.

My apologies for missing that. It does appear that the text under
consideration is still non-normative. (It's footnote 245 in n1124, for
those who don't know).

Having said that, I accept that the intent of footnotes, despite their
non-normative status, is to clarify the meaning of the Standard, so I'll
shut up now.

(Like I care ***so much*** about fscanf! :) )
 

Simon Biber

Robert said:
Footnote 245 in n1124 states:
"fscanf pushes back at most one input character onto the input stream.
Therefore, some sequences that are acceptable to strtod, strtol, etc.,
are unacceptable to fscanf."

This was added in response to Defect Report #22:
http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_022.html.

In the case of 100ergs, fscanf reads up to the r before realizing that
the "e" is not part of the number but at that point, given the one
character pushback limit, it can no longer push back both the r and the
e so it has to return with a failure since 100e is not a valid number.

But none of the implementations I tested actually return with a failure!

Try it -- whether on Solaris, Linux, Cygwin, DJGPP, Microsoft VC++,
LCC-Win32 or Turbo C, none of them return with a failure. They interpret
100e as a valid number, with the value 100.

That's the real bug, not the quibble on how many characters are pushed back.
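A small probe can tell which of the reported outcomes a given library produces, for both fscanf and sscanf. This is a sketch under stated assumptions: the enum, classify, probe_fscanf, probe_sscanf and report_100ergs are hypothetical names, and a temporary file stands in for stdin. The comments name the behaviors reported in this thread; they do not certify any particular library.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* The outcomes reported in this thread for "100ergs of energy". */
enum outcome {
    REJECTS_100E,   /* count == 0, as Example 3 expects */
    STOPS_AT_E,     /* count == 3, units "ergs": "100" converted */
    ACCEPTS_100E,   /* count == 3, units "rgs": "100e" taken as 100 */
    OTHER_OUTCOME
};

static const char *const outcome_names[] = {
    "rejects \"100e\" (as in Example 3)",
    "stops before the e",
    "accepts \"100e\" as 100",
    "something else"
};

static enum outcome classify(int count, const char *units)
{
    assert(units != NULL);
    if (count == 0)
        return REJECTS_100E;
    if (count == 3 && strcmp(units, "ergs") == 0)
        return STOPS_AT_E;
    if (count == 3 && strcmp(units, "rgs") == 0)
        return ACCEPTS_100E;
    return OTHER_OUTCOME;
}

static enum outcome probe_sscanf(void)
{
    float quant;
    char units[21] = "", item[21] = "";
    int count = sscanf("100ergs of energy", "%f%20s of %20s",
                       &quant, units, item);
    return classify(count, units);
}

static enum outcome probe_fscanf(void)
{
    float quant;
    char units[21] = "", item[21] = "";
    FILE *in = tmpfile();           /* stands in for stdin */
    if (in == NULL)
        return OTHER_OUTCOME;
    fputs("100ergs of energy\n", in);
    rewind(in);
    int count = fscanf(in, "%f%20s of %20s", &quant, units, item);
    fclose(in);
    return classify(count, units);
}

static void report_100ergs(void)
{
    printf("fscanf: %s\n", outcome_names[probe_fscanf()]);
    printf("sscanf: %s\n", outcome_names[probe_sscanf()]);
}
```

Running report_100ergs() on each of the platforms listed above would show whether fscanf and sscanf agree, which is the sscanf question raised below.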
 

Robert Gamble

Simon said:
But none of the implementations I tested actually return with a failure!

Try it -- whether on Solaris, Linux, Cygwin, DJGPP, Microsoft VC++,
LCC-Win32 or Turbo C, none of them return with a failure. They interpret
100e as a valid number, with the value 100.

That's the real bug, not the quibble on how many characters are pushed back.

There are 2 problems here. Implementations that convert 100 and leave
the "e" on the stream are probably realizing that the "e" is not part
of the number when they read the "r", and are then pushing back too many
characters. Implementations that convert 100 and leave the "r" as the
first character on the stream are incorrectly accepting "100e" as
equivalent to "100e1". glibc is known to accept certain invalid
numeric sequences, but its maintainers don't seem willing to acknowledge
such problems.

I tested a number of implementations a while ago and had the same
results that you have seen. I believe that at least the Solaris
and glibc folk are aware of this particular issue, but they don't seem
to have any plans to change their behavior. I believe that uClibc
(http://uclibc.org/) handled this case correctly, but I'm not positive.

I haven't tried this on Dinkumware, as I don't have access to it, but if
this were going to be handled correctly on any implementation, it would
probably be the Dinkumware C99 library. Their library claims to be
certified by Perennial as C99-compliant, and I believe the behavior in
question is tested in the certification process. If anyone has access
to this library, it would be nice if they could confirm how it handles
this. Additionally, if it does handle this correctly, I would be
curious to know whether the same string is handled the same way by the
sscanf function (I believe it should, but some people do not; the
Standard isn't crystal clear, in my opinion).

Robert Gamble
 

P.J. Plauger

Robert Gamble said:

I haven't tried this on Dinkumware, as I don't have access to it, but if
this were going to be handled correctly on any implementation, it would
probably be the Dinkumware C99 library. Their library claims to be
certified by Perennial as C99-compliant, and I believe the behavior in
question is tested in the certification process. If anyone has access
to this library, it would be nice if they could confirm how it handles
this. Additionally, if it does handle this correctly, I would be
curious to know whether the same string is handled the same way by the
sscanf function (I believe it should, but some people do not; the
Standard isn't crystal clear, in my opinion).

We do it right (if only to score 100 per cent on the Perennial C99
validation suite), where by "right" I mean what the DR tells us
to do -- consume "100e", fail, and leave "r" in the input stream.
We do the same for both scanf and sscanf, since the code is common.

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com
 

Robert Gamble

P.J. Plauger said:
We do it right (if only to score 100 per cent on the Perennial C99
validation suite), where by "right" I mean what the DR tells us
to do -- consume "100e", fail, and leave "r" in the input stream.
We do the same for both scanf and sscanf, since the code is common.

Thanks very much for the input. I sense from you the same sentiment
that I have seen expressed by other implementors: that the
one-character maximum pushback mandate isn't well received. Although the
Rationale doesn't provide any insight into why this decision was made,
I assume it was to support implementations that provide only a single
character of pushback while keeping results consistent across
implementations that could provide more. Do you feel that there is a
better way to handle this? Has there been any discussion of changing
this behavior in the Standard? And is this a common sentiment in your
experience?

Robert Gamble
 

Random832

On 2006-11-30, Robert Gamble said:
There are 2 problems here. Implementations that convert 100 and leave
the "e" on the stream are probably realizing that the "e" is not part
of the number when it reads the "r" and are pushing back too many
characters. Implementations that convert 100 and leave the "r" as the
first character on the stream are incorrectly accepting "100e" as
equivalent to "100e1".

100e0, actually, and it's arguable* that it is in fact equivalent.

* Arguable. adj. That for which "one would be wrong, but one could argue it."
 

CBFalconer

... snip about parsing "100ergs" as a real ...
We do it right (if only to score 100 per cent on the Perennial C99
validation suite), where by "right" I mean what the DR tells us
to do -- consume "100e", fail, and leave "r" in the input stream.
We do the same for both scanf and sscanf, since the code is common.

Which makes sense, especially if you consider the spec as reading
"stop on the first character that cannot describe a real". It also
makes sense if you conceive of an empty field as describing zero.
This more or less agrees with the standard (at least N869):

[#4] If the subject sequence has the expected form for a
floating-point number, the sequence of characters starting
with the first digit or the decimal-point character
(whichever occurs first) is interpreted as a floating
constant according to the rules of 6.4.4.2, except that the
decimal-point character is used in place of a period, and
that if neither an exponent part nor a decimal-point
character appears in a decimal floating point number, or if
a binary exponent part does not appear in a binary floating
point number, an exponent part of the appropriate type with
*value zero is assumed to follow the last digit in the
string*. If the subject sequence begins with a minus sign,
the sequence is interpreted as negated.235) A character
sequence INF or INFINITY is interpreted as an infinity, if
representable in the return type, else like a floating
constant that is too large for the range of the return type.
A character sequence NAN or NAN(n-char-sequence-opt), is
interpreted as a quiet NaN, if supported in the return type,
else like a subject sequence part that does not have the
expected form; the meaning of the n-char sequences is
implementation-defined.236) A pointer to the final string
is stored in the object pointed to by endptr, provided that
endptr is not a null pointer.
 
