Given the following
    #include <stdio.h>
    int main(void)
    {
        char line[BUFSIZ];
        long arg1, arg2;
        while(fgets(line, BUFSIZ, stdin) != NULL){
            if(sscanf(line,"%ld%ld",&arg1,&arg2) == 2){
                snprintf(line, sizeof(line), "%ld\n", arg1 + arg2);
                if(fputs(line, stdout) == EOF){
                    fprintf(stderr, "output error\n");
                }

You could just printf() the result, instead of snprintf()-ing and
then fputs()-ing. But this code is all correct, at least.

            }
            else{
                fprintf(stderr, "invalid input\n");
            }
        }
        return 0;
    }
When I run it, I get the following
[cdalten@localhost oakland]$ ./add
4 5
9
65
invalid input
Now, here's the question. How come there has to be a space between the
numbers? sscanf() in this case doesn't even have a space in the format
args.
Most people seem to misunderstand how the scanf engine works.
The scanf family of functions all use a common "engine" to do
their work. This "scanf engine" is fairly simple -- one might
even say "simplistic" -- and simply executes "directives" in a
sequence, one after another, until one of them "fails" or the
engine runs out of directives, whichever occurs first.
Directives consist of literal text, white space, or "conversions"
introduced by "%". They are given by the "format arg" (singular,
not plural) -- the first argument to the various scanf functions.
Input characters are taken from a supplied stdio stream (which is
to say, a valid "FILE *" value) as needed. Input characters are
"consumed" in the process, which means that a future fgetc() on
the stream will no longer see them: they are gone forever. (But
one single "char" may be read and then put back, in the same manner
as ungetc(), if needed for internal purposes.) In the case of
sscanf() (and vsscanf() in C99), a temporary internal "string-stream"
is created with input coming from the string, and destroyed by the
time sscanf() (or vsscanf()) returns. Characters "consumed" by
that stream are still in the original string, so in this respect
string-streams are much more forgiving.
Your format contains two %ld directives (and no white space, and no
other characters), as you note. But each "%ld" directive means
the same thing, which is:
    Step 1: consume (and ignore) any white space on the stream, so
      that the next available input character is non-white-space.
    Step 2: consume (and save) as many decimal digits as possible,
      with optional prefix sign, so that the next available input
      character is not a decimal digit.
    Step 3: convert the consumed digits to a "long" and store the
      result via the supplied pointer (which must of course point to
      a "long").
This directive will fail if there are no decimal digits available
after the whitespace is consumed. Steps 2 and 3 may also be combined
internally.
(The failure is an "input failure" if fgetc() would return EOF on
the stream, and a "matching failure" otherwise. C99 adds one more
failure case, which I do not fully understand and do not address
here. The difference between "input failure" and the others only
matters for the first conversion: input failure with no conversions
makes the scanf engine return EOF, while matching failure at that
point makes it return 0. If there have been successful conversions
and assignments, the engine returns the number of assignments.
In this case, if the engine returns 1 -- indicating the first "%ld"
worked, but the second failed -- you cannot distinguish immediately
between "input failure" and "matching failure" for the second
conversion. If this were not a "string-stream", you could use
(feof(stream) || ferror(stream)) to test whether there was an input
failure on the stream, if you really cared.)
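
The return values described above are easy to check with sscanf()
itself. Here is a minimal sketch; the helper name count_longs is
made up for this illustration and simply hands back whatever
sscanf() returns for the "%ld%ld" format:

```c
#include <stdio.h>

/* Hypothetical helper (not from the original program): apply the
 * format "%ld%ld" to s and return sscanf()'s result unchanged. */
int count_longs(const char *s)
{
    long a, b;

    return sscanf(s, "%ld%ld", &a, &b);
}
```

With this, count_longs("") gives EOF (input failure before any
conversion), count_longs("abc") gives 0 (matching failure on the
first conversion), and count_longs("4 5\n") gives 2. Both
count_longs("65\n") and count_longs("65 x") give 1: the second
conversion hit an input failure in one case and a matching failure
in the other, and the return value alone does not say which.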
Note that both steps 1 and 2 can require "putting back" a character
on the stream, because "consume as many characters as possible that
meet some test" is done as if by:
    int c;
    do {
        c = fgetc(stream);
    } while (c != EOF && whatever_test_applies_here(c));
    if (c != EOF)
        ungetc(c, stream);
(but usually "more efficiently", in some tricky way the implementation
has internally).
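
For anyone who wants to experiment, the same loop can be written
as a self-contained function. The names consume_while and
skip_space, and the idea of passing the test as a function pointer,
are mine and purely illustrative; this is a sketch of the idea, not
how any particular implementation does it:

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: consume characters from fp for as long as
 * test() accepts them, then push the first rejected character back.
 * Returns the number of characters consumed. */
int consume_while(FILE *fp, int (*test)(int))
{
    int c, n = 0;

    while ((c = fgetc(fp)) != EOF && test(c))
        n++;
    if (c != EOF)
        ungetc(c, fp);      /* the one-character "put back" */
    return n;
}

/* Step 1 of a %ld conversion, expressed with the helper above. */
int skip_space(FILE *fp)
{
    return consume_while(fp, isspace);
}
```

On a stream holding two spaces, a newline, and then "42x",
skip_space() consumes three characters, a following
consume_while(fp, isdigit) consumes two, and the next fgetc()
returns 'x'.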
More to the point. How come when I enter 65, the computer won't spit
back 11?
If you follow the three steps described above, it should become clear
why.
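
Spelled out for the line "65\n": the first %ld takes both digits
in step 2 and stops at the newline; the second %ld skips the
newline in step 1, finds nothing left, and fails, so sscanf()
returns 1 rather than 2. One way to watch this happen is to put a
%n directive between the two conversions, which records how many
characters have been consumed so far. A small sketch (the function
name first_ld_length is invented for the purpose):

```c
#include <stdio.h>

/* Hypothetical helper: returns how many characters of s had been
 * consumed once the first %ld finished (leading white space
 * included), or -1 if the first %ld failed.  *converted receives
 * sscanf()'s return value; %n does not count toward it. */
int first_ld_length(const char *s, int *converted)
{
    long a, b;
    int n = -1;

    *converted = sscanf(s, "%ld%n%ld", &a, &n, &b);
    return n;
}
```

For "65\n" this reports 2 characters consumed and 1 conversion;
for "4 5\n" it reports 1 character consumed and 2 conversions.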
Note that changing the directives will change the steps. If you
include a width specifier, step 2 in particular changes. "%1ld"
and "%2ld" would limit step 2 to consuming at most 1 or at most 2
characters (respectively), for instance, i.e., the directive is now
handled with code equivalent to:
    int c;
    int i, max;
    char buf[SOME_SIZE];

    do {
        c = fgetc(stream);
    } while (isspace(c)); /* note that isspace(EOF) is false */
    if (c != EOF)
        ungetc(c, stream);

    #define IS_SIGN(c) ((c) == '+' || (c) == '-')
    max = <the supplied format width>;
    for (i = 0; i < max; i++) {
        c = fgetc(stream);
        /* note that isdigit(EOF) is false, so no separate test needed */
        if (isdigit(c) || (i == 0 && IS_SIGN(c)))
            buf[i] = c;
        else
            break;
    }
    /* put back the rejected character; if the width ran out instead
       (i == max), the last character read was stored and stays consumed */
    if (i < max && c != EOF)
        ungetc(c, stream);
    if (i > 0 && isdigit(buf[i - 1])) {
        buf[i] = '\0';
        *va_arg(ap, long *) = atol(buf); /* or strtol(buf, NULL, 10) */
    } else
        ... handle input or matching failure, depending on c==EOF ...
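
To see the effect of a width from the calling side, here is a
hedged sketch of a variant of the original test that uses "%1ld"
twice. The function add_single_digits is hypothetical, and limiting
each number to one character is only useful as a demonstration:

```c
#include <stdio.h>

/* Hypothetical variant of the original program's test: each
 * conversion may take at most one character, so "65" is read as
 * 6 and then 5.  Returns 1 and stores the sum on success,
 * 0 otherwise. */
int add_single_digits(const char *s, long *sum)
{
    long a, b;

    if (sscanf(s, "%1ld%1ld", &a, &b) != 2)
        return 0;
    *sum = a + b;
    return 1;
}
```

Given "65\n" this stores 11, and given "4 5\n" it stores 9, since
the second conversion still skips leading white space. The price
is that "12 34\n" is now read as 1 and 2.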
Note: I cannot find the actual limit in the C99 draft standard text
I keep handy for searches, but I believe that "%ld" -- with no hard
limit on the number of decimal digits -- is allowed to "act like"
%4095ld or similar, so that the size of the "buf" into which decimal
digits are saved need not be infinite. (Of course, implementations
can simply combine the reading and converting:
    unsigned long result = 0;
    int sign = 0;
    max = <whatever, possibly LLONG_MAX with max having type long long>;
    for (i = 0; i < max; i++) {
        c = fgetc(stream);
        if (i == 0 && IS_SIGN(c)) {
            if (c == '-')
                sign = 1;
        } else if (isdigit(c))
            result = (result * 10) + (c - '0');
        else
            break;
    }
... handle the rest similarly, but no need for atol/strtol ...
but this means that %ld input behaves differently on overflow than
does strtol(). This is allowed, but I prefer implementations that
handle overflow the same way in the scanf engine as in strtol().)
This is a lot to remember (which is why it is a good idea to keep
a reference handy, to look up all the details on how the scanf
engine has to work). But there are a couple of key items that
you should memorize, if you are going to use the scanf family:
- almost all conversions begin by skipping initial white space;
- "white space" includes newlines;
- almost all conversions DO NOT skip trailing white space.
This means that applying the scanf family to a stdio stream almost
always leaves "trailing white space" -- usually a newline -- behind
in the stream. This trailing white space will cause you trouble
later. It is tempting to add code to remove it, but this is usually
a mistake, because it is only *almost* always left behind, so if
you simply always remove another "line" ended by a newline, you
will sometimes remove input you should have left alone. The best
approach -- besides "avoid scanf entirely" -- tends to be "read
a line with a line-oriented function, then use sscanf() on the
resulting string". This gives you much more control, and much more
"obvious predictability" on how the program will behave with various
inputs. Code that is obvious and predictable tends to be easier
to debug, and hence more reliable and useful in the long run, than
code that is obscure.
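
The "trailing white space left behind" problem can be observed
directly. The following is a small sketch, with a made-up helper
name, that reads one long with fscanf() and then reports which
character is still waiting on the stream:

```c
#include <stdio.h>

/* Hypothetical helper: read one long with fscanf(), then peek at
 * the next character still on the stream.  Returns that character,
 * or EOF if the conversion failed or nothing is left. */
int next_char_after_ld(FILE *fp, long *val)
{
    int c;

    if (fscanf(fp, "%ld", val) != 1)
        return EOF;
    c = fgetc(fp);
    if (c != EOF)
        ungetc(c, fp);      /* leave the stream as fscanf() left it */
    return c;
}
```

If the stream holds "12\nabc\n", the helper stores 12 and returns
'\n'. A following fgets() then returns a line containing only that
newline, not "abc\n", which is exactly the sort of surprise the
fgets()-then-sscanf() approach avoids.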
(It is possible, but somewhat difficult, to "read a line" with
the scanf family. The code to do this is somewhat obscure. It
does appear here in comp.lang.c now and then.)
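
For the curious, here is one way it can be done. This is my own
sketch, not necessarily the variant that gets posted; the name
scan_line and the fixed limit of 99 characters are assumptions
made for the example:

```c
#include <stdio.h>

/* Hypothetical sketch: read one line of at most 99 characters into
 * buf (which must have room for 100 chars), discarding any excess
 * and the newline.  Returns 1 if a line was read, 0 at end of input. */
int scan_line(FILE *fp, char *buf)
{
    int n = fscanf(fp, "%99[^\n]", buf);

    if (n == EOF)
        return 0;           /* input failure: nothing left to read */
    if (n == 0)
        buf[0] = '\0';      /* empty line: %[ matched no characters */
    /* drop whatever did not fit, then the newline itself, if any */
    if (fscanf(fp, "%*[^\n]") != EOF)
        (void)fgetc(fp);
    return 1;
}
```

Note how much has to be handled by hand: %[ is one of the few
conversions that does not skip leading white space, it fails
outright on an empty line, and the newline must be removed
separately. A plain fgets() call is considerably easier to read.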