sscanf question

Chad

Given the following

#include <stdio.h>

int main(void)
{
char line[BUFSIZ];
long arg1, arg2;

while(fgets(line, BUFSIZ, stdin) != NULL){
if(sscanf(line,"%ld%ld",&arg1,&arg2) == 2){
snprintf(line, sizeof(line), "%ld\n", arg1 + arg2);
if(fputs(line, stdout) == EOF){
fprintf(stderr, "output error\n");
}
}
else{
fprintf(stderr, "invalid input\n");
}
}

return 0;
}

When I run it, I get the following
[cdalten@localhost oakland]$ ./add
4 5
9
65
invalid input

Now, here's the question. How come there has to be a space between the
numbers? sscanf() in this case doesn't even have a space in the format
args. Ie, it is like the following

sscanf(line,"%ld%ld",&arg1,&arg2)

More to the point. How come when I enter 65, the computer won't spit
back 11?
 
Chad

Chad said:
Given the following
#include <stdio.h>
int main(void)
{
char line[BUFSIZ];
long arg1, arg2;
while(fgets(line, BUFSIZ, stdin) != NULL){
if(sscanf(line,"%ld%ld",&arg1,&arg2) == 2){

When I run it, I get the following
[cdalten@localhost oakland]$ ./add
4 5
9
65
invalid input
Now, here's the question. How come there has to be a space between the
numbers?

If there didn't have to be, how would you add sixty-five to twenty-seven?

No idea. For whatever reason, I thought I was maybe missing
something. It does happen from time to time.
 
santosh

Chad said:
Given the following

#include <stdio.h>

int main(void)
{
char line[BUFSIZ];
long arg1, arg2;

while(fgets(line, BUFSIZ, stdin) != NULL){
if(sscanf(line,"%ld%ld",&arg1,&arg2) == 2){
snprintf(line, sizeof(line), "%ld\n", arg1 + arg2);
if(fputs(line, stdout) == EOF){
fprintf(stderr, "output error\n");
}
}
else{
fprintf(stderr, "invalid input\n");
}
}

return 0;
}

When I run it, I get the following
[cdalten@localhost oakland]$ ./add
4 5
9
65
invalid input

Now, here's the question. How come there has to be a space between the
numbers?

How will you differentiate two numbers otherwise? Consider the string

3456

How many numbers does this represent? One, two, three or four?
sscanf() in this case doesn't even have a space in the format
args. Ie, it is like the following

(sscanf(line,"%ld%ld",&arg1,&arg2)

More to the point. How come when I enter 65, the computer won't spit
back 11?

The standard scanf field separator is whitespace as defined by the
isspace function.
 
Bartc

santosh said:
Chad wrote:

How will you differentiate two numbers otherwise? Consider the string

3456

How many numbers does this represent? One, two, three or four?

I make it ten different numbers, although not all at the same time.

It is useful sometimes to pack numbers like this, where the width of each
number is fixed. Perhaps using something like %2d%2d for the scanf format.
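
To make the fixed-width idea concrete for the original question, here is a
minimal sketch. The helper name add_digits is made up for illustration, and
it assumes each operand is a single decimal digit, so "65" is read as 6
and 5.

```c
#include <stdio.h>

/* Hypothetical helper: read two one-digit numbers that may be packed
 * together, e.g. "65", and store their sum in *sum.
 * Returns 1 on success, 0 if two numbers could not be converted. */
int add_digits(const char *line, long *sum)
{
    long a, b;

    /* "%1ld" limits each conversion to at most one input character. */
    if (sscanf(line, "%1ld%1ld", &a, &b) != 2)
        return 0;
    *sum = a + b;
    return 1;
}
```

With this format, "65" gives 11 and "4 5" still gives 9, because each
conversion skips leading white space before reading its one character.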
 
Bill Reid

Richard Heathfield said:
Chad said:
Given the following

#include <stdio.h>

int main(void)
{
char line[BUFSIZ];
long arg1, arg2;

while(fgets(line, BUFSIZ, stdin) != NULL){
if(sscanf(line,"%ld%ld",&arg1,&arg2) == 2){

When I run it, I get the following
[cdalten@localhost oakland]$ ./add
4 5
9
65
invalid input

Now, here's the question. How come there has to be a space between the
numbers?

If there didn't have to be, how would you add sixty-five to twenty-seven?

Boy, the trolls are having a field day today, with "troll zero" doing
his usual bang-up job of trollery...

Gee, how can you add 65 to 27, given a string of 6527, I don't know,
maybe like THIS:

sscanf(line,"%2ld%2ld",&arg1,&arg2);

arg3=arg1+arg2;

Though this MAY not be exactly what OP is looking for or really
trying to do...but in any event, there is a max field width specifier
in the *scanf() functions, so you can scan in numbers of a certain
number of digits even though there is no space between them, as
was pointed out by others...

To illustrate a few examples (for the benefit of "troll zero", in case
he ever wants to move beyond Usenet trollery and actually write
some useful code):

/* [0-pad year][0-pad month][0-pad day] "20020308" */
case df_YZMZDZ :
sscanf(string,"%04d%02d%02d",
&date_components.year,
&date_components.month,
&date_components.day);
break;

Or a very typical set of defines (or enums) for configuration or whatever
with levels of categories:

#define category_0 0
#define category_0_config_0 1
....
#define category_1 100
#define category_1_config_1 101
....

Note that we can now populate our configuration variables (or whatever)
from a "single" number (such as 206234317) in a configuration file (or whatever)
by scanning it using the field width specifier as above...
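
As a rough, self-contained version of the packed-date case above, something
like the following should work. The struct and function names are
assumptions made for this sketch, not taken from the poster's code, and no
range checking is done on the month or day.

```c
#include <stdio.h>

/* Hypothetical container; the field names are assumptions. */
struct ymd {
    int year;
    int month;
    int day;
};

/* Parse a packed "YYYYMMDD" string such as "20020308".
 * Returns 1 if all three fields were converted, 0 otherwise. */
int parse_ymd(const char *s, struct ymd *out)
{
    /* The number after % is a maximum field width, so the year takes
     * at most 4 characters and month and day at most 2 each. */
    return sscanf(s, "%4d%2d%2d",
                  &out->year, &out->month, &out->day) == 3;
}
```

Calling parse_ymd("20020308", &d) should leave d.year == 2002,
d.month == 3 and d.day == 8.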
 
Chris Torek

Given the following

#include <stdio.h>

int main(void)
{
char line[BUFSIZ];
long arg1, arg2;

while(fgets(line, BUFSIZ, stdin) != NULL){
if(sscanf(line,"%ld%ld",&arg1,&arg2) == 2){
snprintf(line, sizeof(line), "%ld\n", arg1 + arg2);
if(fputs(line, stdout) == EOF){
fprintf(stderr, "output error\n");
}

You could just printf() the result, instead of snprintf()-ing and
then fputs()-ing. But this code is all correct, at least.
}
else{
fprintf(stderr, "invalid input\n");
}
}

return 0;
}
When I run it, I get the following
[cdalten@localhost oakland]$ ./add
4 5
9
65
invalid input

Now, here's the question. How come there has to be a space between the
numbers? sscanf() in this case doesn't even have a space in the format
args.

Most people seem to misunderstand how the scanf engine works.

The scanf family of functions all use a common "engine" to do
their work. This "scanf engine" is fairly simple -- one might
even say "simplistic" -- and simply executes "directives" in a
sequence, one after another, until one of them "fails" or the
engine runs out of directives, whichever occurs first.

Directives consist of literal text, white space, or "conversions"
introduced by "%". They are given by the "format arg" (singular,
not plural) -- the first argument to the various scanf functions.

Input characters are taken from a supplied stdio stream (which is
to say, a valid "FILE *" value) as needed. Input characters are
"consumed" in the process, which means that a future fgetc() on
the stream will no longer see them: they are gone forever. (But
one single "char" may be read and then put back, in the same manner
as ungetc(), if needed for internal purposes.) In the case of
sscanf() (and vsscanf() in C99), a temporary internal "string-stream"
is created with input coming from the string, and destroyed by the
time sscanf() (or vsscanf()) returns, and characters "consumed" by
the stream are still in the original string, so in this respect,
string-streams are much more forgiving.

Your format contains two %ld directives (and no white space, and no
other characters), as you note. But each "%ld" directive means
the same thing, which is:

Step 1: consume (and ignore) any white space on the stream, so
that the next available input character is non-white-space.

Step 2: consume (and save) as many decimal digits as possible,
with optional prefix sign, so that the next available input
character is not a decimal digit.

Step 3: convert the consumed digits to a "long" and store the
result via the supplied pointer (which must of course point to
a "long").

This directive will fail if there are no decimal digits available
after the whitespace is consumed. Steps 2 and 3 may also be combined
internally.
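
The three steps can be watched in action with a tiny wrapper around the
format from the question. The name scan_pair is invented for this sketch;
it simply hands back whatever sscanf() returns.

```c
#include <stdio.h>

/* Returns the sscanf() result for the format "%ld%ld":
 * 2 if both longs were converted, 1 if only the first was,
 * 0 or EOF if neither was. */
int scan_pair(const char *line, long *a, long *b)
{
    return sscanf(line, "%ld%ld", a, b);
}
```

For "4 5\n" this returns 2. For "65\n" it returns 1 with *a set to 65:
step 2 of the first "%ld" consumes both digits, and the second "%ld" skips
the newline and then finds no digits. For "4-5" it returns 2 with *b set
to -5, because the '-' stops the first conversion and is accepted as the
sign of the second.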

(The failure is an "input failure" if fgetc() would return EOF on
the stream, and a "matching failure" otherwise. C99 adds one more
failure case, which I do not fully understand and do not address
here. The difference between "input failure" and the others only
matters for the first conversion: input failure with no conversions
makes the scanf engine return EOF, while matching failure at that
point makes it return 0. If there have been successful conversions
and assignments, the engine returns the number of assignments.

In this case, if the engine returns 1 -- indicating the first "%ld"
worked, but the second failed -- you cannot distinguish immediately
between "input failure" and "matching failure" for the second
conversion. If this were not a "string-stream", you could use
(feof(stream) || ferror(stream)) to test whether there was an input
failure on the stream, if you really cared.)
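
The difference between the two kinds of failure on the first conversion
can be sketched like this. The enum and function names are made up for
illustration.

```c
#include <stdio.h>

/* Hypothetical result codes for this sketch. */
enum scan_result { SCAN_OK, SCAN_MATCH_FAIL, SCAN_INPUT_FAIL };

/* Classify the outcome of a single "%ld" conversion on a string. */
enum scan_result scan_one(const char *s, long *out)
{
    int n = sscanf(s, "%ld", out);

    if (n == 1)
        return SCAN_OK;
    if (n == EOF)            /* input ran out before any conversion */
        return SCAN_INPUT_FAIL;
    return SCAN_MATCH_FAIL;  /* n == 0: a non-digit was in the way */
}
```

Here scan_one("42", &x) reports SCAN_OK, scan_one("abc", &x) reports a
matching failure, and scan_one("", &x) reports an input failure. A string
holding only white space should also count as an input failure, since the
white space is skipped and nothing is left.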

Note that both steps 1 and 2 can require "putting back" a character
on the stream, because "consume as many characters as possible that
meet some test" is done as if by:

int c;

do {
c = fgetc(stream);
} while (c != EOF && whatever_test_applies_here(c));
if (c != EOF)
ungetc(c, stream);

(but usually "more efficiently", in some tricky way the implementation
has internally).
More to the point. How come when I enter 65, the computer won't spit
back 11?

If you follow the three steps described above, it should become clear
why.

Note that changing the directives will change the steps. If you
include a width specifier, step 2 in particular changes. "%1ld"
and "%2ld" would limit step 2 to consuming at most 1 or at most 2
characters (respectively), for instance, i.e., the directive is now
handled with code equivalent to:

int c;
int i, max;
char buf[SOME_SIZE];

do {
c = fgetc(stream);
} while (isspace(c)); /* note that isspace(EOF) is false */
if (c != EOF)
ungetc(c, stream);

#define IS_SIGN(c) ((c) == '+' || (c) == '-')

max = <the supplied format width>;
for (i = 0; i < max; i++) {
c = fgetc(stream);
/* note that isdigit(EOF) is false, so no separate test needed */
if (isdigit(c) || (i == 0 && IS_SIGN(c)))
buf[i] = c;
else
break;
}
if (c != EOF)
ungetc(c, stream);

if (i > 0 && isdigit(buf[i - 1])) {
buf[i] = '\0';
*va_arg(ap, long *) = atol(buf); /* or strtol(buf, NULL, 10) */
} else
... handle input or matching failure, depending on c==EOF ...

Note: I cannot find the actual limit in the C99 draft standard text
I keep handy for searches, but I believe that "%ld" -- with no hard
limit on the number of decimal digits -- is allowed to "act like"
%4095ld or similar, so that the size of the "buf" into which decimal
digits are saved need not be infinite. (Of course, implementations
can simply combine the reading and converting:

unsigned long result = 0;
int sign = 0;

max = <whatever, possibly LLONG_MAX with max having type long long>;
for (i = 0; i < max; i++) {
c = fgetc(stream);
if (i == 0 && IS_SIGN(c)) {
if (c == '-')
sign = 1;
} else if (isdigit(c))
result = (result * 10) + (c - '0');
else
break;
}
... handle the rest similarly, but no need for atol/strtol ...

but this means that %ld input behaves differently on overflow than
does strtol(). This is allowed, but I prefer implementations that
handle overflow the same way in the scanf engine as in strtol().)
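
If well-defined overflow handling matters, one option is to bypass "%ld"
and call strtol() on the line directly. The following is only a sketch;
the helper name next_long is an assumption, not anything from the
standard library.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parse one long starting at *p and advance *p
 * past it. Returns 1 on success, 0 if no digits were found or the
 * value does not fit in a long. */
int next_long(const char **p, long *out)
{
    char *end;

    errno = 0;
    *out = strtol(*p, &end, 10);  /* skips leading white space itself */
    if (end == *p)                /* no digits at all */
        return 0;
    if (errno == ERANGE)          /* clamped to LONG_MIN or LONG_MAX */
        return 0;
    *p = end;
    return 1;
}
```

Calling it twice on a line read with fgets() yields the two operands, and
an out-of-range number is rejected instead of being silently stored.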

This is a lot to remember (which is why it is a good idea to keep
a reference handy, to look up all the details on how the scanf
engine has to work). But there are a couple of key items that
you should memorize, if you are going to use the scanf family:

- almost all conversions begin by skipping initial white space;
- "white space" includes newlines;
- almost all conversions DO NOT skip trailing white space.

This means that applying the scanf family to a stdio stream almost
always leaves "trailing white space" -- usually a newline -- behind
in the stream. This trailing white space will cause you trouble
later. It is tempting to add code to remove it, but this is usually
a mistake, because it is only *almost* always left behind, so if
you simply always remove another "line" ended by a newline, you
will sometimes remove input you should have left alone. The best
approach -- besides "avoid scanf entirely" :) -- tends to be "read
a line with a line-oriented function, then use sscanf() on the
resulting string". This gives you much more control, and much more
"obvious predictability" on how the program will behave with various
inputs. Code that is obvious and predictable tends to be easier
to debug, and hence more reliable and useful in the long run, than
code that is obscure.
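
One way to get that extra control, once the line is in a buffer, is to add
a "%n" directive and check that nothing unexpected follows the numbers.
This is a sketch only; parse_pair is a made-up name.

```c
#include <stdio.h>

/* Hypothetical helper: accept a line only if it holds exactly two
 * longs with nothing but white space after them. */
int parse_pair(const char *line, long *a, long *b)
{
    int used = 0;

    /* The blank before %n skips trailing white space, including the
     * newline that fgets() keeps; %n then records how many characters
     * have been consumed so far. %n does not count as a conversion. */
    if (sscanf(line, "%ld%ld %n", a, b, &used) != 2)
        return 0;
    return line[used] == '\0';
}
```

With this check, "4 5\n" is accepted, while "4 5 x\n" and "65\n" are
both rejected.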

(It is possible, but somewhat difficult, to "read a line" with
the scanf family. The code to do this is somewhat obscure. It
does appear here in comp.lang.c now and then.)
 
CBFalconer

.... snip good explanation of scanf ...
(It is possible, but somewhat difficult, to "read a line" with
the scanf family. The code to do this is somewhat obscure. It
does appear here in comp.lang.c now and then.)

Er - Dan Pop hasn't posted here for at least 2 years :)
 
vippstar

(It is possible, but somewhat difficult, to "read a line" with
the scanf family. The code to do this is somewhat obscure. It
does appear here in comp.lang.c now and then.)

I'd say it's impossible in robust code; the stream can have unwanted
embedded null bytes, which scanf will happily read.
 
CBFalconer

I'd say it's impossible in robust code; the stream can have
unwanted embedded null bytes, which scanf will happily read.

So? A null byte is not a digit, nor a period, so it will normally
be treated as marking the end of a numeric field.
 
Ben Bacarisse

I'd say it's impossible in robust code; the stream can have unwanted
embedded null bytes, which scanf will happily read.

That is not a problem. Given:

char line[101], nl[2];
int nchars;

The call:

scanf("%100[^\n]%n%[\n]", line, &nchars, nl)

tells us all we need to know. If the return is 2 we saw a whole
line. If the return is 1 it is partial. In both cases, nchars is the
number of characters read (excluding a newline if present) and will
happily include nulls in this count.
 
Ben Bacarisse

Ben Bacarisse said:
I'd say it's impossible in robust code; the stream can have unwanted
embedded null bytes, which scanf will happily read.

That is not a problem. Given:

char line[101], nl[2];
int nchars;

The call:

scanf("%100[^\n]%n%[\n]", line, &nchars, nl)

I missed the 1 in the %1[\n] format, sorry. Anyway, you get the idea...
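
For anyone who wants to experiment with the corrected format, here is a
sketch that applies it to a string through sscanf() rather than to stdin.
The wrapper name scan_line is an assumption made for this example.

```c
#include <stdio.h>

/* Apply the corrected line-reading format to a string.
 * line must have room for 101 chars. Returns the sscanf() result:
 * 2 = whole line seen, 1 = partial line (no newline seen),
 * 0 or EOF = nothing was stored in line. */
int scan_line(const char *src, char line[101], int *nchars)
{
    char nl[2];

    *nchars = 0;  /* %n is not reached if the first directive fails */
    return sscanf(src, "%100[^\n]%n%1[\n]", line, nchars, nl);
}
```

One caveat: an empty line (input starting with '\n') makes the first
directive fail to match, so the call returns 0 and the caller has to
treat that case separately.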
 
vippstar

Ben Bacarisse said:
That is not a problem. Given:
char line[101], nl[2];
int nchars;
The call:
scanf("%100[^\n]%n%[\n]", line, &nchars, nl)

I missed the 1 in the %1[\n] format, sorry. Anyway, you get the idea...
tells us all we need to know. If the return is 2 we saw a whole
line. If the return is 1 it is partial. In both cases, nchars is the
number of characters read (excluding a newline if present) and will
happily include nulls in this count.

nchars is indeed the number of characters/bytes read. There needs to be
another check like

if (strlen(line) != (size_t)nchars) /* embedded null bytes */

As you see, `line' is processed twice, which may be unwanted in
'robust' code.
 
Barry Schwarz

What are you talking about?

Well, you said '\0' characters would pose some difficulty. He said
they wouldn't.

So let's start at the beginning. Why do you think they prevent robust
code from using scanf? Before you answer, I suggest you look through
the archives for posts by Dan Pop that describe exactly how to do
this.
 
