Reading a number from stdin

Keith Thompson

Malcolm McLean said:
Normally the valid numbers for the application are much lower than
INT_MAX, so the %5d method works for most situations.

When talking about something as generic as reading a number
from stdin, I don't know what "normally" is supposed to mean.
Some applications might only want numbers below 100; others might
need to handle anything in the range of int. If you're writing
a library routine to read integers, restricting the input to an
arbitrary range is not a good idea.

Furthermore, if you call scanf("%5d", &n) and enter "123456", n is
set to 12345 and the 6 is left in the input stream. You could add
code to check for that, but then you might as well use strtol().
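
The truncation is easy to reproduce with sscanf() on a fixed string. This is only a sketch: the helper name read_width5 is made up for illustration, and %n is used to report how many characters were consumed.

```c
#include <stdio.h>

/* Hypothetical helper: parse at most 5 characters of s as an int using
   "%5d", and report how many characters sscanf consumed.
   Returns 1 if an integer was matched, 0 otherwise. */
int read_width5(const char *s, int *value, int *consumed)
{
    *consumed = 0;
    /* %n stores the count of characters read so far; it does not
       count towards sscanf's return value. */
    if (sscanf(s, "%5d%n", value, consumed) != 1)
        return 0;
    return 1;
}
```

Given "123456", the helper stores 12345 and reports 5 characters consumed, so s + consumed points at the leftover "6". Detecting that leftover is exactly the extra check mentioned above.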

Malcolm McLean said:
I checked my version of sscanf, and it sets the target to 0xFFFFFFFFF
(all bits set) and returns 1 on being fed a massive integer. That's
undesirable, because it should be returning 0 and setting the stream
to the start position. Sometimes you just want to skip a number and
parse the following text, in which case my compiler's behaviour is
best, but if the value is used, the program will get into an odd
state, even if checks are made.

I'm not at all sure what sscanf() *should* do for an out-of-range
numeric input. The trouble is that it just returns the number
of items matched; it has no good way to distinguish between
syntactically bad input and an out-of-range number.

If the behavior were to be defined in a future standard, it could
either treat it as a matching failure, or it could set the object to,
say, INT_MIN or INT_MAX and set errno to ERANGE, like strtol() does.

Either would be better than leaving the behavior undefined.

(Incidentally, it's your runtime library, not your compiler, that
implements sscanf.)
 
Malcolm McLean

That's very risky advice. Try scanf("%d %d", &i, &j) and feed it
-10001. How surprised will the user be?
Oh of course. It also allows whitespace to match nothing.
Not good.
Also, %5d is not small enough to be safe with 16-bit ints. These may be
rare, but you never know when they will make a come-back.
True.

Really you've got to use strtol for robust parsing of integers.
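
A minimal sketch of what that might look like, assuming the whole string has to be a single decimal integer. The name parse_int and its return codes are made up for illustration, not taken from any library.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: convert the whole of s to an int.
   Returns 0 on success, -1 if s is not a decimal integer (or has
   trailing junk), -2 if the value does not fit in an int. */
int parse_int(const char *s, int *out)
{
    char *end;
    long v;

    errno = 0;                  /* strtol only sets errno on failure */
    v = strtol(s, &end, 10);
    if (end == s || *end != '\0')
        return -1;              /* no digits, or junk after the number */
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
        return -2;              /* overflowed long, or too big for int */
    *out = (int)v;
    return 0;
}
```

Unlike scanf, this distinguishes malformed input from an out-of-range value, and it never silently splits one number into two. Note that strtol skips leading whitespace, so " 42" is accepted here.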
 
Malcolm McLean

When talking about something as generic as reading a number
from stdin, I don't know what "normally" is supposed to mean.
Numbers usually mean something, typically integers are counts of something
in the real world. A few applications, like a calculator, might process numbers
without understanding their meaning. But normally it has to be programmed
in. The number might be the number of employees. That could conceivably
go above a hundred thousand, but only for the very largest organisations.
The number of characters in someone's name is never going to go that large,
nor is the number of optimisation levels for a compiler. You might add -O4,
-O5, and so on, and it's hard to set a definite limit, but it's never going to go
super-high.
 
Stefan Ram

Malcolm McLean said:
Numbers usually mean something, typically integers are counts of something

When a program has to read something, the input has to conform
to certain expectations, otherwise the input is erroneous.

These expectations need to be laid down in an ILS (input-language
specification).

The core of the ILS is the specification of the syntax of the input
language (IL), usually using a grammar, using - for example - EBNF.

How errors in the input are to be treated is specified in the
requirement specifications (RS) for the software, which also
includes the ILS.

Given such an RS and money, a programmer then can write a
parser with error handling for that input language in C.

»Reading a number« or »reading numbers« cannot serve as an RS,
because it is still too vague.
 
Malcolm McLean

When a program has to read something, the input has to conform
to certain expectations, otherwise the input is erroneous.

These expectations need to be laid down in an ILS (input-language
specification).

The core of the ILS is the specification of the syntax of the input
language (IL), usually using a grammar, using - for example - EBNF.

How errors in the input are to be treated is specified in the
requirement specifications (RS) for the software, which also
includes the ILS.

Given such an RS and money, a programmer then can write a
parser with error handling for that input language in C.

»Reading a number« or »reading numbers« cannot serve as an RS,
because it is still too vague.
Systems like that soon hit reality.

The grammar might specify an integer as a +/- followed by a sequence of digits, with
zero being a special case of a leading zero allowed.
However C only allows easy representation of integers which will fit in a basic type.

You can of course code an arbitrary-precision integer representation to read the grammar,
only to find that it's referring to a user option that's unlikely to go above three.
That sort of thing adds massively to the costs of development and adds potential points of
failure. Also, people might ignore the specifications because they are so detached
from the actual requirements, leading to the worst possible situation - code which
doesn't in fact behave as documented.

Or you can write the grammar in terms of basic input functions you have.
 
Keith Thompson

Ben Bacarisse said:
That's very risky advice. Try scanf("%d %d", &i, &j) and feed it
-10001. How surprised will the user be?

I think you meant

scanf("%5d %5d", &i, &j);
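
To see the effect without typing at a terminal, the same format can be applied to a string with sscanf(). A small sketch; the wrapper name scan_pair is made up for illustration.

```c
#include <stdio.h>

/* Hypothetical wrapper: apply "%5d %5d" to a string and return the
   number of items sscanf assigned. */
int scan_pair(const char *s, int *i, int *j)
{
    /* The field width of 5 includes the sign, and the space in the
       format matches zero or more whitespace characters. */
    return sscanf(s, "%5d %5d", i, j);
}
```

Fed the single token "-10001", this returns 2 with i set to -1000 and j set to 1: the first conversion stops after five characters, the whitespace directive matches nothing, and the second conversion picks up the remaining "1".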
 
Stefan Ram

Malcolm McLean said:
Systems like that soon hit reality.
The grammar might specify an integer as a +/- followed by a sequence of digits, with
zero being a special case of a leading zero allowed.

If such a grammar is »not realistic«, what do you then say about

decimal-constant:
nonzero-digit
decimal-constant digit

which is quoted straight from N1570 (6.4.4.1)?
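
For what it's worth, a recognizer for that production is only a few lines of C. This is a sketch; the function name is_decimal_constant is made up for illustration, and it checks syntax only.

```c
#include <ctype.h>

/* Returns 1 if s matches the decimal-constant production (a nonzero
   digit followed by zero or more digits), 0 otherwise. */
int is_decimal_constant(const char *s)
{
    if (*s < '1' || *s > '9')   /* must start with a nonzero digit */
        return 0;
    for (s++; *s != '\0'; s++)
        if (!isdigit((unsigned char)*s))
            return 0;
    return 1;
}
```

The production puts no limit on the number of digits and has no sign, and a lone "0" does not match it. Whether a matching string is representable in some integer type is a separate question that the grammar leaves to other rules.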
 
Keith Thompson

Malcolm McLean said:
Numbers usually mean something, typically integers are counts of
something in the real world. A few applications, like a calculator,
might process numbers without understanding their meaning. But
normally it has to be programmed in. The number might be the number of
employees. That could conceivably go above a hundred thousand, but
only for the very largest organisations. The number of characters in
someone's name is never going to go that large, nor is the number of
optimisation levels for a compiler. You might add -O4, -O5, and so on,
and it's hard to set a definite limit, but it's never going to go
super-high.

So *depending on the application's requirements*, it might make sense to
restrict the range of input values.

scanf(), or even fgets() followed by sscanf(), is not a useful or safe
way to do that, though it may be good enough for a quick-and-dirty
program where you aren't concerned about incorrect input.
 
Malcolm McLean

If such a grammar is not realistic, what do you then say about

decimal-constant:

nonzero-digit

decimal-constant digit

which is quoted straight from N1570 (6.4.4.1)?
It is possible to talk formal grammars without also talking gibberish.
But the psychological reality is that they engender a gibberish
mentality. "N1570 (6.4.4.1)?" is gibberish. It doesn't mean anything
to anyone who doesn't have that particular document in mind.

As the author of MiniBasic ( http://sourceforge.net/projects/minibasic/?source=directory ), I'm not opposed to formal grammars for formal
language specification. For an average text file format, however,
it's overkill.
It's not usually clear what a program should do when presented with
a huge integer, or a huge real, in a place where a number is
expected and allowed. It's not usually worth worrying too much about
it, because the data must almost always be corrupt: numbers usually mean
something, and very high numbers are seldom valid. As long as the
program doesn't crash, and throws the file out, it's likely to be OK
in all but the most rigorous of environments.

My options parser uses a scanf-like interface to extract options, but
it calls strtol() internally, then throws out anything out of the
range of a signed int, at the parse level. The caller then throws out
anything out of range at the application level.
That's a reasonable, general-purpose solution to getting an integer
from the command line.
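
The two levels of checking might be sketched as below. This is not the actual options parser; the names opt_parse_int, opt_get_int and the status codes are all made up for illustration.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

enum opt_status { OPT_OK, OPT_SYNTAX, OPT_NOT_INT, OPT_OUT_OF_RANGE };

/* Parse level: accept only text that is wholly a decimal integer and
   that fits in a signed int. */
static enum opt_status opt_parse_int(const char *text, int *out)
{
    char *end;
    long v;

    errno = 0;
    v = strtol(text, &end, 10);
    if (end == text || *end != '\0')
        return OPT_SYNTAX;
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
        return OPT_NOT_INT;
    *out = (int)v;
    return OPT_OK;
}

/* Application level: the caller supplies the range [lo, hi] that is
   meaningful for this particular option. */
enum opt_status opt_get_int(const char *text, int lo, int hi, int *out)
{
    int v;
    enum opt_status st = opt_parse_int(text, &v);

    if (st != OPT_OK)
        return st;
    if (v < lo || v > hi)
        return OPT_OUT_OF_RANGE;
    *out = v;
    return OPT_OK;
}
```

For an optimisation-level option the caller might pass lo = 0 and hi = 3, so "4" is rejected at the application level while a 20-digit number is rejected earlier, at the parse level.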
 
Stefan Ram

Malcolm McLean said:
language specification. For an average text file format, however,
it's overkill.

It is exactly this kind of thinking that has led us to the
current situation, where a CSV file that has been created by
a program A cannot be read by program B that is claiming to
be able to read CSV files.
 
Malcolm McLean

It is exactly this kind of thinking that has led us to the
current situation, where a CSV file that has been created by
a program A cannot be read by program B that is claiming to
be able to read CSV files.
CSV should have been a bit more tightly specified.
I've got a parser on my website. It's quite a hunk of code, the header
has to be intelligently guessed, and nan is grief because not all
versions of C handle it the same.

Here it is
http://www.malcolmmclean.site11.com/www/


It's not terribly efficient. Unfortunately CSV files can be very
large and reading them can be a performance bottleneck. My version
reads the whole lot into memory with a separate allocation for
every string, which is only OK for small to medium-sized files on
medium to big machines.
 
