Why does my simple while loop printf 2x?


Keith Thompson

BartC said:
Because we're doing line-oriented input.

You /can/ program at a character-level, but it would be like a syntax parser
looking at its input character-by-character instead of tokenising first,
then examining whole tokens.


You can at least consider the possibility of looking at more than one
character. Because it /is/ line-oriented, it means someone can enter Yoghurt
for yes, and Nigella for no, which is probably a bigger sin than
not checking for EOF on a supposedly interactive terminal.

Sure, you can consider the possibility of doing any kind of input you
like.

If the program's behavior depends only on the first character on a line,
there's no point in storing the entire line. If you need to consider
the entire line, then sure, you probably need to store the entire line
-- which means you have to allow (in *some* way) for arbitrarily long
lines.

You need to decide how you want the program to behave before you can
decide how to implement that behavior.
I've been using fixed-length buffers for line-oriented text-files for
decades. If it doesn't work because of a line-length problem, then the file
isn't what I would call line-oriented. (There's a simple test: can you edit
the file without (a) having to scroll a hundred screens horizontally to see
the end of any line, or (b) having any line wrap over a hundred lines
vertically? If not, then it isn't a line-oriented text file as is normally
understood, and it needs to be processed differently; as a
character-oriented one, for example.)

So 99 screens would be ok?
As for being able to read any conceivable input, that's an unnecessary
complication. Either the file is simply and practically line-oriented, or
it's not, so it's an error.

Fine -- and then you have to decide how to deal with such an error, and
write the code to handle it.
Something doesn't sound right about asking the user to type Y or N (perhaps
in answer to something critical such as formatting a hard drive), and then
being tolerant about lines thousands of characters long, that just happen to
start with a Y!

Define the requirements, then we can talk about how to implement them.
If you want to check the first character and reject very long lines, you
still don't need to store the entire line, just keep a count.
 

Keith Thompson

Ian Collins said:
There is, but it isn't a "standard" C facility. Look up the curses library.

I don't think curses is the answer. Curses generally takes over the
entire screen (think full-screen text editors). It's not useful for
single-line input that needs to do something like inhibiting echoing or
recognizing characters without waiting for a newline. (Unless there's a
way to use curses that I'm not familiar with, which is always a
possibility).

It's true that there's no way to do that in portable C. If you
want to know how to do it using POSIX/Unix/Linux-specific features,
comp.unix.programmer would be a good place to ask (but check the
FAQ first; this isn't a new problem).
 

Keith Thompson

DFS said:
I wrote the declaration myself - it was simple.

int toupper(int c); as I recall.

No compiler warning, and the program ran fine. But as you say, an
#include would be better.

A rule of thumb: unless you're already sure that you know all you need
to know, never use *any* standard library function without first
checking its documentation (for Linux, that would be the man page).
"man toupper" includes the required "#include <ctype.h>" in the
synopsis.

[...]
 

DFS

I don't think curses is the answer. Curses generally takes over the
entire screen (think full-screen text editors). It's not useful for
single-line input that needs to do something like inhibiting echoing or
recognizing characters without waiting for a newline. (Unless there's a
way to use curses that I'm not familiar with, which is always a
possibility).

It's true that there's no way to do that in portable C. If you
want to know how to do it using POSIX/Unix/Linux-specific features,
comp.unix.programmer would be a good place to ask (but check the
FAQ first; this isn't a new problem).


Or find the source for sudo
http://www.sudo.ws/sudo/dist/sudo-1.8.10p3.tar.gz

And find out how they suppress the screen display for the password -
from there maybe figure out how to restrict display to one letter?
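For reference, this goes through the terminal driver, not through standard C stdio. A POSIX termios sketch of the general mechanism (not portable C; the helper names are mine, not sudo's, which uses its own password-reading code):

```c
#include <stdio.h>
#include <termios.h>
#include <unistd.h>

/* Compute the terminal settings a sudo-style prompt switches to:
   no echo, no line buffering, deliver each byte as it is typed. */
struct termios make_raw_noecho(struct termios t)
{
    t.c_lflag &= ~(ECHO | ICANON);
    t.c_cc[VMIN] = 1;   /* return from read() after one byte */
    t.c_cc[VTIME] = 0;  /* no inter-byte timeout */
    return t;
}

/* Read one keystroke with echo suppressed, restoring the terminal
   afterwards. Falls back to plain getchar() if stdin is not a tty. */
int read_key_no_echo(void)
{
    struct termios saved;
    if (tcgetattr(STDIN_FILENO, &saved) != 0)
        return getchar();
    struct termios raw = make_raw_noecho(saved);
    tcsetattr(STDIN_FILENO, TCSAFLUSH, &raw);
    int c = getchar();
    tcsetattr(STDIN_FILENO, TCSAFLUSH, &saved);
    return c;
}
```

Restricting the display to just Y/N would then be a matter of reading keys in a loop and echoing only the ones you accept.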
 
B

BartC

Keith Thompson said:
Sure, you can consider the possibility of doing any kind of input you
like.

If the program's behavior depends only on the first character on a line,
there's no point in storing the entire line.

If I only needed to check the first few characters of a /file/, then I
might do just that, and not bother to read the entire file, nor even the
first line.

The context here however is primarily user input, which can only be entered a
line at a time; in other words, the system has already decided to use
line-buffering!

However there is another aspect: if you already have libraries that can
easily process console input and files by the line, iterate over a file by
line and so on, then you will probably decide to make use of some of what
you have; you wouldn't really go back to first principles and start worrying
about this character and that character, or whether some character is \n and
another is EOF.

You might do this sort of stuff to start with (to implement your libraries
for example), then quickly leave it behind.

(I go a couple of steps further and wrap a couple of language layers around
the C i/o functions; I don't think I even have a way, once a line of input
has been typed, to request it from the system a character at a time, unless I
directly call the C runtime.)
So 99 screens would be ok?

No, one screen (extra wide if necessary) is best, if what you are editing or
viewing really is line-oriented and cannot be wrapped onto multiple lines
(where you would be dealing with paragraphs and should be using a
word-processor).

An upper limit of 250 characters (256 makes a nice round figure) is a good
guideline, although there will be a few exceptions (but those exceptions
don't stretch to having lines tens of millions of characters long; maybe a
few thousand, and that usually when the file is machine-generated).

Most text files I read and write have maximum line lengths below 100
characters.
 
B

BartC

Ben Bacarisse said:
It's not easy to "keep everything in sync" with a fixed size line buffer
and a flexible one is significantly more code.

Suppose I have a file format from which I want to read 3 integers per line.
But the format is such that a trailing 0 value doesn't appear; but also a
line might have trailing information that should be ignored. For example:

100
200 300 400 123 comment
500 600

And I have a loop reading 3 numbers per line. For this input, I want to end
up with the data (100,0,0), (200,300,400) and (500,600,0).

My previous experience with C suggests that attempts to read anything beyond
the 100, for example, will skip the newline and read the 200 as the next
data. Trying to do this using a character stream is a nightmare. A
line-based approach as I suggested above is far easier.

I won't attempt to produce C code, but I use a scripting language which is
built on top of C's runtime, and the code would involve a loop containing a
line like this:

readln @f, a, b, c

If end-of-line is encountered, 0 is read; if there is extra stuff on the
line, that is skipped. Could it get any simpler?

A fixed-size buffer is used, yes, but I know that for this format, it is
ample (and if necessary I can increase the internal size from, say, 2KB to,
say, 1MB; I've still got 2999MB left.)
I don't really see the connection here. You may use any editing
interface you find comfortable, but I see no consequences for how one
might write prompt-reply code.

It's to illustrate my belief that many don't take line-oriented formats
seriously. If a file is line-oriented, then you don't treat such a file
(within an editor in my example) as one contiguous sequence of characters,
even if that is the representation on disk. So repeated left-cursor presses
should stop at the beginning of a line, and not go up one line and then to
the end of that line before continuing!
 
B

Ben Bacarisse

BartC said:
Suppose I have a file format from which I want to read 3 integers per
line. But the format is such that a trailing 0 value doesn't appear;
but also a line might have trailing information that should be
ignored. For example:

100
200 300 400 123 comment
500 600

And I have a loop reading 3 numbers per line. For this input, I want to end
up with the data (100,0,0), (200,300,400) and (500,600,0).

My previous experience with C suggests that attempts to read anything beyond
the 100, for example, will skip the newline and read the 200 as the next
data. Trying to do this using a character stream is a nightmare. A
line-based approach as I suggested above is far easier.

Yes. I never said otherwise. I said that line-oriented input into a
fixed buffer does not solve all the problems -- you still have to "flush"
lines or take some special action -- otherwise invalid input (over-long
lines) can look like valid input and cause weird (and possibly
dangerous) unintended actions.

Please don't turn this into an all or nothing debate. Line-based input
is far easier for some things. Character-based input is easier for
others. In both cases, care must be taken that bad input does not have
unintended consequences.
I won't attempt to produce C code, but I use a scripting language which is
built on top of C's runtime, and the code would involve a loop containing a
line like this:

readln @f, a, b, c

If end-of-line is encountered, 0 is read; if there is extra stuff on the
line, that is skipped. Could it get any simpler?

Sigh. Who ever said it would not be? It is widely known that if line
endings are significant to the format, numerical input is better done
line by line.
A fixed-size buffer is used, yes, but I know that for this format, it is
ample (and if necessary I can increase the internal size from, say, 2KB to,
say, 1MB; I've still got 2999MB left.)


It's to illustrate my belief that many don't take line-oriented formats
seriously. If a file is line-oriented, then you don't treat such a file
(within an editor in my example) as one contiguous sequence of characters,
even if that is the representation on disk. So repeated left-cursor
presses should stop at the beginning of a line, and not go up one line
and then to the end of that line before continuing!

Eh? How you like to edit files says something about what people here
take seriously? That belief (whatever it means) should be illustrated
by remarks people make here. The peculiar nature of your editor's
action on line endings tells me about your view of them, not about anyone
else's.
 
J

James Kuyper

Because we're doing line-oriented input.

That's what the test for '\n' is all about.
You /can/ program at a character-level, but it would be like a syntax parser
looking at its input character-by-character instead of tokenising first,
then examining whole tokens.

Right. What's so wrong about that? Sometimes character-by-character is
the simplest way to do something, and that certainly seems to be the
case for this job.
You can at least consider the possibility of looking at more than one
character. Because it /is/ line-oriented, it means someone can enter Yoghurt
for yes, and Nigella for no, which is probably a bigger sin than
not checking for EOF on a supposedly interactive terminal.

That can be handled, if he wishes, by simply mandating that the
immediately following character be a '\n'. It's still not particularly
complicated logic.
[...]
As for being able to read any conceivable input, that's an unnecessary
complication. Either the file is simply and practically line-oriented, or
it's not, so it's an error.

Just because such input would be an error doesn't justify dismissing the
issue - what your code does with erroneous input should always be a
conscious decision, not just a matter of ignoring the possibility.
You're perfectly free to decide "The chance of erroneous input is so low
that I don't care whether it might result in formatting my hard disk." -
but you need to actually make that decision; and you need to own that
decision - no complaining about it being anyone else's fault if, as a
result, you do end up with a formatted hard disk.

If you decide to actually handle that kind of error, rather than
ignoring the possibility of it coming up, doing so in an appropriate
fashion gets a bit complicated when using fgets(). It's not very
complicated, but it's sufficiently complicated to make the
character-at-a-time approach a reasonable alternative.
 
B

BartC

Ben Bacarisse said:
Eh? How you like to edit files says something about what people here
take seriously? That belief (whatever it means) should be illustrated
by remarks people make here. The peculiar nature of your editor's
action on line endings tells me about your view of them, not about anyone
else's.

It's not my editor which is peculiar, it's everyone else's! And I can infer
other people's attitudes to this to some extent by how their editors work.
I've just retried a handful of editors, and none of them seem to respect
line boundaries. Two of them seemed to be program editors (Notepad++ and
SciTE) and they were set to Python (which is strongly line-oriented), yet
they still happily go right through line endings. In fact I don't think I've
ever seen an editor that does otherwise, apart from mine.
 
L

Les Cargill

BartC said:
Ben Bacarisse said:
BartC said:
[...]
If line-oriented input is to be used (and that fits in better if the rest
of the application is command-line based), then the steps are these:

* Read an entire line into a buffer.

Why?

Because we're doing line-oriented input.

You /can/ program at a character-level, but it would be like a syntax
parser looking at its input character-by-character instead of tokenising
first, then examining whole tokens.

That's not good advice in this context. The fact that newlines are
significant is not enough on its own to switch to a line buffer.

Line buffers can make life easier. (Read the whole line, then you can
read individual items within the buffer and stop at the end, without
inadvertently reading part of the next line and getting everything out
of sync.) If you are creating a command-line-type interface, you want to
be able to work with the whole line, and not via a one-character window.

So why make a decision to work with characters, just because of some
rare situation where an input line is many thousands of characters long?
(Actually, using that logic, there would be /no/ situation that a line
buffer could ever be used.)

Constructing your own line (or packet) buffer handler is a relatively
inexpensive way to preclude the possibility of memory overwrites. It
also provides you the option of detecting and doing
something about missed characters.


As you may know, the Heartbleed bug has been in the news lately. So...

At some point in your evolution as a programmer, you'll probably want
to "do" asynchronous I/O with explicit unmarshaling of the data stream.
This may mean using select()/pselect(). This may mean
state machines or callbacks (or both).

What you do with error cases is going to be very sensitive to
context.

I really do recommend learning a minimal subset of Tcl
from the Brent Welch book to learn about asynchronous I/O.
It gives you the color and the shape of how to "do"
asynchronous I/O without resorting to threads. You can then
map that conceptually onto programs written in the 'C' language.

This uses the "fileevent" verb in Tcl. The reason to go with Tcl
is that it's less text than a comparable 'C' program. For all I know,
the other scripting languages offer the same but I haven't gone there yet.

If you have a Tcl script that works, then you can compare/contrast
that with your 'C' implementation.
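For the select() route in C itself, a minimal POSIX sketch (not standard C; the helper name is mine) -- wait up to a timeout for input to be ready instead of blocking in getchar():

```c
#include <stdio.h>
#include <sys/select.h>
#include <unistd.h>

/* Returns 1 if `fd` has input ready within `seconds`, 0 on timeout,
   -1 on error. This is the core of a select()-based event loop. */
int input_ready(int fd, int seconds)
{
    fd_set rfds;
    struct timeval tv = { .tv_sec = seconds, .tv_usec = 0 };
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    return select(fd + 1, &rfds, NULL, NULL, &tv);
}
```

A real program would put several descriptors in the set and dispatch on whichever becomes ready, which is exactly what Tcl's fileevent wraps up for you.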
 
B

BartC

James Kuyper said:
On 05/17/2014 06:25 PM, BartC wrote:

That can be handled, if he wishes, by simply mandating that the
immediately following character be a '\n'. It's still not particularly
complicated logic.

According to this post from Keith, there might not be a '\n' character
following:

Keith Thompson said:
That's an infinite loop if you get an end-of-file condition on
standard input before seeing a '\n' (on Unix-like systems, that
can happen if the user types Ctrl-D twice, or if you're reading
from a text file that doesn't end with a '\n' character).

(My line buffers have any line-terminator stripped; the string always ends
only with a 0-terminator).

Even if '\n' was guaranteed, it won't be long before you will want to accept
" Y" or "Y " or "yeS", when fiddling with individual characters will be a
lot of work.
 
B

BartC

Les Cargill said:
BartC wrote:
Constructing your own line (or packet) buffer handler is a relatively
inexpensive way to preclude the possibility of memory overwrites. It
also provides you the option of detecting and doing
something about missed characters.
As you may know, the Heartbleed bug has been in the news lately. So...

At some point in your evolution as a programmer, you'll probably want to
"do" asynchronous I/O with explicit unmarshaling of the data stream. This
may mean using select()/pselect(). This may mean
state machines or callbacks (or both).

What you do with error cases is going to be very sensitive to
context.

I don't understand what any of that means. Do I need to?

My 'evolution' as a programmer tends to involve shaping things to work the
way I want rather than the other way around.

And the topic is processing a Y or N input at a program prompt. How
difficult can it be?
 
K

Keith Thompson

BartC said:
According to this post from Keith, there might not be a '\n' character
following:

Right -- so that would be an input error, and one that's easily
detected by a program that reads a character at a time.

If the requirement for the program is to pay attention only to the
first character on a line, and discard the rest, then you *don't*
need to consider the possibility of looking at more than one
character *until the requirements change*.

You can write code that can be easily modified for changing
requirements, but that can only go so far. If you carefully write
your input code so it's easy to modify to look at more than the first
character on a line, and the next requirement update asks you to use
ncurses and read a single character without waiting for a newline,
then you've wasted some time and effort.

[...]
Even if '\n' was guaranteed, it won't be long before you will want to
accept " Y" or "Y " or "yeS", when fiddling with individual characters
will be a lot of work.

So change the code when the requirements change.

Given a requirement to examine the first character on an input line and
discard the rest, you advocate reading the entire line into a buffer.
How does that approach handle an input line longer than the buffer?
Making the buffer bigger is not an answer, unless you're using
realloc() to expand it at run time.

If you need to consider the entire line, then of course it can make
sense to read the entire line into memory. Nobody claimed otherwise.
 
K

Keith Thompson

BartC said:
Suppose I have a file format from which I want to read 3 integers per line.
But the format is such that a trailing 0 value doesn't appear; but also a
line might have trailing information that should be ignored. For example:

100
200 300 400 123 comment
500 600

And I have a loop reading 3 numbers per line. For this input, I want to end
up with the data (100,0,0), (200,300,400) and (500,600,0).

My previous experience with C suggests that attempts to read anything beyond
the 100, for example, will skip the newline and read the 200 as the next
data. Trying to do this using a character stream is a nightmare. A
line-based approach as I suggested above is far easier.

If you use an input function that treats newlines and other
whitespace identically, then of course that's what it's going to do.
Unfortunately, if you use scanf() to read numeric data, then (a)
it will skip all whitespace, including newlines, and (b) it has
undefined behavior if the input is syntactically valid but outside
the range of the relevant type.

For example, this:

scanf("%d%d", &x, &y);

given this input:

123
99999999999999999999999999999999999999999

will store 123 in x, then quietly skip the newline, then exhibit
undefined behavior reading the next line.

(I do not defend either behavior, and I consider the latter to be a flaw
in the language definition.)

So don't use scanf for that.
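The usual replacement is fgets plus strtol, which reports out-of-range input through errno instead of invoking undefined behavior. A sketch of the conversion half (the function name is mine):

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Convert one decimal integer from `s` into *out.
   Returns 1 on success; 0 if nothing was converted or the value
   does not fit in an int. */
int parse_int(const char *s, int *out)
{
    char *end;
    errno = 0;
    long v = strtol(s, &end, 10);
    if (end == s || errno == ERANGE || v < INT_MIN || v > INT_MAX)
        return 0;
    *out = (int)v;
    return 1;
}
```

Fed the 41-digit number from the example above, this returns 0 rather than doing anything undefined.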

[...]
 
B

BartC

Keith Thompson said:
Right -- so that would be an input error, and one that's easily
detected by a program that reads a character at a time.

An input error where? You're now mandating that an input file *must* end
with a newline otherwise Y<newline> or N<newline> will never occur. With
line input, these untidy details can be taken care of in one central
location, which can always deliver a line buffer with a well-defined ending
(0 in my case, perhaps always \n in others), instead of having to worry
about it in a dozen places.
Given a requirement to examine the first character on an input line and
discard the rest, you advocate reading the entire line into a buffer.
How does that approach handle an input line longer than the buffer?

If you're going to worry about that all the time, then you will never get
anywhere.

But you could do everything just right, and the OP's code could still be
given input that consists of a hundred billion lines all containing "?\n"
(until perhaps the last one which might have Y or N). It'll work, but might
take a few years to complete. (And I'm still uncertain about being tolerant
about all those "?" lines by allowing another chance to get it right, in a
situation where that is clearly inappropriate because there is no human to
take heed of the error message.)
 
I

Ian Collins

Ben said:
Agreed. I did not say otherwise.


It's not easy to "keep everything in sync" with a fixed size line buffer
and a flexible one is significantly more code.

Which only has to be written once.

I really can't see what all the fuss is about. Once you have the code
to manage line based input, why not use it? I write a lot of parsers
for different text processing applications and unless there's a
compelling reason to do otherwise, I always use a while not end of file
get line, process line loop.
 
G

Geoff

I agree! My next attempt is to limit the input to 1 character. Y or N
(or y or n). Anything else they type won't show onscreen, and they
won't be prompted again. The program will just sit there, waiting for
the right keystroke.

Such a thing is surely possible with C. When you login at a Linux
command prompt or enter your password for sudo, the password line is
'disabled' and doesn't echo anything back. I bet there's a way to
restrict it to showing only Y/N.

It's quite easy in C++, so at the risk of invoking the ire of the
regulars on this list I post what has been my solution for years now:

//
//============================================================================
// Get user's yes/no answer from console input.
// Argument: "y" or "n" indicating the default answer expected.
// Blank reply means accept default answer.
//============================================================================
//
bool GetYesNo (std::string yn)
{
    std::string str;

    getline(std::cin, str);
    std::transform(str.begin(), str.end(), str.begin(), ::tolower);
    std::transform(yn.begin(), yn.end(), yn.begin(), ::tolower);

    // if expecting y and get blank or y, return true
    if ((yn.compare(0, 1, "y") == 0 && str.compare(0, 1, "") == 0) ||
        str.compare(0, 1, "y") == 0)
        return true;

    // if expecting n and get n or blank, return false
    if ((yn.compare(0, 1, "n") == 0 && str.compare(0, 1, "") == 0) ||
        str.compare(0, 1, "n") == 0)
        return false;
    else
        return false;
}


This function receives an argument string, either "y" or "n" depending
on what you expect to be default and returns accordingly.

Calling convention:

bool DiskFlag;

std::cout << "Do you want output to disk(Y/N) <Y>? ";
DiskFlag = GetYesNo("y"); // default is yes


The C version of GetYesNo is left as an exercise for the group.
 
B

Ben Bacarisse

Ian Collins said:
Which only has to be written once.

I really can't see what all the fuss is about. Once you have the code
to manage line based input, why not use it?

Yes, if you have it (and it is immune to any of the potential problems)
use it, but that argument never illuminates anything about how you
should go about it in the first place. This is, after all, a discussion of
beginners' choices.
I write a lot of parsers
for different text processing applications and unless there's a
compelling reason to do otherwise, I always use a while not end of
file get line, process line loop.

Fine. I do it the other way unless there is a compelling reason not to,
but that is because pretty much all the parsers I've written want to
ignore newlines -- they operate on token streams. Reading lines would
just add an irrelevant detail.
 
B

Ben Bacarisse

BartC said:
It's not my editor which is peculiar, it's everyone else's! And I can
infer other people's attitudes to this to some extent by how their
editors work. I've just retried a handful of editors, and none of them
seem to respect line boundaries. Two of them seemed to be program
editors (Notepad++ and SciTE) and they were set to Python (which is
strongly line-oriented), yet they still happily go right through line
endings. In fact I don't think I've ever seen an editor that does
otherwise, apart from mine.

And what do you conclude from that, and why?
 
B

Ben Bacarisse

Geoff said:
It's quite easy in C++, so at the risk of invoking the ire of the
regulars on this list I post what has been my solution for years now:

//
//============================================================================
// Get user's yes/no answer from console input.
// Argument: "y" or "n" indicating the default answer expected.
// Blank reply means accept default answer.
//============================================================================
//
bool GetYesNo (std::string yn)
{
    std::string str;

    getline(std::cin, str);

The trouble with this (at least in the old days) was that it presents an
opportunity for someone to "jam up" the system. A program known to use
getline could be made to consume unlimited resources. Is this not
considered an issue anymore?

(I just tried and it was not pretty! -- mainly because standard Linux
installs are not good at defending against memory hogs.)
    std::transform(str.begin(), str.end(), str.begin(), ::tolower);
    std::transform(yn.begin(), yn.end(), yn.begin(), ::tolower);

    // if expecting y and get blank or y, return true
    if ((yn.compare(0, 1, "y") == 0 && str.compare(0, 1, "") == 0) ||
        str.compare(0, 1, "y") == 0)
        return true;

    // if expecting n and get n or blank, return false
    if ((yn.compare(0, 1, "n") == 0 && str.compare(0, 1, "") == 0) ||
        str.compare(0, 1, "n") == 0)
        return false;
    else
        return false;
}

That looks like a lot of work. I'd use peek and ignore:

bool getYesNo(char deflt)
{
    int c = std::cin.peek();
    std::cin.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
    return tolower(c == EOF || c == '\n' ? deflt : c) == 'y';
}
The C version of GetYesNo is left as an exercise for the group.

Does it have to buffer an entire unbounded line, or can we do it the
easy way? :)
 
