Any way to take a word as input from stdin?


arnuld

I searched the c.l.c archives provided by Google as Google Groups with
"word input" as the key words and did not come up with anything good.


C++ has std::string for taking a word as input from stdin. C takes input
in 2 ways:

1) as a character, getchar()
2) as a whole line, fgets()


As C programmers, are we supposed to create a get_word function every time
we need a word as input from stdin (e.g. a terminal)?
 

vippstar

I searched the c.l.c archives provided by Google as Google Groups with
"word input" as the key words and did not come up with anything good.

C++ has std::string for taking a word as input from stdin. C takes input
in 2 ways:

1) as a character, getchar()
2) as a whole line, fgets()

as C programmer, are we supposed to create a get_word function every time
when we need a word as input from stdin ( e.g. terminal)

char word[64];
scanf("%63s", word);

Alternatively, write a get_line function (or use one written by pete,
Richard Heathfield, Eric Sosman, CBFalconer et cetera) and then split
that into words.
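The two-line scanf snippet above can be wrapped so that the return value is checked. This is only a sketch: the names `read_word` and `WORD_MAX` are made up for illustration, and it uses fscanf on a FILE pointer so that it works on stdin or any other stream. The field width 63 must stay one less than the buffer size.

```c
#include <stdio.h>

#define WORD_MAX 64   /* buffer size; the width in the format is WORD_MAX - 1 */

/* Hypothetical helper: reads the next whitespace-delimited word from `in`
   into `word`. Returns 1 on success, 0 at end of input or on a read error.
   A word longer than 63 characters is delivered in pieces, because the
   field width stops fscanf from writing past the end of the buffer. */
int read_word(FILE *in, char word[WORD_MAX])
{
    return fscanf(in, "%63s", word) == 1;
}
```

A caller would typically loop with `while (read_word(stdin, word))` and process each word in the loop body.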
 

Richard Bos

arnuld said:
C++ has std::string for taking a word as input from stdin. C takes input
in 2 ways:

1) as a character, getchar()
2) as a whole line, fgets()

as C programmer, are we supposed to create a get_word function every time
when we need a word as input from stdin ( e.g. terminal)

There is no generic solution (mainly because there is no consensus on
what a "word" is), so yes.

Richard
 

CBFalconer

arnuld said:
I searched the c.l.c archives provided by Google as Google Groups
with "word input" as the key words and did not come up with
anything good.

C++ has std::string for taking a word as input from stdin. C takes
input in 2 ways:

1) as a character, getchar()
2) as a whole line, fgets()

as C programmer, are we supposed to create a get_word function
every time when we need a word as input from stdin ( e.g. terminal)

Well, first you have to define a word. Does it terminate on
blanks, on blanks and non-print chars, on blanks and tabs, etc. I
think you will find that the C++ mechanism terminates on blanks and
'\n' (but I could well be wrong). Having defined it, you just
write the code to extract such a beast from a stream (or from a
string). At that point both you and your code reader know exactly
what the function extracts.

Don't forget to preserve the exit char. Something else may need
it.

Note that, having written the function, you are allowed to keep its
source (and its object code) and reuse it as often as you wish,
with minimum effort. If you have taken the elementary precaution
of writing it in standard C, you can use it anywhere.
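As a rough illustration of the advice above, here is one way such a function might look once "word" has been defined. The definition assumed here is "a run of characters for which isspace() is false", and the name `get_word` and its interface are hypothetical, not anything posted in this thread. The character that ends the word is pushed back so the next reader still sees it.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: reads one word from `in` into buf (capacity size > 0).
   A word is a run of non-whitespace characters, as judged by isspace().
   Returns the stored length, or 0 if end of input arrives before any word.
   Characters of an over-long word beyond size - 1 are read and discarded. */
size_t get_word(FILE *in, char *buf, size_t size)
{
    size_t len = 0;
    int c;

    /* Skip leading separators. */
    while ((c = getc(in)) != EOF && isspace(c))
        ;

    /* Collect the word. */
    while (c != EOF && !isspace(c)) {
        if (len + 1 < size)
            buf[len++] = (char)c;
        c = getc(in);
    }

    /* Preserve the terminating character for whoever reads next. */
    if (c != EOF)
        ungetc(c, in);

    buf[len] = '\0';
    return len;
}
```

Because it uses only standard library calls, the same source can be reused unchanged on any hosted implementation.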
 

arnuld

The first step is to define what a "word" is.

For *my* program, a word is a collection of letters, numbers or anything
separated by space, tab or newline.

How many words are these:

1. don't

1 word

2. antidisestablish-mentarianism

1 word

3. Joe,Bob,Sally, and Henry.

3 words. "Joe,Bob,Sally," makes one word, "and" makes the second, and "Henry."
makes the third (notice the full stop in "Henry.").

4. Joe, Bob, Sally, and Henry.

5 words

5. $1,416,383,583.20

all 1 word. There is no space in between them.

6. ()@#$#^&*#^%%#^@*^$&*$

1 word

7. George W. Bush

3 words


8. slam-dunk

1 word

9. 15th-century vase

2 words

10. M.O.N.S.T.E.R., the computer chess-playing machine

5 words


11. (e-mail address removed)

1 word, of course


Justify your answers.


Any collection of letters, symbols or numbers separated by single or
multiple spaces, tabs or newlines. Therefore:

comp.lang.c++ --> 1 word
Std. Lib --> 2 words
Lov@389&om --> 1 word


I think it is pretty much clear now what a word is.
 

Bartc

arnuld said:
For *my* program, a word is a collection of letters, numbers or anything
separated by space, tab or newline.



1 word



1 word



3 words. Joe,Bob,Sally, makes one word, and makes second, Henry. makes
3rd ( notice that full stop with Henry.)

You have commas in the middle of words?

Ever heard of comma-delimited files? Comma is way up there with space and
tab.
 

arnuld

You have commas in the middle of words?

Ever heard of comma-delimited files? Comma is way up there with space and
tab.


Yes, I know, and @%$@programmimnng34 is not a word either. If I start to
differentiate these things then it will become very complex to define what
a word is, and there could be lots of controversy over what should be (or
could be?) a word. So I take a simple approach: white space
(whether a newline, a tab or a single space) separates the words. Simple ...
 

arnuld

Well, first you have to define a word. Does it terminate on
blanks, on blanks and non-print chars, on blanks and tabs, etc. I
think you will find that the C++ mechanism terminates on blanks and
'\n' (but I could well be wrong).

I already explained this in my last reply (to BartC).


Having defined it, you just
write the code to extract such a beast from a stream (or from a
string). At that point both you and your code reader know exactly
what the function extracts.

Now there is a big problem in this. In C++ I don't have to care whether
users enter one word or hundreds; memory is managed by the standard library's
vector. Here I am thinking of using fgets() to store the input,
which has 2 problems:

1) extracting words from each line.
2) fgets() uses an array to store data and I don't know how large
the input is, so I can't decide on the size of the array.



Don't forget to preserve the exit char.
Something else may need it.

You mean the null character?

Note that, having written the function, you are allowed to keep its
source (and its object code) and reuse it as often as you wish, with
minimum effort. If you have taken the elementary precaution of writing
it in standard C, you can use it anywhere.

That's what I want to do, write in ANSI C :)
 

arnuld

This is a common problem - so common, in fact, that I wrote it up on the
Web. Take a look at http://www.cpax.org.uk/prg/writings/fgetdata.php which
looks at scanf, gets, and fgets, points out the difficulties with each,
and then discusses a possible solution to the problem of arbitrarily long
lines.

...SNIP....


I have not checked it yet but will do so later. The one question
that keeps popping up in my mind is "Why was C not designed to have
this feature?". That reminds me of an article, "Back to Basics" by Joel
Spolsky, where he said that we have null-terminated strings in C, which
are much slower than Pascal strings, not by choice but by force, as C was
developed on the PDP-7, which had an ASCIZ table, which required strings to be Z
terminated (Z means ZERO). Do we have the same kind of thing here in my
problem?

I am just curious and feel a little strange about having this "word problem"
in C.
 

arnuld

Your original question was:
"as C programmer,
are we supposed to create a get_word function everytime
when we need a words as input from stdin"
The answer is
"Yes; every time that you define what you want 'word' to mean."

Yes, I think CBFalconer also answered that, and now things are getting much
more fundamental as I start writing code.
 

CBFalconer

arnuld said:
yes, I know and @%$@programmimnng34 is not a word either. If I
start to differentiate these things then it will become very
complex to define what a word is and there could be lots of
controversy over what should be (or could be ?) a word. So I
take a simple approach, the white space (whether a newline or a
tab or a single space) separates the words. simple ...

But that is the point. Chars and lines are easily defined. Words
depend on the usage to be applied. Therefore the code to separate
words depends on the usage. You have to write the parsing code to
suit the job. It just isn't black and white.
 

CBFalconer

arnuld said:
.... snip ...

Now there is a big problem in this. In C++ i don't have to care
whether users enter one word or 100s. Memory was being managed by
std. lib. vector. Now here I am thinking of using fgets() to
store the input, which has 2 problems:

1) extract words from each line.
2) fgets() uses array to store data and I don't know how large
is the input, so I can't decide on the size of the array.

My suggestion is to use ggets, available in std. C source code at:

arnuld said:
you mean null character ?

No. I mean the char that doesn't belong to the word and signifies
the completion. If you are getting the word from a string, put the
char back by backing up the pointer (or index). If coming from a
stream you have ungetc available.
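For the string case, "backing up the pointer" amounts to leaving the cursor on the terminating character rather than one past it. The sketch below assumes whitespace-separated words; `next_word` and its two-pointer interface are hypothetical names chosen for illustration.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: finds the next word in the string at *cursor.
   Stores the word's start in *start, returns its length, and leaves
   *cursor pointing AT the character that ended the word (a separator or
   the terminating '\0'), not past it. Returns 0 when no word remains. */
size_t next_word(const char **cursor, const char **start)
{
    const char *p = *cursor;

    /* Skip leading separators. */
    while (*p != '\0' && isspace((unsigned char)*p))
        p++;
    *start = p;

    /* Advance to the first character that is not part of the word. */
    while (*p != '\0' && !isspace((unsigned char)*p))
        p++;

    *cursor = p;   /* the exit char is still available to the caller */
    return (size_t)(p - *start);
}
```

After each call the caller can inspect `**cursor` to see which character ended the word, which is the string counterpart of calling ungetc on a stream.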
 

CBFalconer

arnuld said:
.... snip ...

I have not checked it but will be doing it later. The only one
question that keeps on popping up into my mind is "Why C was not
designed to have this feature ? ". That reminds of an article
"Back to Basics" by Joel Spolsky where he said that we have null
terminated strings in C which are much slower than PASCAL
strings not by choice but by force, as C was developed on PDP-7,
which had ASCIZ table, which required strings to be Z terminated
( Z means ZERO). Do we have same kind of thing here in my problem?

Speed depends on use. Most string processing just processes until
you hit the end of the string, and there is then no slowdown from
nul termination. In addition most strings are short, and again
there is little effort in finding length. With a little care you
can often avoid finding string lengths in advance.
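A small example of the "process until you hit the end" style described above: the loop below never calls strlen, it simply stops at the terminating nul. The function name `upcase` is made up for this sketch.

```c
#include <ctype.h>

/* Upper-cases s in place in a single pass. The loop stops at the
   terminating '\0', so the length is never computed in advance. */
void upcase(char *s)
{
    for (; *s != '\0'; s++)
        *s = (char)toupper((unsigned char)*s);
}
```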
 

Chris Dollin

arnuld said:
I have not checked it but will be doing it later. The only one question
that keeps on popping up into my mind is "Why C was not designed to have
this feature ? ".

Because C was designed for /implementing/ this feature; as a bare-bones
systems programming language.

That reminds of an article "Back to Basics" by Joel
Spolsky where he said that we have null terminated strings in C which
are much slower than PASCAL strings

I'd be interested in real evidence for this claim. Real, as in, it
happened in these programs and couldn't be eliminated by straightforward
fixes, rather than contrived examples or beginners' gotchas.

not by choice but by force, as C was
developed on PDP-7, which had ASCIZ table, which required strings to be Z
terminated ( Z means ZERO).

That seems ... unlikely ... to me. Just because one's assembler has
an ASCIZ directive doesn't mean one has to use it; even if one does,
one can perfectly well also associate a length with a string as well
as a null terminator.

Do we have same kind of thing here in my
problem ?

I am just curious and feel a little strange on having this "word problem"
in C.

You've picked a language deliberately sparse in built-in features;
don't be surprised if it doesn't have many.
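On the earlier point that a string can carry a length as well as a null terminator, a minimal sketch might look like the following. The type `struct cstr` and the functions `cstr_init` and `cstr_free` are hypothetical names invented here, not part of any standard or library mentioned in the thread.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical counted string: stores the length and also keeps a '\0'
   after the data, so it can still be passed to ordinary C string functions. */
struct cstr {
    size_t len;
    char *data;   /* len characters followed by '\0' */
};

/* Builds a counted string from a C string. Returns 0 on success,
   -1 if memory could not be allocated. */
int cstr_init(struct cstr *s, const char *text)
{
    size_t n = strlen(text);
    char *p = malloc(n + 1);

    if (p == NULL)
        return -1;
    memcpy(p, text, n + 1);   /* copies the terminator too */
    s->len = n;
    s->data = p;
    return 0;
}

/* Releases the storage and resets the fields. */
void cstr_free(struct cstr *s)
{
    free(s->data);
    s->data = NULL;
    s->len = 0;
}
```

The length is computed once at construction; later code can read `len` directly instead of scanning for the terminator.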
 

Pilcrow

arnuld said:



This is a common problem - so common, in fact, that I wrote it up on the
Web. Take a look at http://www.cpax.org.uk/prg/writings/fgetdata.php which
looks at scanf, gets, and fgets, points out the difficulties with each,
and then discusses a possible solution to the problem of arbitrarily long
lines.

On that page, I present code for reading a word at a time, and for reading
a line at a time. In fact, since you supply your own delimiters, reading a
line is really just a special case of reading a word!
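To make the "reading a line is a special case of reading a word" idea concrete, here is a rough sketch of a reader that takes the delimiter set as a parameter and grows its buffer as needed. It is not the code from the page mentioned above; `read_token` and its interface are assumptions made for illustration only.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: reads characters from `in` up to the next character
   found in `delims` (which is consumed) or up to end of input, doubling the
   buffer whenever it fills. An empty token is returned as "" when two
   delimiters are adjacent. Returns a malloc'd string that the caller must
   free, or NULL at end of input or if memory runs out. */
char *read_token(FILE *in, const char *delims)
{
    size_t ndelims = strlen(delims);
    size_t cap = 16, len = 0;
    char *buf = malloc(cap);
    int c;

    if (buf == NULL)
        return NULL;

    while ((c = getc(in)) != EOF && memchr(delims, c, ndelims) == NULL) {
        if (len + 1 == cap) {              /* keep one slot for '\0' */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
        buf[len++] = (char)c;
    }

    if (c == EOF && len == 0) {            /* nothing left to read */
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    return buf;
}
```

Calling `read_token(stdin, " \t\n")` yields words (with empty tokens for repeated separators, which the caller can skip), while `read_token(stdin, "\n")` yields whole lines.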

I do not pretend that my code is perfect. For example, the return values
could have been better chosen (I must fix that one day).

It is not intended to be a plug-in solution to the problem (although some
people do actually use it that way and, as far as I'm aware, no harm has
come to them as a result). Rather, it is intended to demonstrate one
possible approach to the problem, in the hope that the reader will have an
"aha!" moment and perhaps come up with a solution that fits his own needs
much better than a generic solution is likely to be able to do.

Several other approaches apart from the one I chose to demonstrate are also
discussed (but not demonstrated), the intent being to give a wider view of
various ways to tackle this problem, depending on your priorities.

Finally, the page provides links to a few other people's demonstrations of
how to solve this problem, again with the intent of providing a wider
perspective on different approaches.

Thank you so much! This is much more the sort of thing I was hoping to
find when I started reading this group.

I much appreciate the excellent documentation in the function itself.

Is there at least an index to other similar solutions to general
problems? In comp.lang.perl.misc one often sees people scolded for not
using tested, robust solutions, rather than reinventing the wheel. CPAN
largely fills most people's needs. At the risk of making myself a
complete bore, I ask again: why doesn't the C community follow this
example?

Now, if you just followed the same indenting and bracketing style that
is used in K&R2, I would be *totally* happy. I have a lot of trouble
reading yours. Never mind, I'll just have to write a Perl script to
convert from your style to theirs. Shouldn't be too hard.

Thank you again!!
 

Keith Thompson

Pilcrow said:
I understood that, and I am a 'beginner'. It is very adequately covered
in textbooks (see 'C in a Nutshell', ISBN 0-596-00697-7, page 440),
somewhat less so in K&R2. And I gave the questioner an example to help
him. My dissatisfaction with strtok() is that repeated separation
characters are treated as one, making it difficult to present the user
with an intuitively understandable interface. It is not usually a good
idea to equate ignorance and stupidity.

Yes, it certianly is. Did someone do that?
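The strtok() behaviour complained about above (repeated separators treated as one) can be seen by counting fields two ways. Both function names below are hypothetical; the second uses strcspn as one possible alternative when empty fields have to be kept.

```c
#include <string.h>

/* Counts fields the way strtok sees them: a run of separators collapses
   into one, so empty fields disappear. Modifies s. */
size_t count_strtok_fields(char *s, const char *seps)
{
    size_t n = 0;
    char *tok;

    for (tok = strtok(s, seps); tok != NULL; tok = strtok(NULL, seps))
        n++;
    return n;
}

/* Counts fields so that every separator ends a field, keeping empty ones.
   Uses strcspn to find the next separator; does not modify s. */
size_t count_all_fields(const char *s, const char *seps)
{
    size_t n = 1;

    for (;;) {
        s += strcspn(s, seps);   /* jump to the next separator or to '\0' */
        if (*s == '\0')
            return n;
        s++;                     /* step over exactly one separator */
        n++;
    }
}
```

For the input `Joe,,Sally` with `,` as the separator, the strtok version reports two fields and the strcspn version reports three, the middle one being empty.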
 

Keith Thompson

arnuld said:
For *my* program, a word is a collection of letters, numbers or anything
separated by space, tab or newline.

As you know, that definition is fine for your program; others might
have different requirements.

Incidentally, the phrase "letters, numbers, or anything" seems
redundant. I think that a more precise rendering of what you meant
would be:

A "word" is a non-empty contiguous sequence of characters other
than space, tab, or newline, preceded or followed either by a
space, tab, or newline or by the start or end of the input.

It would also be good to specify whether the input is a string, a line
of text, or an entire text file.

If I take your definition literally, then in the following
"word"
the word "word" is not a word, because it's not separated by space,
tab, or newline.

It might be more convenient to treat anything for which isspace()
returns true (or for which isspace() returns true in the "C" locale)
as a separator; that includes several whitespace characters that you
didn't mention. But of course if your requirements call for only
space, tab, and newline to be treated as separators, then that trumps
convenience.
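A short sketch of the isspace()-based definition suggested above, applied to a string. In the "C" locale isspace() is true for space, tab, newline, vertical tab, form feed and carriage return. The function name `count_words` is an assumption made for this example.

```c
#include <ctype.h>
#include <stddef.h>

/* Counts words in s, where a word is a non-empty contiguous run of
   characters for which isspace() is false, bounded by separators or by
   the start or end of the string. */
size_t count_words(const char *s)
{
    size_t n = 0;
    int in_word = 0;

    for (; *s != '\0'; s++) {
        if (isspace((unsigned char)*s)) {
            in_word = 0;
        } else if (!in_word) {
            in_word = 1;   /* first character of a new word */
            n++;
        }
    }
    return n;
}
```

Under this definition `"word"` with its quotation marks counts as one word, and a hyphenated word split across two lines counts as two.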
1 word



1 word

In the previous article, "antidisestablish-" and "mentarianism" were
on two lines, so they'd be two words by your definition. (Gordon's
point was that it's reasonable to treat them as a single word, since
that's what the hyphen means in English text, but if they're two words
by your definition then they're two words by your definition.)

[snip]
Any collection of letters,symbols or numbers separated by single or
multiple spaces or tab or newline. Therefore

comp.lang.c++ --> 1 word
Std. Lib --> 2 words
Lov@389&om --> 1 word


I think it is pretty much clear now what a word is.

It's pretty much clear what your definition of a word is. It's still
not at all clear what a word is in general (and it can't be, since the
term is used inconsistently).
 

Pilcrow

Pilcrow said:



http://www.google.com :)



Yes, but I wouldn't.


You may well be the first person ever to say that. People have made all
kinds of complaints about my code, but readability is not usually high on
the hit-list.

I apologize. It was not really a complaint, more an expression of my
frustration.

I am still digesting that code. I was especially taken with the memory
management. It should be provided for all the other situations where
one sees a caution that one should make sure that there is adequate
room for the result. After I have gotten more experience with C, I
think I'll try my hand at it.
 

Keith Thompson

Pilcrow said:
s/certianly/certainly/

How many times does someone here say, in effect, "this is too deep for a
beginner"?

That's not equating ignorance and stupidity; it's equating ignorance
and ignorance. And ignorance isn't necessarily an insult; it's
usually curable, after all.

Sorry, but some things really are too deep for a beginner.
 
