arnuld said:
C takes input character by character. I did not find any Standard
Library function that can take a word as input. So I want to write one
of my own to be used with "Self Referential Structures" of section 6.5
of K&R2. K&R2 has its own version of <getword> which, I think, is
quite different from what I need. My <getword> will have the following
properties:
1.) If the word contains any digit, as in "beauty1" or "win2e", it will
be discarded; K&R2's <getword> does not do this. My <getword> will only
accept pure words like "beauty", "wine", etc.
What about words with other characters, like hyphens? What about
constructs like "get_name"? Will you discard them too? What about words
that end with a ; or ...? What about words that contain symbols like #
or @? Or words that end with an exclamation mark? Or words within
parentheses or braces?
Just giving you some food for thought as to what exactly you are going
to consider a word and what you will reject. This can be far trickier
than one first imagines.
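To make the trade-off concrete, here is one possible getword() in C. It
is only a sketch under one assumed rule: a word is a maximal run of
letters and digits (so hyphens, underscores and punctuation all act as
separators, and "get_name" yields "get" and "name"), and any run that
contains a digit is thrown away. The FILE * parameter and the return
convention are my own choices, not K&R2's.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical getword() sketch. Assumed rule: a "word" is a maximal run
   of alphanumeric characters; everything else is a separator. Runs that
   contain a digit ("beauty1", "win2e") are discarded entirely.
   Stores the next accepted word from fp in buf (size must be >= 1; longer
   words are truncated to size - 1 chars). Returns the stored length, or
   EOF when the input is exhausted. */
int getword(FILE *fp, char *buf, size_t size)
{
    int c;

    for (;;) {
        size_t len = 0;
        int has_digit = 0;

        /* skip separators */
        while ((c = getc(fp)) != EOF && !isalnum(c))
            ;
        if (c == EOF)
            return EOF;

        /* collect one alphanumeric run, remembering if it had a digit */
        do {
            if (isdigit(c))
                has_digit = 1;
            if (len + 1 < size)
                buf[len++] = (char)c;
        } while ((c = getc(fp)) != EOF && isalnum(c));
        buf[len] = '\0';

        if (!has_digit)
            return (int)len;
        /* the run contained a digit: reject it and try the next one */
    }
}
```

Given the input "beauty1 wine, win2e (hello)!" it should hand back
"wine", then "hello", then EOF. Changing the rule (say, allowing hyphens
inside words) means changing the isalnum() tests, which is exactly where
the questions above have to be answered.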
2.) We can store the words using an <array of pointers>. Since words
themselves are strings, which in reality are <arrays of chars>, we will
have an <array of pointers> to those <arrays of chars>.
That's one way, yes. It's suitable when you don't know the lengths of
words in advance, or when you don't want to waste storage on statically
allocated arrays.
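A minimal sketch of the array-of-pointers approach might look like the
following. struct wordlist, wordlist_add() and wordlist_free() are
hypothetical names; the pointer array grows by doubling with realloc(),
and each word gets its own malloc()'d copy.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable array of pointers; each element points to a
   separately malloc'ed, '\0'-terminated copy of one word. */
struct wordlist {
    char **words;
    size_t count;   /* pointers in use */
    size_t cap;     /* pointers allocated */
};

/* Append a copy of w. Returns 0 on success, -1 if allocation fails. */
int wordlist_add(struct wordlist *wl, const char *w)
{
    size_t n = strlen(w) + 1;
    char *copy = malloc(n);

    if (copy == NULL)
        return -1;
    memcpy(copy, w, n);

    if (wl->count == wl->cap) {
        /* realloc(NULL, ...) behaves like malloc on the first call */
        size_t newcap = wl->cap ? wl->cap * 2 : 16;
        char **p = realloc(wl->words, newcap * sizeof *p);
        if (p == NULL) {
            free(copy);
            return -1;
        }
        wl->words = p;
        wl->cap = newcap;
    }
    wl->words[wl->count++] = copy;
    return 0;
}

/* Free every word, then the pointer array itself. */
void wordlist_free(struct wordlist *wl)
{
    for (size_t i = 0; i < wl->count; i++)
        free(wl->words[i]);
    free(wl->words);
    wl->words = NULL;
    wl->count = wl->cap = 0;
}
```

Start with a zeroed struct (struct wordlist wl = { NULL, 0, 0 };). Each
word costs exactly strlen + 1 bytes plus whatever bookkeeping malloc
adds per block.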
Or do you think using a 2D array is a better idea?
It depends on your requirements, really, and on the type and frequency
of input you expect. Will you put an upper limit on the length of words?
It hardly makes sense to accept words longer than about 64 characters if
you are dealing with normal English text. Static 2D arrays are
undoubtedly easier to work with but are less flexible than dynamically
allocated arrays. Since statically allocated arrays are of fixed size,
some elements may remain unused and hence wasted. OTOH, a large number
of small allocations may lead to memory fragmentation, some wastage due
to malloc bookkeeping, and possibly a slowdown if you'll be reading a
very large number of words from a file. For input typed by a human, none
of this will matter.
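For comparison, the static 2D-array alternative could be sketched like
this. MAXWORDS, MAXLEN (taken from the 64-character figure above),
words, nwords and store_word() are all assumed names and limits, chosen
only for illustration.

```c
#include <string.h>

/* Hypothetical fixed-size table: every row reserves MAXLEN + 1 bytes
   whether or not the word needs them. */
#define MAXWORDS 1000
#define MAXLEN   64

static char words[MAXWORDS][MAXLEN + 1];
static size_t nwords = 0;

/* Copy w into the next free row. Returns 0 on success, -1 if the table
   is full or w is longer than MAXLEN characters. */
int store_word(const char *w)
{
    size_t len = strlen(w);

    if (nwords == MAXWORDS || len > MAXLEN)
        return -1;
    memcpy(words[nwords], w, len + 1);   /* includes the '\0' */
    nwords++;
    return 0;
}
```

There is no allocation to fail or free, but with these limits the table
always occupies 1000 * 65 = 65,000 bytes, even if it only ever holds a
handful of short words, and anything longer than 64 characters is
simply refused.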
One efficient method is to use a single dynamically allocated array in
which words are stored sequentially. The length of each word could be
specified by either one or two bytes prefixing the word itself. This
results in very efficient storage, but is grossly inefficient if you
want to insert and delete words at random. For that, a hash-table-based
approach is probably best. OTOH, a tree is very convenient for quick
searching and sorting.
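The single-buffer scheme with length prefixes might be sketched as
follows. This version assumes a one-byte prefix (so no word may exceed
255 characters) and stores no terminating '\0'; struct wordbuf,
wordbuf_add() and wordbuf_next() are hypothetical names.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical packed buffer: [len][chars...][len][chars...]... */
struct wordbuf {
    unsigned char *data;
    size_t used;
    size_t cap;
};

/* Append w with a one-byte length prefix. Returns 0 on success, -1 if w
   is longer than 255 characters or realloc fails. */
int wordbuf_add(struct wordbuf *wb, const char *w)
{
    size_t len = strlen(w);

    if (len > 255)
        return -1;
    if (wb->used + 1 + len > wb->cap) {
        size_t newcap = wb->cap ? wb->cap : 256;
        while (newcap < wb->used + 1 + len)
            newcap *= 2;
        unsigned char *p = realloc(wb->data, newcap);
        if (p == NULL)
            return -1;
        wb->data = p;
        wb->cap = newcap;
    }
    wb->data[wb->used++] = (unsigned char)len;   /* the prefix */
    memcpy(wb->data + wb->used, w, len);
    wb->used += len;
    return 0;
}

/* Sequential walk: returns the word starting at *pos (not '\0'-terminated),
   stores its length in *len and advances *pos; NULL at the end. */
const char *wordbuf_next(const struct wordbuf *wb, size_t *pos, size_t *len)
{
    if (*pos >= wb->used)
        return NULL;
    *len = wb->data[*pos];
    const char *w = (const char *)wb->data + *pos + 1;
    *pos += 1 + *len;
    return w;
}
```

Storing "beauty" and "wine" uses 12 bytes (1 + 6 + 1 + 4) plus spare
capacity, with no per-word malloc overhead. The cost shows up elsewhere:
finding a word is a linear scan, and removing one means shifting every
byte after it.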
If you tell us more details about the type and volume of input you
expect and the facilities (like searching, insertion, etc.) you plan to
implement, perhaps a tailored approach can be suggested.