input word

A

arnuld

We have discussed this program a lot. Only two problems have remained with
this code:



/* a function written in ANSI C, that takes a word from stdin and manages the memory dynamically
*
* Since there is no generally agreed definition of what a 'word' is. I have taken a very simple
* approach:
*
* A word is a contiguous collection of non-whitespace characters. A whitespace means anyone of
* a single apace, a newline or a tab.
*
*
*/


#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>


enum { WORD_SIZE = 28 };
enum { GSW_OK, GSW_ENOMEM, GSW_ENORESIZE } ;


int get_single_word( char** );



int main( void )
{
char* pword;


while( (GSW_OK == get_single_word(&pword)) && ( *pword != 0) )
{
printf("You entered: [%s]\n", pword);
free( pword );
}


return 0;
}




int get_single_word( char** ppc )
{
unsigned ele_num = 0;
int ch;
size_t word_length = WORD_SIZE;
size_t word_length_interval = 2;

char* word_begin;
char* new_mem;

*ppc = malloc(word_length * sizeof(**ppc));
word_begin = *ppc;


if( NULL == ppc )
{
return GSW_ENOMEM;

}


while( (EOF != (ch = getchar())) && isspace(ch) )
{
continue; /* Leading whitespace */
}


if( EOF != ch )
{
*word_begin++ = ch;
ele_num = 1;
}


while( (EOF != (ch = getchar())) && (! isspace(ch)) )
{
if( (word_length - 1) == ele_num++ )
{
new_mem = realloc( *ppc, (word_length_interval * word_length * sizeof *new_mem) );

if( new_mem )
{
word_begin = new_mem + (word_begin - *ppc);
word_length *= word_length_interval;
*ppc = new_mem;
}
else
{
*word_begin = '\0';
return GSW_ENORESIZE;
}
}

*word_begin++ = ch;
}


*word_begin = '\0';

return GSW_OK;
}





1) the get_single_word program is 60 lines long when there is
general agreement that no function should be longer than 40
lines.

2) distinguishing between real EOF and the error.




For (1) all I can think of is to take while() loop (from
get_single_word) out into a new function which will require passing 6
variables as pointer (or pointer to pointer) arguments: new_mem,
word_begin, word_length, word_length_interval, ppc and ele_num. I think
that will be messy solution.


Regarding (2) I have to use feof() and ferror() which are little
confusing.
 
A

arnuld

We have discussed this program a lot. Only two problems have remained with
this code:
..SNIP...


I have tried it but it Segfaults:


#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>


enum { WORD_SIZE = 28 };
enum { GSW_OK, GSW_ENOMEM, GSW_ENORESIZE } ;


int get_single_word( char** );
void save_word( char**, size_t*, size_t*);


int main( void )
{
char* pword;


while( (GSW_OK == get_single_word(&pword)) && ( *pword != 0) )
{
printf("You entered: [%s]\n", pword);
free( pword );
}


return 0;
}




int get_single_word( char** ppc )
{
unsigned ele_num = 0;
int ch;
size_t word_length = WORD_SIZE;

char* word_begin;

*ppc = malloc(word_length * sizeof(**ppc));
word_begin = *ppc;


if( NULL == ppc )
{
return GSW_ENOMEM;

}


while( (EOF != (ch = getchar())) && isspace(ch) )
{
continue; /* Leading whitespace */
}


if( EOF != ch )
{
*word_begin++ = ch;
ele_num = 1;
}


save_word( &word_begin, &ele_num, &word_length );


*word_begin = '\0';

return GSW_OK;
}




void save_word( char** word_begin, size_t* ele_num, size_t* word_length )
{
int ch = EOF;
size_t word_length_interval = 2;
char* new_mem = NULL;

char* pc = *word_begin;


while( (EOF != (ch = getchar())) && (! isspace(ch)) )
{
if( (*word_length - 1) == (*ele_num)++ )
{
new_mem = realloc( pc, (word_length_interval * (*word_length) * sizeof *new_mem) );

if( new_mem )
{
*word_begin = new_mem + (*word_begin - pc);
(*word_length) *= word_length_interval;
pc = new_mem;
}
else
{
*word_begin = '\0';
return;
}
}

**word_begin = ch;
*word_begin++;

}
}
 
A

arnuld

Have you checked with a debugger to see why?


I thought using a Debugger (for such small programs) kills one's
understanding of language, thats why I never used any debugger.
 
J

Joachim Schmitz

arnuld said:
I thought using a Debugger (for such small programs) kills one's
understanding of language, thats why I never used any debugger.

do you understand why your program fails without using debugger? Do you even
know where it fails?

Bye, Jojo
 
I

Ian Collins

arnuld said:
I thought using a Debugger (for such small programs) kills one's
understanding of language, thats why I never used any debugger.
Which do you prefer, stumbling around in the dark, or turning on the light?
 
A

arnuld

do you understand why your program fails without using debugger?

yes, some problem with the pointer passed as first argument. It is
pointing somewhere out of bounds.

Do you even know where it fails?

yes, I guess so. **word_begin++ = ch; (in save_word fucntion)


it does not go in to the while loop. It never goes in there till I
enter more than 1 character.
 
A

arnuld

Which do you prefer, stumbling around in the dark, or turning on the light?

Using debugger means shutting down your brain. Thats what I think because
If I can't understand what I wrote in just 24 lines, I better throw the
code and start again.
 
B

Bartc

1) the get_single_word program is 60 lines long when there is
general agreement that no function should be longer than 40
lines.

Sounds like nonsense to me.

Some functions perhaps could do with breaking up, but it all depends.

I've just been working on one with 3000 lines.

60 lines will anyway fit nicely printed on one page.
 
R

Richard Bos

arnuld said:
Using debugger means shutting down your brain. Thats what I think because
If I can't understand what I wrote in just 24 lines, I better throw the
code and start again.

This is quite true, but it does require that you start from the
beginning _correctly_. Clear the room, _then_ start again in daylight.
Don't continue to grope around half-understanding. IOW, first formulate
_what_ you want to do, then _how_ you think you're going to do it, and
only then start coding. And check your code _while_ you're coding, don't
write it all in a lump and expect to be able to sort the bugs out
afterwards.

Richard
 
R

Richard Tobin

arnuld said:
1) the get_single_word program is 60 lines long when there is
general agreement that no function should be longer than 40
lines.

No there isn't.

-- Richard
 
C

Chris Dollin

Bartc said:
Sounds like nonsense to me.

Yes, 40 lines is too long.
Some functions perhaps could do with breaking up, but it all depends.

Of course. But just because it all depends doesn't mean that
long functions are good.
I've just been working on one with 3000 lines.

/That/ is too long, absent very Special Circumstances.
60 lines will anyway fit nicely printed on one page.

What is this "page" of which you speak?
 
B

Bartc

Chris Dollin said:
Yes, 40 lines is too long.

What would code look like if you imposed a limit of perhaps 6 lines per
function?

Have you ever tried reading a web document, by itself fairly short, but
presented a paragraph per page (and buried amongst advertising)? It's
exasperating. Or an online book where you have to navigate by hyperlinks.
You quickly get fed up.
Of course. But just because it all depends doesn't mean that
long functions are good.


/That/ is too long, absent very Special Circumstances.

Well, that one was mostly a large 'switch' with some 200 cases. Maybe a set
of functions would have been better, but speed was critical here. The magic
of suitable comments and editor help ensured that each case was as readable
as an independent function.
What is this "page" of which you speak?

Last time I printed out any code, a printer could put 66 lines on one page.
With the right font and using A4 paper, that's probably still about right.
 
C

Chris Dollin

Bartc said:
What would code look like if you imposed a limit of perhaps 6 lines per
function?

Mine.

(Maybe 10 rather than 6.)
Have you ever tried reading a web document, by itself fairly short, but
presented a paragraph per page (and buried amongst advertising)? It's
exasperating. Or an online book where you have to navigate by hyperlinks.
You quickly get fed up.

The paragraphs don't have names that are supposed to be meaningful
to people. Code doesn't have advertising in it. Code is denser than
text. Code has structure whereby things closer tend to have more
immediate relevance. You don't have to flip pages at function
boundaries.
Well, that one was mostly a large 'switch' with some 200 cases.

That counts as SC; my biggest C function was also a switch, some 800
lines before macro-expansion.
Last time I printed out any code, a printer could put 66 lines on one page.
With the right font and using A4 paper, that's probably still about right.

I've discovered that whenever I print out code, I look at it
for five minutes, write some red [1] lines [2] on it, go back
to the computer, and ignore the paper thereafter. So "pages"
aren't interesting to me as a measure of code size [3].

[1] Depending on the pen to hand.

[2] Some of the lines are supposed to be words.

[3] More as a contribution to profit ...
 
L

Lew Pitcher

We have discussed this program a lot. Only two problems have remained with
this code: [snip]
1) the get_single_word program is 60 lines long when there is
general agreement that no function should be longer than 40
lines.

"General agreement"? Sorry, but no.

However, I'll buy it as "development standard" in your shop (assuming
that
this is code written for a company, and the code-writing group has
standards wrt function body length).

/If/ you want to reduce the length of the get_single_word() *function*
(not
*program*, unless you've not given us all the information), there are
several techniques that you can use:

1) Reduce statements to their most compact form
For example, the 5 line
if ( NULL == ppc )
{
return GSW_ENOMEM;

}
would become the one line
if (NULL == ppc ) return GSW_ENOMEM;
eliminating four lines, and the 4 line
while( (EOF != (ch = getchar())) && isspace(ch) )
{
continue; /* Leading whitespace */
}
would lose three lines to become
while( (EOF != (ch = getchar())) && isspace(ch) ) ;

2) Combine commonalities into single elements. The two while() loops
have some common test elements; you could combine the two loops
into
one, and distinguish the differences within the loop body

3) Create new functions from old logic. Your get_single_word()
function
works in four phases:
1 - sanity check,
2 - skip leading whitespace,
3 - empty word handler,
4 - word string handler
Break one or more of these out into it's own function, and modify
get_single_word() to utilize the new function instead of its own
inline code. A good candidate might be the word string handler
logic.

2) distinguishing between real EOF and the error.

To what purpose? Would you handle a getchar() error different from a
getchar() EOF? I suppose that you /could/ change your enum from
enum { GSW_OK, GSW_ENOMEM, GSW_ENORESIZE};
to
enum { GSW_OK, GSW_ENOMEM, GSW_ENORESIZE, GSW_ESTDIO_ERROR};
and your final
return GSW_OK;
to
if (feof(stdin))
return GSW_OK;
else
return GSW_ESTDIO_ERROR;
but how would your main() function logic change when it encounters a
GSW_ESTDIO_ERROR? From what I can see, your main() only looks for a
GSW_OK; /all/ other returns cause a silent program termination.

[snip]

HTH
 
I

Ian Collins

arnuld said:
Using debugger means shutting down your brain. Thats what I think because
If I can't understand what I wrote in just 24 lines, I better throw the
code and start again.
Then test as you go. Write a simple test, write the code to pass it,
check in. Repeat until done.

Then if you add come code that breaks, just revert the change and do it
again.
 
I

Ian Collins

Chris said:
Mine.

(Maybe 10 rather than 6.)

Also mine, I aim for about 10 lines per function.

120 lines fit on one screen, but I hardly ever create functions that long.
 
C

CBFalconer

Lew said:
arnuld said:
We have discussed this program a lot. Only two problems have
remained with this code:
[snip]

1) the get_single_word program is 60 lines long when there
is general agreement that no function should be longer
than 40 lines.

"General agreement"? Sorry, but no.

However, I'll buy it as "development standard" in your shop
(assuming that this is code written for a company, and the
code-writing group has standards wrt function body length).

This thread is missing the major purpose of limiting function
length. The point is to make things readable and understandable to
the poor programmer, especially the one who is fixing the code five
years later. Which may be you.

Humans can normally handle something up to seven items. So the
routine should have no more than that many parameters and local
variables. Larger things can be handled by calling an
appropriately named subroutine. This combines quite nicely with
the objective of keeping the source of a routine small enough to
fit on a single CRT screen.

I can usually easily read code I wrote thirty years ago, because I
have adhered to this principle [1]. Forty year old code is another
matter, because I hadn't yet learned the key facts. Forty-five is
even worse. The age matters, but not the language.

[1] Or should that be principal?
 
A

Antoninus Twink

This combines quite nicely with the objective of keeping the source of
a routine small enough to fit on a single CRT screen.

Another glimpse into CBF's stone-age existence. How many clc'ers really
still use a CRT screen, I wonder?
 
B

Barry Schwarz

We have discussed this program a lot. Only two problems have remained with
this code:



/* a function written in ANSI C, that takes a word from stdin and manages the memory dynamically
*
* Since there is no generally agreed definition of what a 'word' is. I have taken a very simple
* approach:
*
* A word is a contiguous collection of non-whitespace characters. A whitespace means anyone of
* a single apace, a newline or a tab.

You use isspace to detect whitespace. isspace will return true for
several other characters besides these three.
*
*
*/


#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>


enum { WORD_SIZE = 28 };
enum { GSW_OK, GSW_ENOMEM, GSW_ENORESIZE } ;


int get_single_word( char** );



int main( void )
{
char* pword;


while( (GSW_OK == get_single_word(&pword)) && ( *pword != 0) )
{
printf("You entered: [%s]\n", pword);
free( pword );
}


return 0;
}




int get_single_word( char** ppc )
{
unsigned ele_num = 0;
int ch;
size_t word_length = WORD_SIZE;
size_t word_length_interval = 2;

char* word_begin;
char* new_mem;

*ppc = malloc(word_length * sizeof(**ppc));
word_begin = *ppc;


if( NULL == ppc )

Surely this has been mentioned before. ppc contains the address of
pword in main. It can never be NULL. You probably meant *ppc.
{
return GSW_ENOMEM;

And I know your excessive vertical white space has been discussed.
}


while( (EOF != (ch = getchar())) && isspace(ch) )
{
continue; /* Leading whitespace */
}


if( EOF != ch )
{
*word_begin++ = ch;
ele_num = 1;
}


while( (EOF != (ch = getchar())) && (! isspace(ch)) )
{
if( (word_length - 1) == ele_num++ )
{
new_mem = realloc( *ppc, (word_length_interval * word_length * sizeof *new_mem) );

if( new_mem )
{
word_begin = new_mem + (word_begin - *ppc);
word_length *= word_length_interval;
*ppc = new_mem;
}
else
{
*word_begin = '\0';
return GSW_ENORESIZE;
}
}

*word_begin++ = ch;
}


*word_begin = '\0';

return GSW_OK;
}





1) the get_single_word program is 60 lines long when there is

63 actually
general agreement that no function should be longer than 40

Only among mental midgets.
lines.

2) distinguishing between real EOF and the error.




For (1) all I can think of is to take while() loop (from

Vertical white space is a matter of style but, to be blunt, yours
sucks. I count at least 9 lines of inappropriate or excessive
spacing. That would get you down to 54 lines which would make the
entire block of executable code visible on the screen at once on my
monitor (a much more useful measure than an arbitrary line count).
get_single_word) out into a new function which will require passing 6
variables as pointer (or pointer to pointer) arguments: new_mem,
word_begin, word_length, word_length_interval, ppc and ele_num. I think
that will be messy solution.

A function should perform a single task. It should be as large as it
needs to be to perform that task, but no larger. If the task is
unacceptably large (tm) or excessively complicated (tm), then it
should be broken into subtasks and coded accordingly.

While neither condition applies in this case, for practice you could
attempt to compartmentalize some aspect of g_s_w. As an example, you
could move the 16 lines of code that deal with reallocating the space
*ppc points to into a separate function. It would require only three
arguments (the current value of *ppc, the address of word length, and
the address of new_mem).
Regarding (2) I have to use feof() and ferror() which are little
confusing.

They are mutually exclusive in this case so you only need to use one
or the other. If the one you use returns 1, then that condition is
true. (E.g., if you use feof and it returns 1, then you reached end
of file and don't have an error). If the one you use returns 0, then
the other condition is true.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,995
Messages
2,570,230
Members
46,816
Latest member
SapanaCarpetStudio

Latest Threads

Top