simulation of a "wc" command

hiteshthappa

Hi,
Can anyone please help me find the total number of words in a
file?
I get the newlines, characters and blank spaces correctly, but counting
words is a problem. I have tried many ways, but it didn't help.
Here is my code:
#include <string.h>
#include <stdio.h>

main(int argc , char *argv[])
{
FILE *fp;
int ch;
int chr=0;
int totchr=0;
int bspc=0,totbspc=0;
int nline=0;
int word=0,totwrd=0;
int i=0;

fp=fopen(argv[1],"r");
if(argc != 2)
{
printf("\tInsufficient arguments\n");
printf("\tusage: wrd <filename>\n");
exit(0);
}
if(fp==NULL)
{
printf("Error In File Opening\n");
exit(0);
}
else
{
while((ch=fgetc(fp))!=EOF)
{
if(ch == ' ')
{
bspc++;
}
if(ch == '\n')
{
nline++;
}
chr++;
word++;


}
}
//word = bspc + nline;
totchr += chr;
totbspc += bspc;
totwrd += word;

printf("\nFile %s has\n", argv[1]);
printf("\n\twhite spaces are: %d\n", totbspc);
printf("\twords are: %d\n", totwrd);
printf("\tcharacters are: %d\n", totchr);
printf("\tlines are: %d\n\n", nline);

fclose(fp);
}
 
Andrew Poelstra

Hi,
Can anyone please help me find the total number of words in a
file?
I get the newlines, characters and blank spaces correctly, but counting
words is a problem. I have tried many ways, but it didn't help.
Here is my code:

[code snipped]

Could you please repost with your code correctly indented? (Two spaces
is easiest to read on Usenet, IMHO.)

Also, if you don't need to use format specifiers, the puts() function
will print a string, automatically appending a '\n' to it, so it makes
things a bit easier to read.
 
Barry Schwarz

Hi,
Can anyone please help me find the total number of words in a
file?
I get the newlines, characters and blank spaces correctly, but counting
words is a problem. I have tried many ways, but it didn't help.

Define "problem". Be specific. What was your input? What was your
output? What output do you want?

Here is my code......
#include <string.h>
#include <stdio.h>

main(int argc , char *argv[])
{
FILE *fp;
int ch;
int chr=0;
int totchr=0;
int bspc=0,totbspc=0;
int nline=0;
int word=0,totwrd=0;
int i=0;

fp=fopen(argv[1],"r");
if(argc != 2)
{
printf("\tInsufficient arguments\n");
printf("\tusage: wrd <filename>\n");
exit(0);}

Obviously this test should come before the call to fopen.

Why return zero if it failed? Use EXIT_FAILURE from <stdlib.h>.

if(fp==NULL)
{
printf("Error In File Opening\n");
exit(0);}

else
{
while((ch=fgetc(fp))!=EOF)
{
if(ch == ' ')
{
bspc++;}

if(ch == '\n')
{
nline++;}

chr++;
word++;

Why are you incrementing word for every character? You should
increment it only if the current character is white space (see isspace
in your reference).

A consistent indenting style will save you a lot of time in your
programming efforts.

//word = bspc + nline;
totchr += chr;
totbspc += bspc;
totwrd += word;

Are the left side operands ever non-zero?
printf("\nFile %s has\n", argv[1]);
printf("\n\twhite spaces are: %d\n", totbspc);
printf("\twords are: %d\n", totwrd);
printf("\tcharacters are: %d\n", totchr);
printf("\tlines are: %d\n\n", nline);

fclose(fp);



}
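
A minimal sketch of a counting loop that addresses the points raised above, under one assumption: a "word" is a maximal run of non-whitespace characters, as classified by isspace(). Rather than incrementing on every whitespace character, it counts each transition from whitespace into a word, so consecutive blanks are not counted twice. The struct wc_counts and the function wc_count_stream are hypothetical names, not from the thread.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Hypothetical result type; the names are illustrative only. */
struct wc_counts {
    unsigned long lines;
    unsigned long words;
    unsigned long chars;
};

/* Count lines, words and characters on an already-open stream.
   Assumption: a word is a maximal run of non-whitespace characters. */
struct wc_counts wc_count_stream(FILE *fp)
{
    struct wc_counts n = {0, 0, 0};
    int ch;          /* int, so EOF is distinguishable from any character */
    int in_word = 0; /* nonzero while inside a run of non-space characters */

    assert(fp != NULL);
    while ((ch = fgetc(fp)) != EOF) {
        n.chars++;
        if (ch == '\n')
            n.lines++;
        /* fgetc yields an unsigned char value converted to int,
           so passing it straight to isspace is well defined */
        if (isspace(ch)) {
            in_word = 0;
        } else if (!in_word) {
            in_word = 1; /* first character of a new word */
            n.words++;
        }
    }
    return n;
}
```

A caller would check argc before calling fopen, pass the opened stream to wc_count_stream(), and exit with EXIT_FAILURE from <stdlib.h> on any error.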
 
Bill Reid

Pietro Cerutti said:
(e-mail address removed) wrote:
It's mostly about your definition of "words" and "white spaces". The unix
utility wc refers to a word as a string of characters delimited by a
blank space. If that's the behavior that you're trying to mimic, then
totbspc may be a good approximation of the number of words in your file.

How about something better than an "approximation"...
Some error checking after fgetc returns EOF to see whether an
end-of-file event or an error occurred may be needed at the end.

OK, no assurances this is the "best" way to do this, but here is
how I do it:

char *find_text_field(char *curr_char) {

for(;isspace(*curr_char)!=0;curr_char++);

return curr_char;
}

char *find_next_text_field(char *curr_char) {

for(;isspace(*curr_char)!=0;curr_char++);

for(;*curr_char!='\0';curr_char++)
if(isspace(*curr_char)!=0) break;

for(;isspace(*curr_char)!=0;curr_char++);

return curr_char;
}

unsigned count_text_words(char *text) {
unsigned num_word=0;
char *curr_char;

if(*(curr_char=find_text_field(text))!='\0')
while(*curr_char!='\0') {

num_word++;

if(*(curr_char=find_next_text_field(curr_char))=='\0')
break;
}

return num_word;
}

Now this is assuming you've read the file into a text buffer first, and
you can use it that way, or possibly modify the logic to work with a
text file stream instead (I didn't look in my file utilities library,
because I'm not sure I really need/use a "word counter" for files, but
note that you are essentially just reading through the text character
by character, so you could just use fgetc() and check for EOF rather
than '\0' throughout)...
 
vippstar

Pietro Cerutti <gahr_SPAM_gahr_ME_ch> wrote in message



How about something better than an "approximation"...


OK, no assurances this is the "best" way to do this, but here is
how I do it:

char *find_text_field(char *curr_char) {

for(;isspace(*curr_char)!=0;curr_char++);

Undefined behavior.
Cast *curr_char to (unsigned char).

<snip>
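
For illustration, a minimal sketch of a buffer-based word counter with the suggested cast applied. It is not Bill Reid's code; the function name count_words is hypothetical, and a "word" is assumed to be a maximal run of non-whitespace characters. The cast matters when plain char is signed and the text holds characters with negative values, since isspace() is only defined for arguments in the range of unsigned char (or EOF).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Count whitespace-delimited words in a NUL-terminated string.
   Hypothetical helper, written to show the (unsigned char) cast. */
size_t count_words(const char *text)
{
    size_t n = 0;

    assert(text != NULL);
    while (*text != '\0') {
        /* skip leading whitespace; the cast keeps the argument
           within the range of unsigned char */
        while (isspace((unsigned char)*text))
            text++;
        if (*text == '\0')
            break;
        n++; /* found the start of a word */
        /* advance to the end of the word */
        while (*text != '\0' && !isspace((unsigned char)*text))
            text++;
    }
    return n;
}
```

With the cast in place, the function can be handed arbitrary byte strings without relying on the signedness of char on a particular implementation.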
 
Andrew Poelstra

Undefined behavior.
Cast *curr_char to (unsigned char).

Not necessarily. If you are sure that the value of *curr_char
will be within the range of unsigned char (or char is unsigned
by default!) the behavior is defined.

Having said that, your advice is still good advice; just not
strictly necessary if you check the input to find_text_field()
carefully enough. :)
 
vippstar

Not necessarily. If you are sure that the value of *curr_char
will be within the range of unsigned char (or char is unsigned
by default!) the behavior is defined.

Whether char is unsigned or signed is unspecified.
How do you suggest to "check" the value of *curr_char?

Regardless of what you suggest, it _is_ undefined behavior, in his
code. In *your* code with *your* checks/guarantees, it might not be.

(for example, char c = someval; assert(c >= 0); isspace(c); is not UB)
 
Harald van Dijk

Whether char is unsigned or signed is unspecified.

It's implementation-defined.

How do you suggest to "check" the value of *curr_char?

One possibility is reading the implementation's documentation. Another is
calling find_text_field(" hello "); or any other string consisting
only of characters in the basic execution character set.
Regardless of what you suggest, it _is_ undefined behavior, in his code.
In *your* code with *your* checks/guarantees, it might not be.

(for example, char c = someval; assert(c >= 0); isspace(c); is not UB)

If c is within the range of unsigned char, the behaviour of isspace(c) is
specified by the standard regardless of whether you code a check to verify
its value.
 
vippstar

It's implementation-defined.

3.4.1
1 implementation-defined behavior
unspecified behavior where each implementation documents how the
choice is made

We seem to agree.
One possibility is reading the implementation's documentation. Another is

Reading the implementation's documentation? Why would you do such a
thing when you can simply write valid C code that doesn't rely on
implementation documentation?
calling find_text_field(" hello "); or any other string consisting
only of characters in the basic execution character set.

I don't see how this checks for anything (it's a guarantee that
characters in the basic execution set have a value > 0)
If c is within the range of unsigned char, the behaviour of isspace(c) is
specified by the standard regardless of whether you code a check to verify
its value.

So? Have I said otherwise? (I actually implied exactly _that_ when I
suggested the (unsigned char) cast)
 
Harald van Dijk

On Aug 28, 10:51 pm, Andrew Poelstra (e-mail address removed) wrote:
On Aug 28, 10:07 pm, "Bill Reid"
char *find_text_field(char *curr_char) {

Undefined behavior.
Cast *curr_char to (unsigned char).
Not necessarily. If you are sure that the value of *curr_char will
be within the range of unsigned char (or char is unsigned by
default!) the behavior is defined.
Whether char is unsigned or signed is unspecified.
[...]
How do you suggest to
"check" the value of *curr_char?

One possibility is reading the implementation's documentation. Another
is

Reading the implementations documentation? Why would you do such thing
when you can simply write valid C code that doesn't rely on
implementation documentation?

Because you've already read it for other reasons? Because you mistook the
compiler's documentation as a guarantee that char is always unsigned, on
every compiler? It doesn't need to be a good idea to have people do it.
I don't see how this checks for anything (it's a guarantee that
characters in the basic execution set have a value > 0)

I took the liberty of considering a verification by the programmer that
each character is in fact in the basic execution character set as a check.
If you don't approve, then I don't see the point of your question

How do you suggest to "check" the value of *curr_char?

since no check is required.
So? Have I said otherwise? (I actually implied exactly _that_ when I
suggested the (unsigned char) cast)

Yes. I take an unqualified "Undefined behavior." as saying the behaviour
is undefined, not that the behaviour may or may not be undefined. More
clearly, you also claimed rather explicitly "Regardless of what you
suggest, it _is_ undefined behavior, in his code."
 
vippstar

On Aug 28, 10:51 pm, Andrew Poelstra (e-mail address removed) wrote:
On Aug 28, 10:07 pm, "Bill Reid"
char *find_text_field(char *curr_char) {
for(;isspace(*curr_char)!=0;curr_char++);
Undefined behavior.
Cast *curr_char to (unsigned char).
Not necessarily. If you are sure that the value of *curr_char will
be within the range of unsigned char (or char is unsigned by
default!) the behavior is defined.
Whether char is unsigned or signed is unspecified.
[...]
How do you suggest to
"check" the value of *curr_char?
One possibility is reading the implementation's documentation. Another
is
Reading the implementations documentation? Why would you do such thing
when you can simply write valid C code that doesn't rely on
implementation documentation?

Because you've already read it for other reasons? Because you mistook the
compiler's documentation as a guarantee that char is always unsigned, on
every compiler? It doesn't need to be a good idea to have people do it.

I asked how you'd check *curr_char in C.
You replied with reading the implementation's documentation. That's not a
logical answer.
(Indeed, I did not explicitly say "in C", but it was implied, I think.)
It took the liberty of considering a verification by the programmer that
each character is in fact in the basic execution character set as a check..
If you don't approve, then I don't see the point of your question

How do you suggest to "check" the value of *curr_char?

since no check is required.

No check is required. Then it was Mr. Poelstra who did not have a point,
not me. It was he who suggested "checking" the value of *curr_char.
Yes. I take an unqualified "Undefined behavior." as saying the behaviour
is undefined, not that the behaviour may or may not be undefined. More

When you rely on implementation-defined behavior where one of the
possible behaviors is undefined, you're invoking undefined behavior.
clearly, you also claimed rather explicitly "Regardless of what you
suggest, it _is_ undefined behavior, in his code."

Yes I did, and I was wrong. I did not read the whole code; I only
assumed he used that function on the input of some file stream.
His code did not do such a thing, so he is not invoking undefined
behavior (at least not there).

So yes, I do see your point now. (I was replying to your post as I was
reading it)
 
Harald van Dijk

On Aug 28, 10:51 pm, Andrew Poelstra (e-mail address removed) wrote:
On Aug 28, 10:07 pm, "Bill Reid"
char *find_text_field(char *curr_char) {

Undefined behavior.
Cast *curr_char to (unsigned char).
Not necessarily. If you are sure that the value of *curr_char
will be within the range of unsigned char (or char is unsigned by
default!) the behavior is defined.
Whether char is unsigned or signed is unspecified.
[...]
How do you suggest to
"check" the value of *curr_char?
[...snip...]

It was [mr Poelstra] who suggested "checking" the value of *curr_char.

Ah! Thanks for clearing that up. There's something missing in the quoted
material. In the quote, he says you need to be sure that the value of
*curr_char will be within the range of unsigned char, not that you need to
check it. Now that I've looked up the message, I see your point a bit
better.

(No comment on the rest of your message right now.)
 
Bill Reid

Undefined behavior.
Cast *curr_char to (unsigned char).

Tee-hee...this old thing from a few weeks ago...

You know if I WAS a troll, not including that SUPER-IMPORTANT!!!!
cast (which appears to be totally unnecessary on my "implementation")
would constitute the PERFECT troll, since it has now spawned a whole
raft of argumentative replies while I just enjoyed another afternoon in
paradise...
 
vippstar

Tee-hee...this old thing from a few weeks ago...

You know if I WAS a troll, not including that SUPER-IMPORTANT!!!!
cast (which appears to be totally unnecessary on my "implementation")
would constitute the PERFECT troll, since it has now spawned a whole
raft of argumentative replies while I just enjoyed another afternoon in
paradise...

Actually, I'm not doubting that you are a troll, because I know you
are one. (thus why I didn't bother reading the rest of your code)
I'm glad I posted my post though; the "spawn of argumentative replies"
was quite informative for me.
Perhaps it was informative for the rest of those who participated, and
perhaps it will be informative to some random visitor. (search engines
commonly direct people to web mirrors of usenet posts)

Lastly, if you think 20 or so replies constitute the perfect troll,
you need to read some Scott Nudds ;-)
 
s.dhilipkumar

Hi

I have never done this exercise; this is just my version of the
code. :) I have just checked a few basic conditions, and it imitates wc
to an extent.

#include<stdio.h>

char find_char(char x)
{
if( '\n' == x)
return 1;
if( ' ' == x)
return 2;
else
return 3;
}




int main(int argc, char* argv[])
{
FILE *fp=NULL;
char prev=0,ch,ctyp;
int wc=0,nl=0,ws=0;
if (argc != 2)
{
printf ("invlid argument \n");
return 1;
}
fp=fopen(argv[1], "r");
if (fp == NULL)
{
printf("Unable to open the file %s\n",argv[1]);
return 1;
}
while(!feof(fp))
{
ch = fgetc(fp);
ctyp = find_char(ch);
if(ctyp == 1)
{
if ( prev == 3 )
wc++;nl++;
}
else if(ctyp == 2)
{
if(prev == 3)
wc++;
ws++;
}
prev = ctyp;

}

printf("Final Newline = %d Wite space = %d Word Count = %d\n",
nl,ws,wc);
return 0;
}

Sample Input:
line 1
line 2
line 3
line 4
end of line

Sample output:
../my_wc sample.txt
Final Newline = 5 Wite space = 12 Word Count = 11

Expected output
wc sample.txt
5 11 46 sample.txt

Regards,
Dhilip
 
Andrew Poelstra

Hi

I have never done this exercise; this is just my version of the
code. :) I have just checked a few basic conditions, and it imitates wc
to an extent.

#include<stdio.h>

char find_char(char x)
{
if( '\n' == x)
return 1;
if( ' ' == x)
return 2;
else
return 3;
}

This function can probably be replaced by the standard function
isspace() given in <ctype.h>.

int main(int argc, char* argv[])
{
FILE *fp=NULL;
char prev=0,ch,ctyp;
int wc=0,nl=0,ws=0;
if (argc != 2)
{
printf ("invlid argument \n");
return 1;
}

1 is not guaranteed to be a valid return value from main(). Use
EXIT_FAILURE from <stdlib.h>.

fp=fopen(argv[1], "r");
if (fp == NULL)
{
printf("Unable to open the file %s\n",argv[1]);
return 1;
}
while(!feof(fp))

Uh-oh. This will cause a fencepost error - instead, check ch
against EOF (and define it as an int, not a char, to hold this
value) and use feof() to confirm that it really was end-of-file,
not another error.
{
ch = fgetc(fp);
ctyp = find_char(ch);
if(ctyp == 1)

Yuck. Better to explicitly compare ch against ' ' or '\n',
whichever one you meant. Magic numbers are Bad News.
{
if ( prev == 3 )
wc++;nl++;
}
else if(ctyp == 2)
{
if(prev == 3)
wc++;
ws++;
}

This logic looks like I could replace it with the simpler:

if(isspace(ch))
++wc;

while(isspace(ch))
{
if(ch == '\n')
++nl;
++ws;
ch = fgetc(fp);
}

which is clearer IMHO.
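
The earlier remark about while(!feof(fp)) can be sketched as follows: read into an int, compare against EOF in the loop condition, and only afterwards ask the stream why reading stopped. This is a minimal sketch; the function name count_newlines and its return convention (0 on a clean end-of-file, -1 on a read error) are assumptions made for the example.

```c
#include <assert.h>
#include <stdio.h>

/* Count newline characters on a stream.
   Returns 0 if reading stopped at end-of-file, -1 on a read error.
   Hypothetical helper illustrating the EOF-checking idiom. */
int count_newlines(FILE *fp, unsigned long *out)
{
    int ch; /* must be int: EOF does not fit in a char reliably */
    unsigned long n = 0;

    assert(fp != NULL && out != NULL);
    /* test the value fgetc returned, not feof() before reading */
    while ((ch = fgetc(fp)) != EOF) {
        if (ch == '\n')
            n++;
    }
    *out = n;
    /* after the loop, ferror() tells a read error apart from end-of-file */
    return ferror(fp) ? -1 : 0;
}
```

Because the loop body never runs on the EOF return value itself, the extra iteration that while(!feof(fp)) produces does not occur.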


Also, don't top-post. Your reply belongs below the text.
 
