K&R2 section 1.5.4, word counting

arnuld

This is an example programme that counts lines, words and characters.
I have noticed that it counts a space, a newline and a tab each as a
character.

I know:

1. a newline is represented as '\n'
2. a tab as '\t'
3. a space as ' '

What I want to know is whether a newline, a space and a tab are
represented internally as characters?

I know everything is represented in the machine's character set, most
probably ASCII, where 'A' is 65, but I am confused about how '\t',
'\n' and ' ' relate to characters.

Any help?

Here is the code that counts characters, words and newlines:

// word counting

#include <stdio.h>

#define IN 0    /* inside a word */
#define OUT 1   /* outside a word */

int main(void)
{
    int c, nl, nw, nc, state;

    state = OUT;
    nl = nc = nw = 0;

    while ((c = getchar()) != EOF)
    {
        ++nc;    /* every character counts, whitespace included */

        if (c == '\n')
            ++nl;

        if (c == ' ' || c == '\n' || c == '\t')
            state = OUT;
        else if (state == OUT)
        {
            state = IN;    /* first character of a new word */
            ++nw;
        }
    }

    printf("%d NEWLINES \t %d WORDS \t %d CHARs \n", nl, nw, nc);

    return 0;
}
 
santosh

arnuld said:
This is an example programme that counts lines, words and characters.
I have noticed that it counts a space, a newline and a tab each as a
character.

I know:

1. a newline is represented as '\n'
2. a tab as '\t'
3. a space as ' '

What I want to know is whether a newline, a space and a tab are
represented internally as characters?

It depends on the machine and its character set.

I know everything is represented in the machine's character set, most
probably ASCII, where 'A' is 65, but I am confused about how '\t',
'\n' and ' ' relate to characters.

Any help?

Generally, an end-of-line sequence is represented by one or two
characters. Under UNIX it's a single linefeed character, while under
DOS-like systems it's a carriage-return followed by a linefeed. Classic
Mac OS used a single carriage-return. Other systems may use still
other variations.

Spaces and tabs are usually represented by one character.

Here is the code that counts characters, words and newlines:

// word counting

It's better to use /* ... */ style comments, especially when you're
posting code onto Usenet.
 
Keith Thompson

santosh said:
It depends on the machine and its character set.

Generally, an end-of-line sequence is represented by one or two
characters. Under UNIX it's a single linefeed character, while under
DOS-like systems it's a carriage-return followed by a linefeed. Classic
Mac OS used a single carriage-return. Other systems may use still
other variations.
[...]

But C's I/O routines, when operating on files opened in text mode,
hide those details from you. Regardless of how an end-of-line is
represented in an external file (and there are a *lot* of ways to do
this, including fixed-length records with no specific marker), it's
mapped to a single '\n' character.
 
