K&R2 1-23


Merrill & Michele

The exercise is to remove all comments from a source file. What I've
written so far is:

/* decommentizer: removes this style of comment from source */
#include <stdio.h>

void strconst(char);
void quotestr(char);
void decomment(void);

int main(void)
{
    int c;
    while ((c = getchar()) != EOF)
    {

I'm not sure if I've thought correctly through the cases. For example, I'm
reasonably certain that the function that handles quoted strings will simply
putchar() until it hits another " . I've got myself talked into the same
being the case with character constants. If so, I could reduce the number
of prototype declarations above and either pass " or ' and have the called
function do the same thing. My other question is what other ISO-conforming,
syntactically-legal things can follow a / besides an * . I don't believe
that / is an answer to this question. MPJ
 

Michael Mair

Merrill said:
The exercise is to remove all comments from a source file. What I've
written so far is:

/* decommentizer: removes this style of comment from source */
#include <stdio.h>

void strconst(char);
void quotestr(char);
void decomment(void);

int main(void)
{
    int c;
    while ((c = getchar()) != EOF)
    {
        /*
        ** Code doing the work.
        */
    }
    return 0;
}

Does not cost much and is compilable :)

I'm not sure if I've thought correctly through the cases. For example, I'm
reasonably certain that the function that handles quoted strings will simply
putchar() until it hits another " . I've got myself talked into the same
being the case with character constants. If so, I could reduce the number
of prototype declarations above and either pass " or ' and have the called
function do the same thing. My other question is what other ISO-conforming,
syntactically-legal things can follow a / besides an * . I don't believe
that / is an answer to this question. MPJ

If you restrict yourself to the exercise, there are no C++/C99 line
comments (//).

Your cases also have to take care of things like not "leaving" a string
literal on \" while still leaving it on \\", for example (expansion
to/checking for other cases left to you :)).

Hint: Introduce a variable indicating the state (outside/in comment/in
string/in character constant). (You seem to have planned doing that but
did not explicitly mention it.)
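
The state-variable idea can be sketched roughly as follows. This is only an illustration, not code posted by anyone in the thread: the state names and the decomment(src, dst) signature are made up, it works on a string rather than on getchar()/putchar() so that it is easy to test, and it assumes well-formed C89 input (no // comments, no trigraphs, no backslash-newline splicing). Each removed comment is replaced by a single space, which is what the C translation phases do with comments.

```c
/* Hypothetical state names; the hint only lists the categories. */
enum state { CODE, SLASH, COMMENT, STAR, STRING, CHARCONST };

/*
 * Copy src to dst with block comments replaced by one space each.
 * dst must have room for strlen(src) + 1 bytes.
 */
void decomment(const char *src, char *dst)
{
    enum state st = CODE;
    int escaped = 0;   /* previous char in a literal was an unescaped backslash */

    for (; *src != '\0'; src++) {
        char c = *src;
        switch (st) {
        case CODE:
            if (c == '/') { st = SLASH; break; }    /* hold the slash back */
            if (c == '"') st = STRING;
            else if (c == '\'') st = CHARCONST;
            *dst++ = c;
            break;
        case SLASH:
            if (c == '*') { st = COMMENT; break; }  /* slash-star: drop both */
            *dst++ = '/';                           /* it was an ordinary slash */
            if (c == '/') break;                    /* this slash may open a comment */
            if (c == '"') st = STRING;
            else if (c == '\'') st = CHARCONST;
            else st = CODE;
            *dst++ = c;
            break;
        case COMMENT:
            if (c == '*') st = STAR;                /* possible end of comment */
            break;
        case STAR:
            if (c == '/') { st = CODE; *dst++ = ' '; }
            else if (c != '*') st = COMMENT;
            break;
        case STRING:
        case CHARCONST:
            *dst++ = c;
            if (escaped) escaped = 0;               /* escaped char: never a delimiter */
            else if (c == '\\') escaped = 1;
            else if (c == (st == STRING ? '"' : '\'')) st = CODE;
            break;
        }
    }
    if (st == SLASH) *dst++ = '/';                  /* input ended on a lone slash */
    *dst = '\0';
}
```

With this sketch, decomment("a/*b*/c", out) leaves "a c" in out, while a slash-star sequence inside a string literal or character constant is copied through unchanged.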


Cheers
Michael
 

Chris Croughton

I'm not sure if I've thought correctly through the cases. For example, I'm
reasonably certain that the function that handles quoted strings will simply
putchar() until it hits another " .

Escaped quotes in strings and character constants?
I've got myself talked into the same
being the case with character constants. If so, I could reduce the number
of prototype declarations above and either pass " or ' and have the called
function do the same thing.

Yes, I often do that, pass the start character to be used as the end
character and treat strings and character constants the same otherwise.
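
A minimal sketch of such a shared helper, under some assumptions: the name copy_quoted and its string-based signature are invented for illustration (the thread itself talks about getchar()/putchar()), src is taken to point just past the opening delimiter, and an unterminated literal simply runs to the end of the input.

```c
#include <stddef.h>

/*
 * Copy the rest of a string literal or character constant from src to dst,
 * up to and including the closing delimiter, and return the number of
 * characters consumed. quote is the character that opened the literal
 * (either '"' or '\''), so one function serves both cases.
 */
size_t copy_quoted(const char *src, char quote, char *dst)
{
    size_t i = 0;

    while (src[i] != '\0') {
        char c = src[i];
        dst[i++] = c;
        if (c == '\\' && src[i] != '\0') {
            dst[i] = src[i];   /* escaped character: copy without inspecting it */
            i++;
        } else if (c == quote) {
            break;             /* unescaped delimiter ends the literal */
        }
    }
    dst[i] = '\0';
    return i;
}
```

Because the character after a backslash is copied blindly, an escaped quote does not end the literal, and a doubled backslash followed by a quote does.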
My other question is what other ISO-conforming,
syntactically-legal things can follow a / besides an * . I don't believe
that / is an answer to this question. MPJ

Believe it, it's what the ISO C spec. says. As worded, of course, the
answer is "almost anything" (a = b/-c; for example). But what I think
you mean is "What can follow / as a valid token?", and the answer to
that is * or /. And don't forget line joining (\ followed by newline)
in line comments started with //.

Chris C
 

Merrill & Michele

"Michael Mair"

/*
** Code doing the work.
*/
}
}

Does not cost much and is compilable :)
I'm glad I checked at this point for wrong-headed thinking instead of
coding away, as it saved me the embarrassment and you the "ugghhh is this guy
ever going to learn?"
If you restrict yourself to the exercise, there are no C++/C99 line
comments (//).

Your cases have also to take care of things like not "leaving" a string
literal on \" but on \\", for example (expansion to/checking for other
cases left to you :)).

Hint: Introduce a variable indicating the state (outside/in comment/in
string/in character constant). (You seem to have planned doing that but
did not explicitly mention it.)
I didn't. I had this silly logic edifice instead. Back to work. MPJ
 

Merrill & Michele

"Chris Croughton"
Escaped quotes in strings and character constants?

I think I've got this covered.
Yes, I often do that, pass the start character to be used as the end
character and treat strings and character constants the same otherwise.


Believe it, it's what the ISO C spec. says. As worded, of course, the
answer is "almost anything" (a = b/-c; for example). But what I think
you mean is "What can follow / as a valid token?", and the answer to
that is * or /. And don't forget line joining (\ followed by newline)
in line comments started with //.

Thanks for replying. Let me come up with some code before a C89-ISO
fisticuffs breaks out. MPJ
 

CBFalconer

Merrill said:
The exercise is to remove all comments from a source file. What I've
written so far is:

/* decommentizer: removes this style of comment from source */
#include <stdio.h>

void strconst(char);
void quotestr(char);
void decomment(void);

int main(void)
{
    int c;
    while ((c = getchar()) != EOF)
    {

I'm not sure if I've thought correctly through the cases. For example,
I'm reasonably certain that the function that handles quoted strings
will simply putchar() until it hits another " . I've got myself talked
into the same being the case with character constants. If so, I could
reduce the number of prototype declarations above and either pass " or
' and have the called function do the same thing. My other question is
what other ISO-conforming, syntactically-legal things can follow a /
besides an * . I don't believe that / is an answer to this question.
MPJ

This came up a while ago and I posted my solution. It is also
available on my site as:

<http://cbfalconer.home.att.net/download/uncmntc.zip>
 

CBFalconer

Merrill said:
Do you give permission to excerpt in clc? MPJ

I published it here before, and if you examine the source you will
see I have released it to the public domain. Enjoy.
 

Merrill & Michele

CBFalconer said:
I published it here before, and if you examine the source you will
see I have released it to the public domain. Enjoy.

What follows is uncmntc.c's function call to get rid of end of line comments
and then a couple questions.

/* gobble chars until EOLine or EOF. i.e. // comments */
static int eolcomment(void)
{
    int ch, lastch;

    ch = '\0';
    do {
        lastch = ch;
        if (EOF == (ch = fgetc(stdin))) return EOF;
    } while (!(('\n' == ch) && ('\\' != lastch)));
    return ch;
} /* eolcomment */

Q1) This function is at file scope and preceded by the word static. EVERY
function at file scope I've looked at that has been written by an
experienced programmer does this. Can somebody tell me in what situation
one would NOT do this?

Q2) I'm having trouble with the second condition tested in the while
statement: ('\\' .... Is that testing for a literal backslash or is there
something about the EOL comment I don't know yet?
(I am aware that ISO has
// ... \
bla bla bla \
bla end bla
all three of the above lines necessarily commented out.)
 

Chris Croughton

Q1) This function is at file scope and preceded by the word static. EVERY
function at file scope I've looked at that has been written by an
experienced programmer does this. Can somebody tell me in what situation
one would NOT do this?

You evidently haven't looked at much code, because main() is a function
declared at file scope and doesn't have static! Have you ever looked at
code for a multi-file program?

The use of static in this sense means that the function is limited to
file scope. If you leave it off then it will be visible from other
compilation units, so if you have a function which is called from
another compilation unit (like main()) then it will not have the static
keyword. If you want to hide it from another compilation unit then use
static, if you want to reference a function in another compilation unit
use extern (the same goes for variables).
Q2) I'm having trouble with the second condition tested in the while
statement: ('\\' .... Is that testing for a literal backslash or is there
something about the EOL comment I don't know yet?
(I am aware that ISO has
// ... \
bla bla bla \
bla end bla
all three of the above lines necessarily commented out.)

Yes, '\\' is a backslash character. The code is saying "I've found a
newline; if it was immediately preceded by a backslash then don't treat
it as the end of the comment".

Chris C
 

Merrill & Michele

"Chris Croughton"


You evidently haven't looked at much code, because main() is a function
declared at file scope and doesn't have static! Have you ever looked at
code for a multi-file program?

Only in texts.
The use of static in this sense means that the function is limited to
file scope. If you leave it off then it will be visible from other
compilation units, so if you have a function which is called from
another compilation unit (like main()) then it will not have the static
keyword. If you want to hide it from another compilation unit then use
static, if you want to reference a function in another compilation unit
use extern (the same goes for variables).

I guess what I can't think of is when you would use it, unless you compiled
the units at different times and didn't have the source handy to reshuffle
the functions.
Yes, '\\' is a backslash character. The code is saying "I've found a
newline; if it was immediately preceded by a backslash then don't treat
it as the end of the comment".

And ISO stipulates that the only way to carry that eol comment on to the
next line is '\\' followed by '\n'? (No white space, e.g.) MPJ
 

Thomas Stegen

Merrill said:
I guess what I can't think of is when you would use it, unless you compiled
the units at different times and didn't have the source handy to reshuffle
the functions.


You should always use it unless you have reason not to. So the choice is
not when to use it, but when not to use it. Also, it is better to add it
and find out later that the function is probably better off without it
than the other way around.

Basically, all functions used internally in a module should be static
and all the functions which comprise the interface/API of the module
should not be static. This is similar to private and public
attributes in some mainstream OO languages like Java and C++.
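
As a small illustration of that split, here is a sketch of a one-file module. The file name counter.c and both function names are hypothetical and exist only for this example.

```c
/* counter.c: a made-up module consisting of one translation unit */

/* Internal helper. static gives it internal linkage, so the name clamp
   is invisible to other translation units and cannot clash with theirs. */
static int clamp(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Part of the module's interface. Without static it has external linkage,
   so another file can call it after seeing a declaration such as
       int counter_add(int current, int delta);
   which would normally live in a header. */
int counter_add(int current, int delta)
{
    return clamp(current + delta, 0, 100);
}
```

Another translation unit can call counter_add() but gets a link error if it tries to call clamp(), which is the "private" half of the analogy.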
 

CBFalconer

Merrill said:
What follows is uncmntc.c's function call to get rid of end of line
comments and then a couple questions.

/* gobble chars until EOLine or EOF. i.e. // comments */
static int eolcomment(void)
{
    int ch, lastch;

    ch = '\0';
    do {
        lastch = ch;
        if (EOF == (ch = fgetc(stdin))) return EOF;
    } while (!(('\n' == ch) && ('\\' != lastch)));
    return ch;
} /* eolcomment */

Q1) This function is at file scope and preceded by the word static.
EVERY function at file scope I've looked at that has been written by
an experienced programmer does this. Can somebody tell me in what
situation one would NOT do this?

When one expects the function to be used (called) from outside that
compilation unit. The static word keeps the function name from
polluting the linker name space, so the name is freely available in
other modules/compilation units. In this case it makes little
difference, since there are no other compilation units, but it
means I can freely copy the entire function into something else if
I wish. And if I modify the module into something else, I have not
forgotten to isolate purely file local scope functions.
Q2) I'm having trouble with the second condition tested in the
while statement: ('\\' .... Is that testing for a literal
backslash or is there something about the EOL comment I don't
know yet? (I am aware that ISO has
// ... \
bla bla bla \
bla end bla
all three of the above lines necessarily commented out.)

It's not a while statement, it is a "do while" statement, which is
quite a different animal. Yes, it is testing for a literal
backslash, and thus for the sort of continuation line you
demonstrate above. Overall it simply ignores a '\n' immediately
preceded by a backslash.
 

Keith Thompson

Merrill & Michele said:
And ISO stipulates that the only way to carry that eol comment on to the
next line is '\\' followed by '\n'? (No white space, e.g.) MPJ

Yes, the backslash must be the last character on the line to be treated
as a continuation character.

This is a valid assignment statement (assuming an appropriate
definition for x):

x = \
42;


This is not (here the backslash is followed by a space before the newline):

x = \ 
42;

In my opinion, this is a flaw in the language; a backslash should be
treated as a continuation character even when followed by whitespace.
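
The splicing rule can be seen in a small sketch like the one below. The function name spliced_value is made up, the // comment needs C99 or a compiler that accepts it as an extension, and some compilers warn about the multi-line comment. Both backslashes must be the very last character on their lines.

```c
int spliced_value(void)
{
    int x;

    x = \
42;                 /* backslash-newline is deleted, so this reads x = 42; */

    // this line comment ends with a backslash, so it continues \
    x = 0;          /* still part of the comment above, never compiled */

    return x;
}
```

Since the line after the // comment is spliced into the comment, the assignment x = 0 never takes effect and the function returns 42.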
 

Merrill & Michele

"Keith Thompson"

Yes, the backslash must be the last character on the line to be treated
as a continuation character.

This is a valid assignment statement (assuming an appropriate
definition for x):

x = \
42;


This is not (here the backslash is followed by a space before the newline):

x = \ 
42;

In my opinion, this is a flaw in the language; a backslash should be
treated as a continuation character even when followed by whitespace.

We'll save the discussion for down the hallway. If I read you correctly,
'\\' must be followed immediately by '\n' to extend an eol comment. If
true, then no reply is requested. MPJ
 

Herbert Rosenau

/* gobble chars until EOLine or EOF. i.e. // comments */
static int eolcomment(void)
{
    int ch, lastch;

    ch = '\0';
    do {
        lastch = ch;
        if (EOF == (ch = fgetc(stdin))) return EOF;
    } while (!(('\n' == ch) && ('\\' != lastch)));
    return ch;
} /* eolcomment */

Q1) This function is at file scope and preceded by the word static. EVERY
function at file scope I've looked at that has been written by an
experienced programmer does this. Can somebody tell me in what situation
one would NOT do this?

When the function can be called from outside the current translation
unit.
Q2) I'm having trouble with the second condition tested in the while
statement: ('\\' .... Is that testing for a literal backslash or is there
something about the EOL comment I don't know yet?

'\\' is a single backslash. As the backslash char is an escape char,
one has to double it to get a single one, to show the compiler that one
means the char itself and not the escape sequence it would form with the
next char.

"\\" is a string containing exactly one backslash char
"\\\\" is a string containing two backslash chars

'\n' is a single char that, once encoded, means "end of line"
"\n" is the same char inside a string.
"\\n" is a string containing a single backslash char and a single
lowercase letter n.

So when you read a C source char by char you would read
'\' - you don't know yet if it is a backslash char or an encoded char.
You need to read another char to decide what is going on.
When the next char is another '\' then there is nothing to do because
the 2 chars together are an escaped escape char - meaning a single
'\'.

Interpreting a C source gives you a big problem: you have to decide, and
to document, whether your program will understand only syntactically
well-formed, compilable code (relatively easy) or any crap a programmer may
produce while typing the code (really complex), as this can hold any faulty
things, like
"string misses closing quotation mark
'0 - a single char const - closing quotation mark missing
0xz1 - an illegal hex constant
......
(I am aware that ISO has
// ... \
bla bla bla \
bla end bla
all three of the above lines necessarily commented out.)

Correct, yes.

//*.... is NOT a syntax error (longest token first: //, so the star is the
first char in the line comment!)
*/*... is the end of a block comment! (*/ followed by *)
*//*... end of block comment followed by a new block comment
*//... end of block comment followed by division operator

// is C99 only - but a legal extension in many C89 compilers!

"\"" a single quotation mark inside a string; it is escaped!
"/*" not a comment start
"*/" not a comment end
"//" not a line comment
"'" a string containing one single quote, not the start of a char constant
'"' a char constant containing a double quote, not the start of a string

Don't get confused by:
\n read as a single char: a new line in the source
\n read as \ + n (2 chars): a symbolic new line. The compiler will
translate it to the single char '\n'

When you have to cope with syntax errors your job will be considerably
more complex. The samples above are all legal - and the list may still be
incomplete.

Some hints:
Use a status variable that helps you to decide whether you are
- in plain text
- inside single line comment
- inside multiline comment
- inside string (no difference between initialiser, const string,
actual parameter....); a string starts and ends with an unescaped "
Your job is not to parse each and every token, but you have to handle
strings carefully so as not to confuse strings that look like
comments with real comments.

Hint: ungetc() will put exactly ONE char given by you back
into the stream. That char can be any char, not only the last one read!
It may help to identify multi-character tokens like /*, // and so on to get a
status change right.
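
One way this hint might be used is a one-character lookahead, sketched below. The names peekc and slash_opens_comment are hypothetical. The sketch never pushes back more than one character at a time, since only one character of pushback is guaranteed by the standard.

```c
#include <stdio.h>

/* Report the next character of fp without consuming it (EOF at end). */
int peekc(FILE *fp)
{
    int c = getc(fp);
    if (c != EOF)
        ungetc(c, fp);   /* put the character back for the next read */
    return c;
}

/*
 * Call right after a '/' has been read from fp. If a '*' follows, consume
 * it and return 1 (a block comment starts here); otherwise leave the
 * stream untouched and return 0.
 */
int slash_opens_comment(FILE *fp)
{
    if (peekc(fp) == '*') {
        getc(fp);        /* swallow the '*' */
        return 1;
    }
    return 0;
}
```

A main loop could call slash_opens_comment(stdin) whenever getchar() returns '/' and switch its status variable accordingly.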

Hint: you can read EOF any number of times without getting an error and
without having to unget it.

--
Tschau/Bye
Herbert

Visit http://www.ecomstation.de the home of german eComStation
The German version of eComStation 1.2 is here!
 

Merrill & Michele

Michael Mair said:
/*
** Code doing the work.
*/
}
}

Does not cost much and is compilable :)



If you restrict yourself to the exercise, there are no C++/C99 line
comments (//).

Your cases have also to take care of things like not "leaving" a string
literal on \" but on \\", for example (expansion to/checking for other
cases left to you :)).

Hint: Introduce a variable indicating the state (outside/in comment/in
string/in character constant). (You seem to have planned doing that but
did not explicitly mention it.)

Coding took a different turn when it plopped down from on high. When I look
at Chuck's source, I am first of all presented with source I don't
understand. In such situations, I try to pretend like I'm the compiler. I
start by matching braces in main. I create a table:
open | close
1 3
2 4
3 5
4 7
6 8
7 2
8 1

The part you can't see is that I cross off the highest number on the left
when presented with a close brace and write the crossed-off number on the
right. That I end up with a one in the close column is a partial soln to
1-24. Now the question. Q) Given that clc has struggled lately with
topicality and that I've been part of the cause and part of the solution, I
would like to know whether a garden-variety compiler really goes about
business like this. I know darn well that a fella could write a compiler to
begin its business on a randomly-chosen byte; I'm trying to ramp up to 1-24,
nothing more. MPJ
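
The table procedure described above behaves like a stack, and a rough sketch of it in C might look like this. The function name match_braces and the MAX_DEPTH limit are invented for the example, and quotes and comments are deliberately not recognised, so it says nothing about how any particular compiler works.

```c
#include <stddef.h>

#define MAX_DEPTH 64   /* arbitrary nesting limit for this sketch */

/*
 * Number each '{' as it is opened; on each '}' take the highest-numbered
 * brace that is still open. close_order receives those numbers in the
 * order the braces close (at most cap entries are stored).
 * Returns the number of pairs, or -1 if the braces do not balance or
 * nest deeper than MAX_DEPTH.
 */
int match_braces(const char *src, int *close_order, size_t cap)
{
    int stack[MAX_DEPTH];
    int depth = 0, opened = 0;
    size_t closed = 0;

    for (; *src != '\0'; src++) {
        if (*src == '{') {
            if (depth == MAX_DEPTH) return -1;
            stack[depth++] = ++opened;       /* write the number in the open column */
        } else if (*src == '}') {
            if (depth == 0) return -1;       /* close brace with nothing open */
            depth--;                         /* cross off the highest open number */
            if (closed < cap) close_order[closed] = stack[depth];
            closed++;
        }
    }
    return depth == 0 ? (int)closed : -1;
}
```

For the input "{ { } { } }" the close column comes out as 2, 3, 1: ending on brace number 1 with nothing left open is the balanced case described in the post.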
 
