/* gobble chars until EOLine or EOF. i.e. // comments */
static int eolcomment(void)
{
    int ch, lastch;

    ch = '\0';
    do {
        lastch = ch;
        if (EOF == (ch = fgetc(stdin))) return EOF;
    } while (!(('\n' == ch) && ('\\' != lastch)));
    return ch;
} /* eolcomment */
Q1) This function is at file scope and preceded by the word static. EVERY
function at file scope I've looked at that has been written by an
experienced programmer does this. Can somebody tell me in what situation
one would NOT do this?
When the function has to be callable from outside the current
translation unit. Then it needs external linkage, so you leave out
static.
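A small sketch of the difference, assuming one source file that holds one function of each kind. The names is_blank() and count_blanks() are made up for this example; they are not from the code in question.

```c
#include <stddef.h>

/* Internal linkage: visible only inside this translation unit. */
static int is_blank(int ch)
{
    return ch == ' ' || ch == '\t';
}

/* External linkage (no static): other source files may call this,
   usually through a prototype in a shared header. */
size_t count_blanks(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++)
        if (is_blank((unsigned char)*s))
            n++;
    return n;
}
```

Another .c file can call count_blanks() once it sees a prototype for it, but a call to is_blank() from there would not link.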
Q2) I'm having trouble with the second condition tested in the while
statement: ('\\' .... Is that testing for a literal backslash, or is there
something about the EOL comment I don't know yet?
'\\' is a single backslash. As the backslash char is the escape char,
one has to double it to get a single one, to show the compiler that one
means the char itself and not the escape sequence it would otherwise
form with the next char.
"\\" is a string containing exactly one backslash char
"\\\\" is a string containing two backslash chars
'\n' is a single char: the encoded form of "end of line"
"\n" is the same char inside a string.
"\\n" is a string containing a single backslash char and a single
lowercase letter n.
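If you want to check this yourself, a minimal sketch could count the backslash chars that really end up in a string. The helper name count_backslashes() is made up for this example.

```c
#include <stddef.h>

/* Count the backslash characters actually stored in s. */
size_t count_backslashes(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++)
        if (*s == '\\')     /* '\\' in the source is ONE backslash char */
            n++;
    return n;
}
```

With it, count_backslashes("\\") gives 1, count_backslashes("\\\\") gives 2, count_backslashes("\n") gives 0 and count_backslashes("\\n") gives 1, while strlen("\\n") is 2.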
So when you read a C source char by char and you get a
'\', you don't know yet whether it is a backslash char or the start of
an escape sequence. You need to read another char to decide what is
going on.
When the next char is another '\' there is nothing more to do, because
the 2 chars together are an escaped escape char, meaning a single
'\'.
Interpreting a C source gives you a big problem: you have to decide,
and to document, whether your program will understand only
syntactically correct, compilable code (relatively easy) or any crap a
programmer may produce while typing the code (really complex), as this
can hold any faulty things, like
"string misses closing quotation mark - an unclosed string
'0 - a single char const - closing quotation mark missing
0xz1 - an illegal hex digit
......
(I am aware that ISO has
// ... \
bla bla bla \
bla end bla
all three of the above lines necessarily commented out.)
Correct, yes.
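To try the three-line case without typing into stdin, here is a sketch of the same loop as eolcomment(), only reading from a FILE * that you pass in. The name skip_line_comment() and the parameter are my additions, not part of the original function.

```c
#include <stdio.h>

/* Call after the opening // has been read. Skips the rest of the
   comment, including lines continued with backslash-newline.
   Returns '\n' at the real end of the comment, or EOF. */
int skip_line_comment(FILE *in)
{
    int ch = '\0', lastch;

    do {
        lastch = ch;
        ch = fgetc(in);
        if (ch == EOF)
            return EOF;
    } while (!(ch == '\n' && lastch != '\\'));
    return ch;
}
```

Fed with the three lines quoted above, it stops only at the newline after "bla end bla", because the first two newlines each follow a backslash.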
//*.... is NOT a syntax error (longest token first: //, so the star is
the first char inside the line comment!)
*/*... is the end of a block comment! (*/ followed by *)
*//*... end of a block comment followed by a new block comment
*//... end of a block comment followed by the division operator
// is C99 only - but a legal extension in many C89 compilers!
"\"" a string holding a single quotation mark; it is escaped!
"/*" not a comment start
"*/" not a comment end
"//" not a line comment
"'" a string holding an apostrophe, not the start of a char constant
'"' a char constant holding a quotation mark, not the start of a string
Don't get confused by:
\n read as a single char: a real new line in the source
\n read as '\\' + 'n' (2 chars): a symbolic new line. The compiler will
translate it to the single char '\n'
When you have to swallow syntax errors too, your job will be a lot more
complex. The samples above are all legal - and the list may still be
incomplete.
Some hints:
Use a status variable that tells you where you are:
- in plain text
- inside single line comment
- inside multiline comment
- inside string (no difference between initialiser, const string,
actual parameter....); a string starts and ends with an unescaped "
Your job is not to parse each and every token, but you have to handle
strings carefully so that you do not confuse strings that merely look
like comments with real comments.
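One way such a status variable could look is sketched here as a filter that copies its input and leaves the comments out. This is only a sketch under assumptions: the input is well-formed C, trigraphs and backslash-newline outside of line comments are ignored, and all names (enum state, strip_comments()) are invented for the example. It needs a few more states than the four listed, to remember a '/', '*' or backslash that was just read.

```c
#include <stdio.h>

enum state {
    PLAIN,            /* ordinary code */
    SLASH,            /* saw '/', may open a comment */
    LINE,             /* inside // comment */
    LINE_BSLASH,      /* inside // comment, last char was a backslash */
    BLOCK,            /* inside block comment */
    BLOCK_STAR,       /* inside block comment, last char was '*' */
    STRING,           /* inside "..." */
    STRING_BSLASH,    /* inside "...", last char was a backslash */
    CHARCONST,        /* inside '...' */
    CHARCONST_BSLASH  /* inside '...', last char was a backslash */
};

/* Copy in to out without comments. A block comment becomes one space,
   a line comment is dropped but its terminating newline is kept. */
void strip_comments(FILE *in, FILE *out)
{
    enum state st = PLAIN;
    int ch;

    while ((ch = fgetc(in)) != EOF) {
        switch (st) {
        case PLAIN:
            if (ch == '/') { st = SLASH; break; }  /* hold the slash back */
            if (ch == '"') st = STRING;
            else if (ch == '\'') st = CHARCONST;
            fputc(ch, out);
            break;
        case SLASH:
            if (ch == '/') st = LINE;
            else if (ch == '*') st = BLOCK;
            else {                      /* it was a plain division operator */
                fputc('/', out);
                ungetc(ch, in);         /* re-read ch as plain text */
                st = PLAIN;
            }
            break;
        case LINE:
            if (ch == '\\') st = LINE_BSLASH;
            else if (ch == '\n') { fputc(ch, out); st = PLAIN; }
            break;
        case LINE_BSLASH:               /* backslash-newline continues the comment */
            st = (ch == '\\') ? LINE_BSLASH : LINE;
            break;
        case BLOCK:
            if (ch == '*') st = BLOCK_STAR;
            break;
        case BLOCK_STAR:
            if (ch == '/') { fputc(' ', out); st = PLAIN; }
            else if (ch != '*') st = BLOCK;
            break;
        case STRING:
            fputc(ch, out);
            if (ch == '\\') st = STRING_BSLASH;
            else if (ch == '"') st = PLAIN;
            break;
        case STRING_BSLASH:             /* escaped char, even a " */
            fputc(ch, out);
            st = STRING;
            break;
        case CHARCONST:
            fputc(ch, out);
            if (ch == '\\') st = CHARCONST_BSLASH;
            else if (ch == '\'') st = PLAIN;
            break;
        case CHARCONST_BSLASH:
            fputc(ch, out);
            st = CHARCONST;
            break;
        }
    }
    if (st == SLASH)
        fputc('/', out);                /* input ended right after a '/' */
}
```

A call like strip_comments(stdin, stdout) would turn it into a simple filter. Because strings and char constants have their own states, "/*" inside a string is copied unchanged and never starts a comment.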
Hint: ungetc() will put exactly ONE char given by you back
into the stream. That char can be any char, not only the last one read!
This may help to identify multi-char tokens like /*, // and so on, so
that you get the status change right.
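A minimal sketch of that one char of look-ahead, assuming a '/' has just been read from the stream. The enum and the function name after_slash() are made up for the example.

```c
#include <stdio.h>

enum opener { NOT_COMMENT, LINE_OPENER, BLOCK_OPENER };

/* Call right after a '/' has been read from in.
   Consumes the second char only if it completes // or a block opener;
   any other char is pushed back so the caller reads it again. */
enum opener after_slash(FILE *in)
{
    int ch = fgetc(in);

    if (ch == '/')
        return LINE_OPENER;
    if (ch == '*')
        return BLOCK_OPENER;
    if (ch != EOF)
        ungetc(ch, in);     /* one char of pushback is guaranteed */
    return NOT_COMMENT;
}
```

When it returns NOT_COMMENT the '/' was a division operator, and the next fgetc() gives back the char that followed it.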
Hint: you can read EOF any number of times without getting an error
and without having to unget it.
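A short sketch to convince yourself, at least for an ordinary file (an interactive terminal may behave differently on some systems). The function name count_eof_reads() is invented for the example.

```c
#include <stdio.h>

/* Read in to its end, then call fgetc() extra more times.
   Returns how many of those extra calls returned EOF. */
int count_eof_reads(FILE *in, int extra)
{
    int i, n = 0;

    while (fgetc(in) != EOF)
        ;                       /* consume the real input */
    for (i = 0; i < extra; i++)
        if (fgetc(in) == EOF)
            n++;
    return n;
}
```

Every extra read should return EOF again, with feof() set and ferror() still clear on the stream.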
--
Tschau/Bye
Herbert