Odd preprocessor behaviour?

Chris Croughton

Is the following code standard-compliant, and if so what should it do?
And where does the standard define the behaviour?

#include <stdio.h>

#define DEF defined XXX

int main(void)
{
    int defined = 2;
#if ! DEF
#define XXX +
    printf("%d\n", DEF 1);
#endif
    return 0;
}

How about the following code?

#include <stdio.h>

#define CAT(a) defined a ## X

int main(void)
{
#if CAT(XX)
    printf("XXX defined\n");
#endif
    return 0;
}

As far as I can see the relative precedence of the join (##) operator
and the defined operator (in an #if) is not stated anywhere.

(Incidentally, using GCC 2.95.4 the first example works and parses the
'defined' in the macros as an operator in the #if and as an identifier
(variable) in the printf statement. The second fails with an error
(`defined' without an identifier). GCC 3.0 allows both but gives a
warning about using 'defined' during macro expansion -- but expands and
uses it, implementing the join operator before testing for 'XXX'
defined. I haven't tried other compilers yet...)

If a compiler (or preprocessor) were to say that using the 'defined'
operator during macro expansion is always an error, would it be breaking
the standard-compliance (and if so, where in the standard)? Is there a
difference between C89, C99 and C++ preprocessor behaviour in this?

Thanks,

Chris C
 
Robert Harris

Chris said:
Is the following code standard-compliant, and if so what should it do?
And where does the standard define the behaviour?

#include <stdio.h>

#define DEF defined XXX

int main(void)
{
int defined = 2;
#if ! DEF
#define XXX +
printf("%d\n", DEF 1);
#endif
return 0;
}

How about the following code?

#include <stdio.h>

#define CAT(a) defined a ## X

int main(void)
{
#if CAT(XX)
printf("XXX defined\n");
#endif
return 0;
}

As far as I can see the relative precedence of the join (##) operator
and the defined operator (in an #if) is not stated anywhere.

(Incidentally, using GCC 2.95.4 the first example works and parses the
'defined' in the macros as an operator in the #if and as an identifier
(variable) in the printf statement. The second fails with an error
(`defined' without an identifier). GCC 3.0 allows both but gives a
warning about using 'defined' during macro expansion -- but expands and
uses it, implementing the join operator before testing for 'XXX'
defined. I haven't tried other compilers yet...)

If a compiler (or preprocessor) were to say that using the 'defined'
operator during macro expansion is always an error, would it be breaking
the standard-compliance (and if so, where in the standard)? Is there a
difference between C89, C99 and C++ preprocessor behaviour in this?

In section 6.10.1, the C99 standard says:

"Preprocessing directives of the forms

#if <constant-expression> new-line ...

Prior to evaluation, macro invocations in the list of preprocessing
tokens that will become the controlling constant expression are
replaced... If the token "defined" is generated as a result of this
replacement process ... the behaviour is undefined."

so in

#if CAT(XX)

you get undefined behaviour. And joins are evaluated before #if's. At
least that's my reading.
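
For anyone who wants the effect of the first example without relying on undefined behaviour, here is a minimal sketch: test the macro in a directive and record the answer in an ordinary object-like macro, so that `defined` never appears as the result of macro replacement. HAVE_XXX and xxx_is_defined are names made up for this illustration, not anything from the original code.

```c
#include <assert.h>   /* only needed for the check in the usage note */

/* Undefined behaviour per 6.10.1, because `defined` is produced by
 * macro replacement:
 *     #define DEF defined XXX
 *     #if ! DEF
 */

/* Well-defined alternative: test XXX in a directive and store 0 or 1.
 * HAVE_XXX is a hypothetical name chosen for this sketch. */
#ifdef XXX
#define HAVE_XXX 1
#else
#define HAVE_XXX 0
#endif

/* Returns 1 if XXX was defined at the point HAVE_XXX was computed. */
int xxx_is_defined(void)
{
#if HAVE_XXX
    return 1;
#else
    return 0;
#endif
}
```

In a translation unit that never defines XXX, `assert(xxx_is_defined() == 0);` holds. Note that HAVE_XXX is fixed at the point of the #ifdef; defining XXX later does not change it.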

Robert
 
S.Tobias

Chris Croughton said:
Is the following code standard-compliant, and if so what should it do?
And where does the standard define the behaviour?

n869.txt 6.10.1#3:
If the token `defined' is generated as a result of this replacement
process or use of the `defined' unary operator does not match one
of the two specified forms prior to macro replacement, the behavior
is undefined.
(I added the single quotes for better readability - in the Std this is
in Courier font.)

#include <stdio.h>
#define DEF defined XXX
int main(void)
{
int defined = 2;
#if ! DEF

UB.

#define XXX +
printf("%d\n", DEF 1);
#endif
return 0;
}
How about the following code?
#include <stdio.h>
#define CAT(a) defined a ## X
int main(void)
{
#if CAT(XX)

UB (for the same reason).

printf("XXX defined\n");
#endif
return 0;
}
As far as I can see the relative precedence of the join (##) operator
and the defined operator (in an #if) is not stated anywhere.

In a "single preprocessing loop" `#' operator is applied first (during
the expansion); `##' is resolved at the last step (just before
the result of partial expansion is subjected to further expansion
again - see "Rescanning...").

This way they cannot coexist in a single "preprocessing expression",
because string tokens (or string token and something-else) don't form
a single valid token when pasted:
/*BAD CODE*/
#define DOUBLE_STR(x) # x ## # x
DOUBLE_STR(abc)
won't work, because
"abc""abc"
is not a valid pp-token (they are, actually, two valid pp-tokens, and would
be merged into a single C token "abcabc" before proper code translation).
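
A sketch of a form that does work, assuming the goal is simply the doubled string: stringize twice and leave out the ## altogether, so that ordinary string literal concatenation does the joining. DOUBLE_STR_OK and double_abc are hypothetical names. As a side note, C99 6.10.3.2 says the order of evaluation of # and ## is unspecified, so it is probably safest not to rely on either being applied first.

```c
#include <assert.h>   /* only needed for the check in the usage note */
#include <string.h>

/* Stringize the argument twice. The two adjacent string literals are
 * joined later, in translation phase 6, so no invalid token is formed.
 * DOUBLE_STR_OK is a hypothetical name for this sketch. */
#define DOUBLE_STR_OK(x) #x #x

/* Returns the literal produced for the argument abc. */
const char *double_abc(void)
{
    return DOUBLE_STR_OK(abc);   /* expands to "abc" "abc" */
}
```

With this, `assert(strcmp(double_abc(), "abcabc") == 0);` holds.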
(Incidentally, using GCC 2.95.4 the first example works and parses the
'defined' in the macros as an operator in the #if and as an identifier
(variable) in the printf statement. The second fails with an error
(`defined' without an identifier). GCC 3.0 allows both but gives a
warning about using 'defined' during macro expansion -- but expands and
uses it, implementing the join operator before testing for 'XXX'
defined. I haven't tried other compilers yet...)

All of them are "correct" in the sense that they don't break the Standard.
As you have learned, you can't count on anything here, though.

If a compiler (or preprocessor) were to say that using the 'defined'
operator during macro expansion is always an error, would it be breaking
the standard-compliance (and if so, where in the standard)?

Of course not! Because it's just undefined - anything is allowed.

Is there a
difference between C89, C99 and C++ preprocessor behaviour in this?

No.
 
Chris Croughton

In section 6.10.1, the C99 standard says:

"Preprocessing directives of the forms

#if <constant-expression> new-line ...

Prior to evaluation, macro invocations in the list of preprocessing
tokens that will become the controlling constant expression are
replaced... If the token "defined" is generated as a result of this
replacement process ... the behaviour is undefined."

Ah, I think I read that as the token 'defined' being generated by using
## for instance (def##ined), but I see that it does include "generated
by the expansion of a macro". That makes sense.
so in

#if CAT(XX)

you get undefined behaviour. And joins are evaluated before #if's. At
least that's my reading.

So if it's undefined, my preprocessing program can do whatever it wants
and no one can complain about it because they shouldn't be doing it
anyway. Not that my program was supposed to be a full preprocessor
originally, it's just getting that way...

Thanks,

Chris C
 
Chris Croughton

n869.txt 6.10.1#3:
If the token `defined' is generated as a result of this replacement
process or use of the `defined' unary operator does not match one
of the two specified forms prior to macro replacement, the behavior
is undefined.
(I added the single quotes for better readability - in the Std this is
in Courier font.)

Yes, I see, I was misreading the term 'generated' (I was thinking of it
being something like def##ined, rather than a token within a macro).
In a "single preprocessing loop" `#' operator is applied first (during
the expansion); `##' is resolved at the last step (just before
the result of partial expansion is subjected to further expansion
again - see "Rescanning...").

This way they cannot coexist in a single "preprocessing expression",
because string tokens (or string token and something-else) don't form
a single valid token when pasted:
/*BAD CODE*/
#define DOUBLE_STR(x) # x ## # x
DOUBLE_STR(abc)
won't work, because
"abc""abc"
is not a valid pp-token (they are, actually, two valid pp-tokens, and would
be merged into a single C token "abcabc" before proper code translation).

Ah, I see the difference, it's a valid sequence of characters only at
the lexical level, but not as a single token.
All of them are "correct" in the sense that they don't break the Standard.
As you have learned, you can't count on anything here, though.

Indeed. However, for my purpose (writing a preprocessor) it is good
because it means that whatever I do will not be wrong, since if anyone
uses the construct they can't depend on it being portable anyway.
Of course not! Because it's just undefined - anything is allowed.

And it doesn't even have to be documented (unlike implementation-defined
behaviour).

That's a relief, I really did not want to get into multiple switches to
handle different standards...

Thanks,

Chris C
 
S.Tobias

Chris Croughton said:
On 22 Nov 2004 23:52:56 GMT, S.Tobias wrote:
Indeed. However, for my purpose (writing a preprocessor) it is good

Be warned: writing a C preprocessor is not an especially easy task!
If you need a stand-alone pp, `mcpp' is said to be good.
If you need a cold shower, have a look at ::boost::preprocessor.

That's a relief, I really did not want to get into multiple switches to
handle different standards...

Hey, not so fast! I answered your question on differences "in this",
nothing else. If you're writing an include-all preprocessor, you'll
have to have some.

I don't think there are major differences in expansion algorithm between
those standards, but there are differences nevertheless. C99 added a
few things to the preprocessor, the most prominent being the ellipsis.
I don't think there is a big change from C89 to C++, but there are slight
differences in the text from the beginning on; I think this might be related
to new keywords (arithmetic operators) in C++; it might be worth asking
in C++ NG what those differences are.

Last but not least, there are bound to be differences between
implementations of the same standard. The C preprocessor is part of the
compiler and expression evaluation in the #if directive depends on how
certain types are implemented in the language (some people say here "C
preprocessor doesn't know C" - I think this is not the whole story).
For example, in C89:
#if 0xffffffff + 1
will depend on the value of LONG_MAX.
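
A minimal sketch for checking what a given implementation does with that directive. With C89 rules and a 32-bit long, 0xffffffff is an unsigned long and the sum wraps to 0; with a wider long, or under C99 where #if arithmetic uses intmax_t/uintmax_t (at least 64 bits), the sum is 0x100000000 and the branch is taken. SUM_IS_NONZERO and the two function names are made up for the illustration.

```c
#include <assert.h>   /* only needed for the check in the usage note */
#include <limits.h>

/* The directive under discussion, with the outcome recorded as 0 or 1. */
#if 0xffffffff + 1
#define SUM_IS_NONZERO 1
#else
#define SUM_IS_NONZERO 0
#endif

/* Reports what the preprocessor decided for 0xffffffff + 1. */
int sum_is_nonzero(void)
{
    return SUM_IS_NONZERO;
}

/* Reports whether long has more than 32 bits, as seen by the compiler. */
int long_is_wider_than_32_bits(void)
{
    return LONG_MAX > 0x7fffffffL;
}
```

On any conforming implementation, `assert(sum_is_nonzero() == 1 || !long_is_wider_than_32_bits());` should hold: a zero result is only possible with C89 arithmetic and a 32-bit long.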
 
Chris Croughton

Be warned: writing a C preprocessor is not an especially easy task!
If you need a stand-alone pp, `mcpp' is said to be good.
If you need a cold shower, have a look at ::boost::preprocessor.

See my other recent thread, I'm writing a pre-preprocessor, as in the
idea of scpp and rmif, to process out 'known' #ifs and defines (and
known undefined symbols). It doesn't have to do everything (indeed, it
already does more than I need it to do, things like the join operator
are extra but if I'm going to add them I may as well get them right).
It doesn't deal with any preprocessor lines apart from #if #elif #else
#endif, #define and #undef.
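
To make the scope concrete, here is a rough sketch of how such a tool might decide which lines it looks at. It is not code from the tool being described; is_handled_directive is a hypothetical helper, and it assumes comments and backslash-newline splicing have already been dealt with. #ifdef and #ifndef are left out only because the list above does not mention them.

```c
#include <assert.h>   /* only needed for the checks in the usage note */
#include <ctype.h>
#include <string.h>

/* Returns 1 if `line` is one of the directives listed in the post
 * (#if, #elif, #else, #endif, #define, #undef), otherwise 0.
 * Hypothetical helper for illustration only. */
int is_handled_directive(const char *line)
{
    static const char *const handled[] = {
        "if", "elif", "else", "endif", "define", "undef"
    };
    size_t i, len;

    while (*line == ' ' || *line == '\t')   /* blanks before the # */
        line++;
    if (*line != '#')
        return 0;
    line++;
    while (*line == ' ' || *line == '\t')   /* blanks after the # */
        line++;

    /* Measure the directive name: a run of letters. */
    len = 0;
    while (isalpha((unsigned char)line[len]))
        len++;

    /* Require an exact match, so "ifdef" does not match "if". */
    for (i = 0; i < sizeof handled / sizeof handled[0]; i++) {
        if (strlen(handled[i]) == len &&
            strncmp(line, handled[i], len) == 0)
            return 1;
    }
    return 0;
}
```

For example, `is_handled_directive("  #  define X 1")` returns 1, while `is_handled_directive("#include <stdio.h>")` returns 0.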
Hey, not so fast! I answered your question on differences "in this",
nothing else. If you're writing an include-all preprocessor, you'll
have to have some.

Ah yes, I do know about others.
I don't think there are major differences in expansion algorithm between
those standards, but there are differences nevertheless. C99 added a
few things to the preprocessor, the most prominent being the ellipsis.

Oh yes, I've already decided that I'm not going to do the varargs part
(among other things because the GCC syntax for it is a de facto standard
for a large chunk of code and it's not compatible with the C99 syntax).
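
For reference, a small sketch of the C99 form, with the GNU named-argument form shown only in a comment. FORMAT_INTO and format_pair are hypothetical names for this illustration.

```c
#include <assert.h>   /* only needed for the check in the usage note */
#include <stdio.h>
#include <string.h>

/* C99 form: the variable part is written ... in the parameter list
 * and referred to as __VA_ARGS__ in the replacement list.
 * The GNU extension names the list instead, for example
 *     #define FORMAT_INTO(buf, args...) snprintf((buf), sizeof (buf), args)
 * which is not standard C.
 * buf must be an array (not a pointer) so that sizeof gives its size. */
#define FORMAT_INTO(buf, ...) snprintf((buf), sizeof (buf), __VA_ARGS__)

/* Formats "a,b" into out (16 bytes) and returns out.
 * Longer results are truncated by snprintf. */
const char *format_pair(char out[16], int a, int b)
{
    char tmp[16];

    FORMAT_INTO(tmp, "%d,%d", a, b);
    strcpy(out, tmp);
    return out;
}
```

Given `char b[16];`, the call `format_pair(b, 1, 2)` leaves "1,2" in b.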

Incidentally, is it possible to get a copy of the C89 standard now?
I've found several things where I want to be maximally compatible but I
only have the C99 standard.
I don't think there is a big change from C89 to C++, but there are slight
differences in the text from the beginning on; I think this might be related
to new keywords (arithmetic operators) in C++; it might be worth asking
in C++ NG what those differences are.

Comparing them is a pain, grep notices differences in line lengths and
word wrapping.
Last but not least, there are bound to be differences between
implementations of the same standard. The C preprocessor is part of the
compiler and expression evaluation in the #if directive depends on how
certain types are implemented in the language (some people say here "C
preprocessor doesn't know C" - I think this is not the whole story).

Indeed it isn't. The C preprocessor doesn't need to know anything about
the compiler for C code (indeed, C preprocessors are frequently used for
assembler preprocessing), but in most cases it does know something about
it (common implementation of integer operators, for instance).
For example, in C89:
#if 0xffffffff + 1
will depend on the value of LONG_MAX.

Or in C99 on the value of INTMAX_MAX. My program tries to work out the
"largest integer type", but that's one area where I'm knowingly not
conforming because I'm not distinguishing between signed and unsigned
values (I treat them all as signed).
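
A minimal sketch of a directive where the signed/unsigned distinction changes the result, which may be useful as a test case. In a conforming preprocessor 0u - 1 is evaluated in an unsigned type and wraps to the largest value, so the comparison is true; an evaluator that treats everything as signed would compute -1 and take the other branch. UNSIGNED_WRAPS and unsigned_wraps are hypothetical names.

```c
#include <assert.h>   /* only needed for the check in the usage note */

/* 0u - 1 wraps to the maximum unsigned value (unsigned long in C89,
 * uintmax_t in C99), which compares greater than 0. */
#if 0u - 1 > 0
#define UNSIGNED_WRAPS 1
#else
#define UNSIGNED_WRAPS 0
#endif

/* Reports which branch the preprocessor took. */
int unsigned_wraps(void)
{
    return UNSIGNED_WRAPS;
}
```

On a conforming implementation `assert(unsigned_wraps() == 1);` holds.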

Chris C
 
S.Tobias

Chris Croughton said:
On 23 Nov 2004 15:25:51 GMT, S.Tobias wrote:
Incidentally, is it possible to get a copy of the C89 standard now?

I suppose a simple answer would be: no, unless you're rich.
This has been discussed in c.s.c., so check for answers there.

To have access to C89 text I bought Schildt's book on Amazon, and
keep my right eye shut when I read it (but that might not be enough,
http://www.lysator.liu.se/c/schildt.html#6-1-3-1).

For practical purposes I prefer to work with
http://danpop.home.cern.ch/danpop/ansi.c
 
Chris Croughton

I suppose a simple answer would be: no, unless you're rich.
This has been discussed in c.s.c., so check for answers there.

To have access to C89 text I bought Schildt's book on Amazon, and
keep my right eye shut when I read it (but that might not be enough,
http://www.lysator.liu.se/c/schildt.html#6-1-3-1).

Yes, getting the annotations wrong is one thing but getting the
specification itself wrong... One of the pages is also missing (or
rather one is duplicated instead of the page which should have followed
it).

I like Clive's comments (there and in other places)...
For practical purposes I prefer to work with
http://danpop.home.cern.ch/danpop/ansi.c

<innocent>
I downloaded that ansi.c file and tried to compile it, but GCC reported
lots of errors...
</innocent>

Thanks, that's a very useful link. I have somewhere a marked-up draft
from the Working Committee (a cow-orker was a commentator), but several
things changed between then and the ANSI standard being published...

Chris C
 
Chris Croughton

Last but not least, there are bound to be differences between
implementations of the same standard. The C preprocessor is part of the
compiler and expression evaluation in the #if directive depends on how
certain types are implemented in the language (some people say here "C
preprocessor doesn't know C" - I think this is not the whole story).
For example, in C89:
#if 0xffffffff + 1
will depend on the value of LONG_MAX.

There's another one which comes out of your description of making
'illegal' tokens.

#define CAT(a,b) a ## b

char *p = CAT("a", "b");
int i = CAT(1 + 8, / 2);

makes the invalid preprocessing tokens "a""b" and 8/ when expanded.
However, if the preprocessor is run separately from the compiler the
expansion is the perfectly valid code

char *p = "a""b";
int i = 1 + 8/ 2;

(GCC gives warnings about "invalid preprocessor token", but then goes on
to compile the code happily doing what one would expect. But it's UB, a
compiler could fail or produce complete rubbish if it wanted.)
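
For contrast, a short sketch of the well-defined ways to get the same results: paste only where the result is a single valid token, and otherwise just write the tokens next to each other. The function names are made up for the illustration.

```c
#include <assert.h>   /* only needed for the checks in the usage note */
#include <string.h>

#define CAT(a, b) a ## b

/* Valid paste: 12 ## 34 forms the single pp-number 1234. */
int pasted_number(void)
{
    return CAT(12, 34);
}

/* No paste needed: adjacent string literals are joined in
 * translation phase 6, giving "ab". */
const char *joined_string(void)
{
    return "a" "b";
}

/* No paste needed: 8 and / stay separate tokens. */
int plain_expression(void)
{
    return 1 + 8 / 2;
}
```

Here `pasted_number()` returns 1234, `joined_string()` returns "ab" and `plain_expression()` returns 5, all without forming an invalid preprocessing token.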

Gah. What was that saying about a mouse designed by a committee? (It
could be worse. Perl /is/ worse...) <g>

Chris C
 
dandelion

Chris Croughton said:
Gah. What was that saying about a mouse designed by a committee?

They say a camel is a horse designed by committee...
(It could be worse. Perl /is/ worse...) <g>

I'd love to hear Dijkstra's comments on that language.
 
Chris Croughton

They say a camel is a horse designed by committee...

I think it was an elephant which was a mouse designed by committee. Or
by IBM.

I'd love to hear Dijkstra's comments on that language.

Only from a safe distance!

Chris C
 
