Keith Thompson said:
Tim Rentsch said:
Keith Thompson said:
[ considering cases, e.g., #define X "x.h" / #include X ]
I think there just needs to be an additional statement that there's an
implementation-defined mapping of a string literal to a header name
preprocessing token. I would hope that this mapping would be required
to be the same as the mapping for an explicit #include "filename"
directive, so that this:
#include "filename"
and this:
#define HEADER "filename"
#include HEADER
are required to be equivalent.
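For concreteness, here is a minimal sketch of the macro-expanded spelling. It assumes the implementation treats the expanded directive like the explicit one, which is exactly the point under discussion. The header string.h stands in for "filename" so that the fragment compiles anywhere; a quoted search that fails is retried as a <> search under 6.10.2p3. The function name quoted_include_works is hypothetical.

```c
#include <assert.h>

#define HEADER "string.h"   /* replacement list is one string literal pp-token */
#include HEADER             /* after expansion: # include "string.h" new-line */

/* Uses size_t and strlen, both of which come from string.h here, so this
   only compiles if the macro-expanded directive took effect. */
size_t quoted_include_works(void)
{
    size_t n = strlen("foo.h");
    assert(n == 5);         /* f, o, o, ., h */
    return n;
}
```

Replacing the two directive lines with a plain #include "string.h" should give the same translation unit on an implementation that behaves as hoped.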
As I explained in my last response, the Standard doesn't require a
header-name preprocessing token in a #include directive. A string
literal (that doesn't have a " in it) preprocessing token matches
the form described in 6.10.2p3, and no additional language is
needed to define what happens in such cases. Arguably the existing
text should be clarified, but at worst it's just unclear, not
incomplete.
Ok, but I think the existing text makes some implicit assumptions.
I would say this differently: the text doesn't express itself
as clearly as it could (and arguably should). But let's read on.
Let's see what happens in the course of the translation phases for the
following translation unit:
#define HEADER "foo.h"
#include HEADER
What happens in phases 1 and 2 isn't relevant to the current
discussion.
Phase 3 decomposes the translation unit into preprocessing tokens
and whitespace. The resulting preprocessing tokens are:
#
define
HEADER
"foo.h"
#
include
HEADER
where HEADER is an identifier and "foo.h" is a string literal. The
new-line after "foo.h" is retained.
You forgot to say 'define' and 'include' are identifiers. They
are not preprocessing keywords (as there are no such things).
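A small sketch of the point that 'define' and 'include' are ordinary identifiers: outside a directive they can be used as object names like any other identifier. The function name identifiers_not_keywords is made up for this illustration.

```c
#include <assert.h>

/* 'include' and 'define' are plain identifiers, not reserved words,
   so they can name local objects. */
int identifiers_not_keywords(void)
{
    int include = 1;
    int define = 2;
    assert(include != define);   /* sanity check: two distinct objects */
    return include + define;
}
```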
In phase 4, preprocessing directives are executed. The execution of
the #define directive causes the macro HEADER to be defined. The #include
directive is processed in accordance with C99 6.10.2p4:
A preprocessing directive of the form
# include pp-tokens new-line
(that does not match one of the two previous forms) is
permitted. The preprocessing tokens after include in the
directive are processed just as in normal text. (Each identifier
currently defined as a macro name is replaced by its replacement
list of preprocessing tokens.) The directive resulting after
all replacements shall match one of the two previous forms.148)
The method by which a sequence of preprocessing tokens between
a < and a > preprocessing token pair or a pair of " characters
is combined into a single header name preprocessing token is
implementation-defined.
Footnote 148 (numbering is from N1256):
Note that adjacent string literals are not concatenated into a
single string literal (see the translation phases in 5.1.1.2);
thus, an expansion that results in two string literals is an
invalid directive.
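The footnote's point can be sketched by contrasting ordinary text with a directive. The macro names HDR_DIR and HDR_BASE and the function joined_length are hypothetical, and the invalid directive is kept inside #if 0 so that the fragment still compiles.

```c
#include <assert.h>
#include <string.h>

#define HDR_DIR  "sys/"      /* hypothetical macros, for illustration only */
#define HDR_BASE "types.h"

/* Ordinary text: the two literals pass through phase 4 as two pp-tokens
   and are joined in phase 6, giving "sys/types.h". */
static const char joined[] = HDR_DIR HDR_BASE;

#if 0
/* Directive: after expansion the line holds two string literals, which
   matches neither of the two earlier forms, so it is invalid. */
#include HDR_DIR HDR_BASE
#endif

size_t joined_length(void)
{
    assert(sizeof joined == strlen(joined) + 1);  /* one array, one terminator */
    return strlen(joined);
}
```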
HEADER is expanded to a single pp-token, the string literal "foo.h",
so the directive matches the specified form. But for this to be valid,
and equivalent to
#include "foo.h"
the single pp-token "foo.h" (a string literal) must "match"
"q-char-sequence"
It does "match" it in the sense that it's composed of the same
sequence of characters. I just find it slightly odd that, one
phase after the sequence of 7 characters "foo.h" has been converted
to a single pp-token, it apparently has to be converted *back* to
a sequence of 7 characters so it can be matched (or not) against
"q-char-sequence". And of course some string literals don't match
"q-char-sequence" (such as "foo\"bar"), and others that do match
might or might not have the same meaning (such as "dir\foo.h").
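The "dir\foo.h" case can be made concrete with a short sketch. It compares the value of the string literal with the spelling of the same pp-token, which the # operator preserves. SPELLING and the two function names are hypothetical helpers, and nothing here says how an implementation maps the spelling to a file.

```c
#include <assert.h>
#include <string.h>

#define SPELLING(x) #x   /* hypothetical helper: stringify a pp-token's spelling */

/* As a string literal, \f is an escape (form feed): 8 characters. */
static const char as_literal[] = "dir\foo.h";

/* The pp-token's spelling keeps the backslash and both quotes: 11 characters. */
static const char as_spelling[] = SPELLING("dir\foo.h");

size_t literal_length(void)
{
    return strlen(as_literal);
}

size_t spelling_length(void)
{
    assert(as_spelling[0] == '"' && as_spelling[4] == '\\');
    return strlen(as_spelling);
}
```

The 11-character spelling is what would be compared against "q-char-sequence"; the 8-character value only exists once the token is treated as a string literal in a later phase.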
Exactly the same kind of argument could be made about 'include',
since it's an identifier (or even just a preprocessing-token). I
think the problem with this argument is that it makes presumptions
that aren't consistent with how the Standard explains what goes
on during preprocessing. The discussion of what happens in the
preprocessing phase generally is much less particular about
specific syntactic categories. In most cases (there are a few
exceptions, but only a few), any syntactic category more specific
than preprocessing-token is never identified. So, when I see text
like
A preprocessing directive of the form
# include "q-char-sequence" new-line
I take this to mean: three preprocessing-tokens, followed by
new-line; the first preprocessing-token (of whatever category)
must be the character '#'; the second preprocessing-token (of
whatever category) must be the string 'include'; the third
preprocessing-token (of whatever category) must match the lexical
form '"q-char-sequence"'. Now it happens that a punctuator is
the only form of preprocessing-token that matches the first
string, and an identifier is the only form of preprocessing-token
that matches the second string, but the Standard never actually
says that these categories are the only acceptable ones. The
syntax given for #include (under control-line)
is
# include pp-tokens new-line
despite 'include' not being identified as some sort of keyword or
syntactic category. We understand what that means by analogy with
keywords in later processing stages, but the Standard never says
(at least not as far as I know) in just what category this
preprocessing-token must be (nor does it in most other cases, as I
mentioned above). This non-specificity explains why the text in
6.10.2p2 and 6.10.2p3 may be written the way it is -- these
paragraphs are not expressing syntax, but rather are expressing
conditions for matching whatever preprocessing tokens are there in
the respective cases.
Yes, this is nitpicking, and the intent (at least in most cases)
is crystal clear: the string literal that results from the macro
expansion is reinterpreted as "q-char-sequence". It's just that
the extra step of going back from a string literal to a sequence
of characters is implicitly assumed rather than stated.
I don't think it has to be 'reinterpreted', to use your word,
because the given form is not a syntax rule. Rather the
'"q-char-sequence"' item is expressing a _condition_ that
characters in the preprocessing-token (whatever category it might
be) must satisfy -- just the same as using 'include' to give a
condition that characters in the second preprocessing-token must
satisfy.
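One way to picture the "condition on the characters" reading is a check applied to a pp-token's spelling, whatever its category. This is only an illustrative sketch of that reading, not a description of how any preprocessor is implemented; the function name matches_quoted_form is hypothetical. It encodes the grammar's requirement that a q-char-sequence is one or more characters other than new-line and ".

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Returns true if the spelling has the lexical form "q-char-sequence":
   a leading and trailing ", with at least one character in between,
   none of which is " or new-line. */
bool matches_quoted_form(const char *spelling)
{
    assert(spelling != NULL);
    size_t len = strlen(spelling);
    if (len < 3 || spelling[0] != '"' || spelling[len - 1] != '"')
        return false;
    for (size_t i = 1; i + 1 < len; i++) {
        if (spelling[i] == '"' || spelling[i] == '\n')
            return false;
    }
    return true;
}
```

Under this check the spelling of "foo.h" satisfies the condition, while the spelling of "foo\"bar" does not, because of the " between the delimiters.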