Preprocessor issue - token spacing

Mamluk Caliph

For quite some time I've used code like the following to be able to
"cut&paste" parts from different header files of the same name. It has
worked with GCC, MS, Borland, and Keil, to name a few.

In principle the mechanism looks like this:

#define CHAINPATH /usr/include
#define DEFSTR( x ) \
#x

#define FNAME( path, file ) \
DEFSTR( path/file )

#define BUILDCHAIN( file ) \
FNAME( CHAINPATH, file )


#include BUILDCHAIN( stdio.h )

int main(int argc, char **argv){
printf("Hello world\n");
}

For some reason, when I try to compile this with CodeWarrior, an extra
space is inserted between the path and the filename, resulting in a
"file not found" error.

I'm not sure this is a bug, and I would be grateful to know if there is
any right or wrong concerning token spacing. When it comes to the
preprocessor I'm usually confused, so it might be me missing something
really obvious.

Also if somebody has another suggestion for handling multiple headers
of the same name that would be welcome too.

/Michael
 
Harald van Dijk

The rules for #include are very lenient, and the details of how tokens
other than "..." and <...> map to those two forms are left mostly to the
implementation.

However, stringizing itself is well specified, and it's possible to test
whether that's the problem you're having by writing a program to test
only that:

#define CHAINPATH /usr/include
#define DEFSTR( x ) \
#x

#define FNAME( path, file ) \
DEFSTR( path/file )

#define BUILDCHAIN( file ) \
FNAME( CHAINPATH, file )

#include <stdio.h>

int main(int argc, char **argv){
puts(BUILDCHAIN( stdio.h ));
}

This is required to print "/usr/include/stdio.h", and if CodeWarrior
inserts a space in this case as well, I believe it has a bug. Whitespace
surrounding macro arguments or macro definitions is supposed to be
ignored, so the only relevant spacing would be between path and /, or
between / and file. There isn't any spacing in either case, so there
should not be any spacing in the resulting string.

Also if somebody has another suggestion for handling multiple headers of
the same name that would be welcome too.

Generally speaking, it would be good to simply completely avoid this. If
you absolutely need this, could you explain why? There are some possible
better ways of doing this, but they won't work in all cases, so depending
on why you need this, they may or may not apply.
 
Keith Thompson

Macro definitions are defined in terms of sequences of (preprocessing)
tokens. Your first definition:

#define CHAINPATH /usr/include

defines CHAINPATH as a sequence of 4 distinct tokens:

/ usr / include

Normally I'd suggest something like this:

#define CHAINPATH "/usr/include"

and using string literal concatenation to build the final
"/usr/include/stdio.h" string literal, but I don't think concatenation
applies to header names (which look like string literals, but really
aren't). A quick experiment with gcc shows that it accepts this:

#include "/usr/include/stdio.h"

but not this:

#include "/usr/include/" "stdio.h"

You might have to resort to a custom-built preprocessor that you
invoke to generate your source files during your build procedure.
 
Thad Smith

Keith said:
Normally I'd suggest something like this:

#define CHAINPATH "/usr/include"

and using string literal concatenation to build the final
"/usr/include/stdio.h" string literal, but I don't think concatenation
applies to header names (which look like string literals, but really
aren't)

That's correct. String literal concatenation occurs in translation
phase 6, which is too late: #include directives are processed in
phase 4. There is a footnote to that effect in section 6.10.2.
 
Mamluk Caliph

Harald van Dijk said:
[...]
This is required to print "/usr/include/stdio.h", and if CodeWarrior
inserts a space in this case as well, I believe it has a bug. Whitespace
surrounding macro arguments or macro definitions is supposed to be
ignored, so the only relevant spacing would be between path and /, or
between / and file. There isn't any spacing in either case, so there
should not be any spacing in the resulting string.

You're right! It's not stringizing that's the problem and the extra
space is not there in the modified program.

Generally speaking, it would be good to simply completely avoid this. If
you absolutely need this, could you explain why? There are some possible
better ways of doing this, but they won't work in all cases, so depending
on why you need this, they may or may not apply.

Yes, and I think I've experienced the reason "why" many times over.
It's not something that I enjoy doing and I consider it breaking
almost every rule I know regarding good programming practice.

The reason I do this, however, is that I work with embedded
applications for small to puny targets, and in the embedded world
toolchains often come with crippled or incomplete standard libraries.
Even so, they provide quite a lot of value, and rewriting them just to
get it right is almost always too much work. So what I do is add
what's missing (usually functions) and "merge" the original header
files with the additional declarations.

Sometimes I also need to redefine a function or macro because it's
either wrong (the need for reentrancy sometimes forces me to
reimplement certain functions with my own versions) or it doesn't fit
the target for some other reason. The latter is *really* messy and
neither habit is something I would recommend. I've just not figured
out another way so far.
 
Eric Sosman

Mamluk said:
[...]
Also if somebody has another suggestion for handling multiple headers of
the same name that would be welcome too.
Generally speaking, it would be good to simply completely avoid this. If
you absolutely need this, could you explain why? There are some possible
better ways of doing this, but they won't work in all cases, so depending
on why you need this, they may or may not apply.

Yes, and I think I've experienced the reason "why" many times over.
It's not something that I enjoy doing and I consider it breaking
almost every rule I know regarding good programming practice.

The reason I do this, however, is that I work with embedded
applications for small to puny targets, and in the embedded world
toolchains often come with crippled or incomplete standard libraries.
Even so, they provide quite a lot of value, and rewriting them just to
get it right is almost always too much work. So what I do is add
what's missing (usually functions) and "merge" the original header
files with the additional declarations.

Unless I've missed something, the obvious approach is to
give your modified headers distinct names: "mcstdio.h" or
something of the kind, and #include them via those names.
The content of one of these might look something like

#ifndef MCSTDIO_H
#define MCSTDIO_H

#ifdef TINYCHIP
/* Vendor's <stdio.h> is almost perfect, but
* I need to do something sneaky with stderr
*/
#include <stdio.h>
extern FILE * mc_get_stderr_substitute(void);
#undef stderr
#define stderr mc_get_stderr_substitute()
#endif

#ifdef MEGACHIP
/* For once, a vendor's <stdio.h> is fine */
#include <stdio.h>
#endif

#ifdef CHIPOFFTHEOLDBLOCK
/* Vendor's <stdio.h> is completely hopeless;
* implement my own substitute here
*/
#endif

#endif /* MCSTDIO_H */

In other words, handle the variations between platforms
within the headers themselves, instead of by #include'ing
variants of the headers. If the prospect of merging many
modified headers into a single file is daunting, use one
more level of indirection:

#ifndef MCSTDIO_H
#define MCSTDIO_H

#ifdef TINYCHIP
#include "/tools/tinychip/tiny_stdio.h"
#endif

#ifdef MEGACHIP
#include <stdio.h>
#endif

#ifdef CHIPOFFTHEOLDBLOCK
#include "/tools/tinychip/oldblock_stdio.h"
#endif

#endif /* MCSTDIO_H */
 
Mamluk Caliph

Eric Sosman said:
[...]

This is actually very close to how different targets and vendors are
handled today, except for the names of the header files themselves.

For smaller (as in fewer lines of code) applications this solution
would work well. In cases where one integrates other people's code, or
complete external projects for that matter, it would force a
modification in each source file (small, but a modification
nevertheless). Depending on how many files are concerned and how they
are managed, this could be more or less difficult.

I would not mind having the header files named distinctly, but
forcing other team members to use them would most likely be cumbersome
and error-prone.

So what I do is provide an alternative set of header files in a
separate directory structure; then only the build system (one common
point of management) needs to be modified so that the modified headers
are found before any vendor-provided ones. Actually, quite few files
are needed, so I could do without the header filename mechanism as well
and hardcode the include paths in the sources.

However, it would be even better if the mechanism would work since it
makes it possible to handle different buildhost installations more
easily. The CHAINPATH variable is btw not hardcoded in reality, but
provided externally by the build system.

In essence my issue can be summarized in the following: "How to
include something based on a macro"

For example, this works in most compilers I've tried:

#define ANAME "stdio.h"
#include ANAME

Why I can't combine paths with filenames is beyond my understanding
though...

/Michael
 
