ruminations on #include and paths

David Mathog

One thing that can make porting C code from one
platform to another miserable is #include. In particular,
the need to either place the path to an included file within
the #include statement or to very carefully define the
order in which paths are searched with command line options
on the compiler. Both can cause problems, especially when
dealing with complex software distributions.

It occurs to me that by extending the C include syntax
slightly it would be possible to clarify matters somewhat.
For instance, consider replacing this:

#include "foo/bar.h"

and this

#include <bar.h> /* same file, but path from compiler */

with this:

#include <bar.h> <foo>

Where the code now says explicitly that the include bar.h is from
a package known as foo. Note, no paths in the include - the
include file is specified by WHAT it is and not by WHERE it is.
To support this the language needs only a minor extension - a
way to map a text file containing the list of packages to
their locations so that the compiler can pull in the right file.
Such a map might look something like:

/* Enable the use of alternative include paths for packages,
for instance on development systems. This would not be
common. */
#if USEALT
#define TOP "/usr/altinclude" /* absolute explicit path */
#else
#define TOP "/usr/include" /* absolute explicit path */
#endif

/*This would be the most common way to associate
a path with a package */
#package foo "/usr/include/foo" /* absolute explicit path */

/* or alternatively, a relative explicit path definition, note
substitution within angle brackets, so some other syntax might
be better:

#package foo <TOP/foo>
*/

/* no path means that the path to the package must have been previously
defined or it is an error. */
#package X11


I see several advantages for this extension.

1. If the package doesn't exist the compiler would emit
an error message like:

C compiler error: compiler doesn't know #include package: foo

which tells you immediately that the system either doesn't have foo or
that the map file does not include it. Ideally as packages are
added to a system the default map would be automatically updated, so
that the meaning of this message would be 99.9% of the time that
the package wasn't present on the system. As opposed to:

C compiler error: could not include bar.h

which generally means you have to start spelunking in the code and
looking carefully at the command line to see exactly what's wrong.
Note that if the package exists but the include doesn't, as might happen
for a botched install or a version mismatch, the error message might
be:

C compiler error: found package but not include file

2. A complex package would contain its own map file which
could look something like:

#package X11
#package GL
#define THISPACKAGE "."
#package foo <THISPACKAGE/foo>
#package woo <THISPACKAGE/woo>

where the last three lines define paths to package locations within
the software distribution, and the first two define packages that
are required, but default to the default system map or alternatively
specified compile line map. This would be good because
at least ideally the very first compile could detect the absence of
required packages and stop at that point. Currently on a large
software project one might see hundreds of compiled modules and then
it stops when it can't find some include file. Moreover, one could
just look at this map file to determine if all required packages
are present on the system, and it would be trivial to write a tool
to carry out such a test automatically as part of a "make", for
instance.

3. Allows the unambiguous specification WITHIN THE CODE of the
intended file to be included, even if several packages have include
files with exactly the same name. This specification would survive
any reordering of the directory structure of the code modules.
(Obviously rearranging the include files in their directories would
break it!)

4. The extended syntax is backwards compatible with
existing #include. If the package part was omitted the compiler
would employ the existing #include methods.
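
A minimal sketch of how a compiler front end might tell the two failure cases above apart. Everything here is hypothetical: the table stands in for a parsed map file, and the names resolve_include, package_map and the RESOLVE_* codes are made up for illustration, not part of any real compiler.

```c
#include <stdio.h>
#include <string.h>

/* Outcome of resolving "#include <header> <package>". */
enum resolve_status { RESOLVE_OK, RESOLVE_NO_PACKAGE, RESOLVE_NO_FILE };

struct package_entry {
    const char *name; /* package name as written in the source */
    const char *dir;  /* directory the map associates with it */
};

/* Hypothetical map contents; a real one would be read from the map file. */
static const struct package_entry package_map[] = {
    { "foo", "/usr/include/foo" },
    { "X11", "/usr/include/X11" },
};

static enum resolve_status resolve_include(const char *package,
                                           const char *header,
                                           char *path, size_t pathlen)
{
    const char *dir = NULL;
    size_t i;
    int n;
    FILE *fp;

    for (i = 0; i < sizeof package_map / sizeof package_map[0]; i++) {
        if (strcmp(package_map[i].name, package) == 0) {
            dir = package_map[i].dir;
            break;
        }
    }
    if (dir == NULL)
        return RESOLVE_NO_PACKAGE; /* "compiler doesn't know #include package" */

    n = snprintf(path, pathlen, "%s/%s", dir, header);
    if (n < 0 || (size_t)n >= pathlen)
        return RESOLVE_NO_FILE;    /* treat an unusable path as not found */

    fp = fopen(path, "r");
    if (fp == NULL)
        return RESOLVE_NO_FILE;    /* "found package but not include file" */
    fclose(fp);
    return RESOLVE_OK;
}
```

A caller would pass the package and header names taken from the directive and print the matching message for each status.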

Just a thought,

David Mathog
(e-mail address removed)
 
Gordon Burditt

David Mathog said:
Where the code now says explicitly that the include bar.h is from
a package known as foo. Note, no paths in the include - the
include file is specified by WHAT it is and not by WHERE it is.
To support this the language needs only a minor extension - a
way to map a text file containing the list of packages to
their locations so that the compiler can pull in the right file.

This sounds like an absolute nightmare to maintain, for a lot
of reasons. It might work reasonably well if nobody using the
compiler is allowed to write their own code, and everyone used
package installers, but that sort of defeats the purpose of a
compiler, doesn't it?

1. You need a global package naming registry. Yes, I really mean
*GLOBAL*, if not intergalactic. And how many compilers support
use of Chinese character sets in package names?
2. Developers of code that is logically divided up need to get a
package name from the package naming registry BEFORE THEY CAN
CODE ANYTHING THAT USES IT.
3. You have a file that NEEDS to be edited by all those developing
code, but one screwup can cause everyone's code to quit working,
which begs for it to be writable by the administrator only.
There's also the issue of how you keep it under source code control
when pieces of the file each belong to different packages under
development.
4. There is no obvious way to merge these files from various packages,
or to delete entries if you delete a package. Oh, yes, there's
also the issue of package versions if the include file structure
of the package radically changes (and there are examples of where
it has in real life, and often you need *both* versions installed
as not everything that uses the package has modified code to use the
new version).
5. This setup if used for self-reference (e.g. the foo package references
its own include files with #include <bar.h> <foo>) makes absolutely
sure you can't have a production and a test version of the same package
that can co-exist.
Such a map might look something like:

/* Enable the use of alternative include paths for packages,
for instance on development systems. This would not be
common. */
#if USEALT

Where, pray tell, is USEALT supposed to be defined or not
defined? If it's supposed to be in the code being compiled
or its command line, that's worse than hardcoding "/usr/include"
or "/usr/altinclude"; you also have to know the name of the
#if to be used. This is a lot worse than just using compiler
defaults and no better than trying to specify -I/usr/include,
an option which lots of compilers accept.
#define TOP "/usr/altinclude" /* absolute explicit path */
#else
#define TOP "/usr/include" /* absolute explicit path */
#endif

/*This would be the most common way to associate
a path with a package */
#package foo "/usr/include/foo" /* absolute explicit path */

/* or alternatively, a relative explicit path definition, note
substitution within angle brackets, so some other syntax might
be better:

#package foo <TOP/foo>

This works great until another package adds stuff that #define's foo
or two packages try to define TOP differently.
*/

/* no path means that the path to the package must have been previously
defined or it is an error. */
#package X11

Oh, great! Now I not only have to merge the stuff from a bunch of
packages, but I *HAVE TO MERGE THEM IN THE RIGHT ORDER!* And there
might not even *BE* a right order, due to possible circular dependencies.
I see several advantages for this extension.

1. If the package doesn't exist the compiler would emit
an error message like:

C compiler error: compiler doesn't know #include package: foo

which tells you immediately that the system either doesn't have foo or
that the map file does not include it. Ideally as packages are

You could probably get almost as good results from:

#include <mysql.h> /* From the MySQL client library */
#include <libpng/png.h> /* From the PNG library, version 3 or better */

When you get an error, look at the source file and line indicated
in the error message (which most compilers get accurately, at least
for not-found include file errors).
added to a system the default map would be automatically updated, so

And how does the software that does that know where the default map *IS*?
(Remember that on some systems all paths start with C:\ and on others,
none of them do).

Remember, you're dealing with a compiler, so you have to deal with
non-admin programmers writing their own packages.
that the meaning of this message would be 99.9% of the time that
the package wasn't present on the system. As opposed to:

C compiler error: could not include bar.h

which generally means you have to start spelunking in the code and
looking carefully at the command line to see exactly what's wrong.
Note that if the package exists but the include doesn't, as might happen
for a botched install or a version mismatch, the error message might
be:

C compiler error: found package but not include file

2. A complex package would contain its own map file which
could look something like:

#package X11
#package GL
#define THISPACKAGE "."
#package foo <THISPACKAGE/foo>
#package woo <THISPACKAGE/woo>

Oh, yes, where exactly is "." when this set of specifications gets
merged into the default system map when the package is installed?
where the last three lines define paths to package locations within
the software distribution, and the first two define packages that
are required, but default to the default system map or alternatively
specified compile line map. This would be good because
at least ideally the very first compile could detect the absence of
required packages and stop at that point. Currently on a large
software project one might see hundreds of compiled modules and then
it stops when it can't find some include file. Moreover, one could
just look at this map file to determine if all required packages
are present on the system, and it would be trivial to write a tool
to carry out such a test automatically as part of a "make", for
instance.

Ok, how do I write a package specification that says I want
GD version 1 *OR* GD version 2? They have different package names
because they are so different, but my code knows how to handle
either.
3. Allows the unambiguous specification WITHIN THE CODE of the

It's NOT unambiguous without a global package registry, and even
then, it precludes the possibility of having multiple versions of
the same package available and being able to select one, as can
be done now with things like "-I/usr/local/mysql5.1.82.3.5.6.7.2z/include"
included in a Makefile.
intended file to be included, even if several packages have include
files with exactly the same name. This specification would survive
any reordering of the directory structure of the code modules.
(Obviously rearranging the include files in their directories would
break it!)

4. The extended syntax is backwards compatible with
existing #include. If the package part was omitted the compiler
would employ the existing #include methods.

Gordon L. Burditt
 
David Mathog

Gordon said:
This sounds like an absolute nightmare to maintain, for a lot
of reasons. It might work reasonably well if nobody using the
compiler is allowed to write their own code, and everyone used
package installers, but that sort of defeats the purpose of a
compiler, doesn't it?

1. You need a global package naming registry. Yes, I really mean
*GLOBAL*, if not intergalactic. And how many compilers support
use of Chinese character sets in package names?

This differs how from the current situation? Where you either
have GL,X11 or the like in the path for the #include or on the
command line.
2. Developers of code that is logically divided up need to get a
package name from the package naming registry BEFORE THEY CAN
CODE ANYTHING THAT USES IT.

No more so than now. If you write code that uses a GL path
to your includes, and it isn't OpenGL it will break with either model.
If avoiding conflicts was an issue one could always use a name
like: foo_xxxxxxxx, where xxxxxxxx is a bunch of random digits.
Then in the project map file use:

#define foo foo_xxxxxxxx

and you'll get the desired includes for the short
descriptive string "foo".

Honestly I don't see how #package leads to more conflicts than
does #include, basically they are handling the same data, just
in slightly different ways.
3. You have a file that NEEDS to be edited by all those developing
code, but one screwup can cause everyone's code to quit working,
which begs for it to be writable by the administrator only.
There's also the issue of how you keep it under source code control
when pieces of the file each belong to different packages under
development.

That's a good point. I suppose one could instead define a directory
where a bunch of these files live independent of each other and
read them all in as ordered by the #package statements.
(Ie, more or less like the existing include directories.) Just be
careful about overwriting existing files. In that
variant

#package X11

is pretty much equivalent to

#ifndef _loaded_X11
#include X11
#define _loaded_X11
#endif

except that the compiler handles the guard variables as part
of the language and the programmer doesn't need to do it explicitly
in the preprocessor as in this example.
4. There is no obvious way to merge these files from various packages,
or to delete entries if you delete a package.

That's compiler and platform dependent, just like the existing
command line library declarations and search path declarations.

5. This setup if used for self-reference (e.g. the foo package references
its own include files with #include <bar.h> <foo>) makes absolutely
sure you can't have a production and a test version of the same package
that can co-exist.

That was the point of ...
Where, pray tell, is USEALT supposed to be defined or not
defined?

On the command line (or equivalent), so that the developer can
tell the compiler which set of includes to use.
Oh, great! Now I not only have to merge the stuff from a bunch of
packages, but I *HAVE TO MERGE THEM IN THE RIGHT ORDER!* And there
might not even *BE* a right order, due to possible circular dependencies.

How does this differ from the current #include method? Try rearranging
the #include statements in a lot of code and you'll run into exactly
the same sorts of order dependencies. Use the reinterpretation of
#package to be the "include and set up guard variables implicitly"
as in the example above. What's the problem with that?

Your point about circular dependencies is valid - as originally
proposed it would be possible to define maps that could not
be loaded. Ie, A requires B be loaded, B requires A be loaded,
so neither can be loaded. Worse, I have actually seen cases
where packages are mutually dependent and cross include files
from each other. So there would need to be a mechanism to
handle this. The simplest one would be for such poorly written
packages to not use a package requirement statement for the other
package, that would suspend the check.
Ideally people wouldn't write this sort of code though.

And how does the software that does that know where the default map *IS*?

That's up to the compiler. Just like the compiler has to know
where the default includes are now.

Oh, yes, where exactly is "." when this set of specifications gets
merged into the default system map when the package is installed?

The same problem currently obtains with include statements. Package
is no better and no worse in that regard.
Ok, how do I write a package specification that says I want
GD version 1 *OR* GD version 2? They have different package names
because they are so different, but my code knows how to handle
either.

That is also a good point. You've doubtless seen some
pretty hairy tests in configure files trying to handle this
exact problem. Moreover, I've seen plenty of those hairy tests
fail because they couldn't figure that 2.2b was bigger than 2.1.
There's more to it, if both are present you might
prefer the most recent one. However when multiple versions
are present they invariably live in different places, so

#package foo path

would allow you to select the desired one. Writing this package
line would likely have to be done by a "configure" equivalent.
It might be good though to have something like

cmp_version(foo,version) -> -1, 0, or 1

so that the compiler could determine
with preprocessor statements when an unexpected version has been
encountered. That suggests:

#package foo path version

to define an explicit version number to go with the path.
It's NOT unambiguous without a global package registry, and even
then, it precludes the possibility of having multiple versions of
the same package available and being able to select one, as can
be done now with things like "-I/usr/local/mysql5.1.82.3.5.6.7.2z/include"
included in a Makefile.

#package mysql "/usr/local/mysql5.1.82.3.5.6.7.2z/include"

or

/* from default package file */

#package mysql 5.1.82

/* the software under development does */

#package mysql /* compiler uses default version */
#if cmp_version(mysql,5.1.82) != 0
whatever
#endif

Probably version would have to be restricted to X.Y.Z format, or
even a plain integer. In my experience too much flexibility with
version numbers causes a lot of grief.
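
A rough sketch of the comparison itself, assuming versions are restricted to at most three numeric fields as suggested above. This cmp_version takes two version strings rather than a package name, which is a simplification for illustration; missing fields count as 0 and any trailing text (the "b" in "2.2b") is ignored.

```c
#include <stdio.h>

/* Compare two "X.Y.Z" version strings numerically, field by field.
   Returns -1, 0 or 1. Fields that are absent are treated as 0, and
   anything after the last parsed number is ignored. */
static int cmp_version(const char *a, const char *b)
{
    int va[3] = { 0, 0, 0 };
    int vb[3] = { 0, 0, 0 };
    int i;

    /* sscanf stops at the first field it cannot parse; earlier fields keep
       their parsed values and later ones stay 0. */
    sscanf(a, "%d.%d.%d", &va[0], &va[1], &va[2]);
    sscanf(b, "%d.%d.%d", &vb[0], &vb[1], &vb[2]);

    for (i = 0; i < 3; i++) {
        if (va[i] < vb[i])
            return -1;
        if (va[i] > vb[i])
            return 1;
    }
    return 0;
}
```

Comparing fields as integers is what keeps 2.10 above 2.9, which a plain string comparison gets wrong.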

I'm not stating that this particular syntax is the best, it's just
an idea for discussion.


Regards,

David Mathog
(e-mail address removed)
 
Kenny McCormack

1. You need a global package naming registry. Yes, I really mean
*GLOBAL*, if not intergalactic. And how many compilers support
use of Chinese character sets in package names?

This differs how from the current situation? Where you either
have GL,X11 or the like in the path for the #include or on the
command line.

The problem is that there is a distinguishable difference between decrying
the current situation and suggesting a way to fix it.

Nobody denies that the current situation is chaos. But nobody will thank
you for replacing the current chaos with a new chaos, even if the new chaos
is significantly less chaotic than the existing chaos. It has to be more
than significantly better - it has to be enormously better.

P.S. Actually, I'm surprised that no one has yet uttered the standard clc
answer to this sort of thing - which is, "Gee, it's not in the standard so
we can't talk about it. Try over in comp.std.c where they discuss the
standard; here we just slavishly obey it."
 
Gordon Burditt

David Mathog said:
This differs how from the current situation? Where you either
have GL,X11 or the like in the path for the #include or on the
command line.

At least in the include directory setup, you can have private namespace
for a developer (e.g. under his home directory) or a package (under
its source directory tree, wherever it is installed this week).

No more so than now. If you write code that uses a GL path
to your includes, and it isn't OpenGL it will break with either model.

No, I'm talking about an author who writes a GL elephant trap extension,
and then writes another demo package that *USES* the GL elephant trap
extension. He has to get the package registered with the compiler
under your system, then put that name in almost every source file.
He can use -I/home/author/src/GL-elephanttrap/includes in a Makefile
the current way.
If avoiding conflicts was an issue one could always use a name
like: foo_xxxxxxxx, where xxxxxxxx is a bunch of random digits.

But then when he goes to release the code, he has to fix that in
almost EVERY source file if he didn't manage to get that particular
batch of random digits because someone else took it first.
Then in the project map file use:

#define foo foo_xxxxxxxx

and you'll get the desired includes for the short
descriptive string "foo".

And, as you describe later, it messes up any path names with "foo"
as one of the components in the path of completely unrelated packages.
It also messes up if there IS an official package "foo" that's used.

Are you also claiming that if I use
#include <X11.h> <package1>
I now *MUST* put in the project map file:
#package package1
?
Honestly I don't see how #package leads to more conflicts than
does #include, basically they are handling the same data, just
in slightly different ways.

Because it's in a SYSTEM-WIDE file, not individual Makefiles or
build scripts for individual packages.
That's a good point. I suppose one could instead define a directory
where a bunch of these files live independent of each other and
read them all in as ordered by the #package statements.

As ordered by the #package statements *WHERE*? In one big file?

Setting up a directory such that (almost) anyone can install stuff
there, but you can't clobber anyone else's stuff and they can't
clobber yours is difficult. Maintaining source code control over
that, kept with the individual packages being developed, is even
harder.

(Ie, more or less like the existing include directories.) Just be
careful about overwriting existing files. In that
variant

#package X11

is pretty much equivalent to

#ifndef _loaded_X11
#include X11
#define _loaded_X11
#endif

except that the compiler handles the guard variables as part
of the language and the programmer doesn't need to do it explicitly
in the preprocessor as in this example.


That's compiler and platform dependent, just like the existing
command line library declarations and search path declarations.

It had better not be. If you need packages A, B, and C accessible
by default, you need to be able to combine the #package stuff from
each package into the compiler default map file, and when you decide
to remove package B, you need to take the stuff that B contributed
out of the compiler default map file. That means you need to be
able to figure out what lines package B contributed to the map file
in the first place. You also have to deal with upgrading package
B to a newer version, which doesn't use GL any more but uses something
else instead, so the reference for GL needs to go away.

The package installer/de-installer will have to know stuff like
where the compiler default map file is so it can modify it, but it
isn't psychic.
That was the point of ...


On the command line (or equivalent), so that the developer can
tell the compiler which set of includes to use.

But that tells the compiler to use the alternative include paths
for EVERY package that this package uses, not just MySQL, if I
happen to want to test out a newer version of MySQL, everything
else (like X11) staying the same. And it's quite possible that
most of the packages don't HAVE alternative include paths, because
there's no need. Now I have to create an include tree for every
possible *combination* of versions of different packages. And keep
it in sync if someone upgrades one of them, since this is a private
copy. Ick.
How does this differ from the current #include method? Try rearranging
the #include statements in a lot of code and you'll run into exactly
the same sorts of order dependencies.

No, often you don't, when sets of includes are from independent
packages. I may need MySQL includes and X11 includes, and it doesn't
matter which order I put the includes from one set relative to the
other set. If it *DOES* matter, chances are I've got a situation
where everything that needs one *MUST* need the other also, and in
that case the MySQL includes ought to include the required X11
includes. The ANSI C method of include guards will work here. Even
if it's a two-way dependency, it may not actually be circular. If
it's a two-way dependency, perhaps they belong in the same package.

Package requirements need not have a problem with being circular:
if A needs B and B needs A, then if you need A, include both A and B,
but neither one of them has to demand that the other come *FIRST*.
You simply demand that at the end, all the requirements are there.
Multi-pass linkers deal with this nicely. Using the C preprocessor,
which uses a single pass, won't.
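
A small sketch of that "check at the end" idea, under the assumption that each package simply lists the names it requires. The struct layout and the names first_missing and MAX_NEEDS are invented for the example; the point is only that the check ignores order, so a cycle is not an error.

```c
#include <stddef.h>
#include <string.h>

#define MAX_NEEDS 4

struct package {
    const char *name;
    const char *needs[MAX_NEEDS]; /* required package names, NULL-padded */
};

/* Return 1 if a package called name is in the set, 0 otherwise. */
static int is_present(const struct package *pkgs, size_t n, const char *name)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (strcmp(pkgs[i].name, name) == 0)
            return 1;
    return 0;
}

/* After every package has been read, return the first requirement that no
   package in the set satisfies, or NULL if all are met. Nothing here
   depends on the order in which the packages were listed. */
static const char *first_missing(const struct package *pkgs, size_t n)
{
    size_t i, j;

    for (i = 0; i < n; i++)
        for (j = 0; j < MAX_NEEDS && pkgs[i].needs[j] != NULL; j++)
            if (!is_present(pkgs, n, pkgs[i].needs[j]))
                return pkgs[i].needs[j];
    return NULL;
}
```

With A needing B and B needing A, first_missing returns NULL; it only complains when something required is absent from the whole set.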
Use the reinterpretation of
#package to be the "include and set up guard variables implicitly"
as in the example above. What's the problem with that?
Your point about circular dependencies is valid - as originally
proposed it would be possible to define maps that could not
be loaded. Ie, A requires B be loaded, B requires A be loaded,
so neither can be loaded. Worse, I have actually seen cases
where packages are mutually dependent and cross include files
from each other.

So there would need to be a mechanism to
handle this. The simplest one would be for such poorly written
packages to not use a package requirement statement for the other
package, that would suspend the check.
Ideally people wouldn't write this sort of code though.

The same problem currently obtains with include statements. Package
is no better and no worse in that regard.

When you merge all the map stuff into the default map, it changes
location, so "." is the default map file directory and no longer
bears any relationship to where the include files got installed.
The installer can't refer to "." as the location of something when
it registers an installation, it needs a full path name. This
is not an issue with current include files.
That is also a good point. You've doubtless seen some
pretty hairy tests in configure files trying to handle this
exact problem. Moreover, I've seen plenty of those hairy tests
fail because they couldn't figure that 2.2b was bigger than 2.1.

The issue isn't simply which version is later than another. Sometimes
there's a *LARGE* compatibility break. MacOS 9 vs. MacOS X is one
example. GD changing the type of file it output (.gif vs. .png, I
think) due to patent issues is also a very large change.
There's more to it, if both are present you might
prefer the most recent one.

In the case of heavily incompatible versions, I prefer the one
the software is written to work with, especially where most of
the include files changed name and the interface is very different,
unless it can actually work with both.

Or sometimes the issue is more fundamental: the *LICENSES* are
different and the newer version can't be legally linked.
However when multiple versions
are present they invariably live in different places, so

#package foo path

would allow you to select the desired one.

Yes, but now we're back to custom hand editing of files in
the distribution I'm trying to compile. I suppose that's one
more thing a Configure script could do.
#package mysql "/usr/local/mysql5.1.82.3.5.6.7.2z/include"

If I put that in the default map file, EVERYTHING newly-compiled
gets linked with the version I'm testing, before I've finished
testing it, much less decided to put it in production.
or

/* from default package file */

#package mysql 5.1.82

/* the software under development does */

#package mysql /* compiler uses default version */
#if cmp_version(mysql,5.1.82) != 0
whatever
#endif

Probably version would have to be restricted to X.Y.Z format, or
even a plain integer. In my experience too much flexibility with
version numbers causes a lot of grief.

I'm not sure what you are getting at here, but it seems you would
have conflicts with multiple use of the same package name.
I'm not stating that this particular syntax is the best, it's just
an idea for discussion.

You have a *LOT* of potential here for naming conflicts. Perhaps
the easiest to solve is the package naming (and that's far from
trivial). If you are allowing preprocessor substitution into path
names, then you've got a situation where a define of a common
pathname component in one package like:

#define usr user

can seriously mess up entirely unrelated packages after it.
And there really isn't a safe set of variables to use.
Take an idea from the shell here: specifically mark substitution.

#package mysql /usr/local/include/mysql
is not substituted, but
#package mysql /${usr}/local/include/mysql
is. Now you have to deal with multiple people #define'ing TOP.
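
A minimal sketch of that shell-style rule: only text inside ${...} is substituted, so a plain "usr" path component is left alone. The function name expand_path and the variable table are assumptions made up for this example.

```c
#include <stddef.h>
#include <string.h>

struct var {
    const char *name;
    const char *value;
};

/* Find the value of the variable whose name is the first len bytes of name. */
static const char *lookup_var(const struct var *vars, size_t nvars,
                              const char *name, size_t len)
{
    size_t i;

    for (i = 0; i < nvars; i++)
        if (strlen(vars[i].name) == len &&
            strncmp(vars[i].name, name, len) == 0)
            return vars[i].value;
    return NULL;
}

/* Copy in to out, replacing each ${name} with its value. Returns 0 on
   success, -1 on an unknown variable, an unterminated ${ or lack of room. */
static int expand_path(const char *in, const struct var *vars, size_t nvars,
                       char *out, size_t outlen)
{
    size_t used = 0;

    if (outlen == 0)
        return -1;
    while (*in != '\0') {
        const char *piece = in; /* default: copy one literal character */
        size_t piece_len = 1;

        if (in[0] == '$' && in[1] == '{') {
            const char *end = strchr(in + 2, '}');

            if (end == NULL)
                return -1;
            piece = lookup_var(vars, nvars, in + 2, (size_t)(end - (in + 2)));
            if (piece == NULL)
                return -1;
            piece_len = strlen(piece);
            in = end + 1;
        } else {
            in++;
        }
        if (used + piece_len >= outlen)
            return -1;          /* keep room for the terminating NUL */
        memcpy(out + used, piece, piece_len);
        used += piece_len;
    }
    out[used] = '\0';
    return 0;
}
```

With usr defined as user, "/usr/local/include/mysql" comes back unchanged while "/${usr}/local/include/mysql" becomes "/user/local/include/mysql".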

After you deal with the issue of INCLUDE paths, you have the issue
of LIBRARY paths. You might as well try to tackle that at the same time.


Gordon L. Burditt
 
tedu

David said:
For instance, consider replacing this:

#include "foo/bar.h"

and this

#include <bar.h> /* same file, but path from compiler */

with this:

#include <bar.h> <foo>

Where the code now says explicitly that the include bar.h is from
a package known as foo. Note, no paths in the include - the
include file is specified by WHAT it is and not by WHERE it is.
To support this the language needs only a minor extension - a
way to map a text file containing the list of packages to
their locations so that the compiler can pull in the right file.

you could probably work this out with the preprocessor as is:

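/* file.c */
#include "package.map"

/* Pasting "/" and "stdio" with ## does not form a valid token, so the
   two pieces are placed side by side and stringified instead. How a
   macro-built #include is interpreted is implementation-defined; this
   form is reported to work with GCC and Clang. */
#define IDENT(a) a
#define STRING_(a) #a
#define STRING(a) STRING_(a)
#define MAKE_INCLUDE(a,b) STRING(IDENT(a)IDENT(b))

#define stdio MAKE_INCLUDE(stdpkg,stdio.h)

#include stdio

int main(void) {
    printf("hello\n");
    return 0;
}

/* package.map */
/* change this to point to your C compiler's include directory */
#define stdpkg /usr/include/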
 
