in <
[email protected]>:
#
#> consider a small project of 100 C source and 100 header files.
#> The coding rules require that only C files include headers,
#> headers are not allowed to include other headers (not sure if
#> this is 100% the "IWYU - Include What You Use" paradigm).
#>
#> The headers contain only what headers should contain:
#> prototypes, typedefs, declarations, macro definitions.
#>
#> The problem: given a set of headers, determine a sequence of
#> #include directives that avoids syntax errors due to undeclared
#> identifiers. I.e. if "foo.h" declares type foo_t and "bar.h" uses
#> foo_t in a prototype, "foo.h" must be included before "bar.h".
#
# I have been reading the many comments in this thread with some
# interest. Reading through your responses, I have come up with
# this summary of motivations for using this approach (these are
# my paraphrasings, often not quotes of the originals):
#
# + no lint warnings about repeated headers
# + no need for include guards
# + doxygen dependency graph much simpler
# + no cycles in include graph
# + removing unneeded includes is easier
# + simpler compiler diagnostics
# + easier to generate dependency makefile
# + improved identifiability of refactoring opportunities
# + ... and of interface accumulation [not sure what this means]
# + ... and of code collecting fat
# + constant reminders of all dependencies of each .c file
Thanks, Tim, for taking the time. To expand on interface accumulation:
it is the process where interface A needs another interface, which in
turn grows the need for yet another, until A eventually includes half
of all the interfaces in the project.
Let's face it: programmers are lazy, and it's too easy in C to blow up
an initially small interface design. When a declaration is needed in
several places, one writes another #include into the first header that
looks like it's included by "most" of the files that need it, and
includes it directly in the rest. How many projects have you seen with
project_types.h, misc.h, macros.h, and similar headers invented on the
spot?
# Some questions:
#
# 1. Is this an accurate summary?
#
# 2. Has anything been left out (ie, is there any other
# positive you would add to the list)?
+ Reduced processing time by all the tools that operate on
C source. That's the compiler of course, but also lint,
auto dependency generators, static checkers, doxygen, ...
For each translation unit, the headers are tokenized and
parsed at most once (not at all when in a disabled #ifdef).
I observe a 20% reduction in our project.
+ Giving developers a hard and fast, unambiguous rule for which file the
include directives go in. There is only one choice. If foo.c needs the
bar_t declaration from bar.h, foo.c includes bar.h. Contrast this with
"traditional wisdom", where possibly a large number of headers would be
candidates for the new #include statement. A good design would make this
choice obvious, and a bright developer would know the state of the art,
but that's a rare trait. "Indented six feet down and covered with dirt"
is the reality out there. Yes, this requires selecting the proper *line*
among the includes. But *any* compiler will tell you in no uncertain
terms if that was the wrong line or you're missing another header. It's
foolproof.
# 3. Would you mind listing these from most important
# to least important, and giving some indication of
# relative weight for each item?
+ improved identifiability of refactoring opportunities
$ grep -c '#include "foo.h"' */*.c
Whoa! foo.h is included by 95% of files, why?
Whoa! foo.h is included by one file only. Maybe incorporate it.
Hmm. All files that include foo.h also require bar.h, baz.h and
blurb.h. Could I encapsulate this better? Maybe merge some headers?
(50 points)
+ ... and of interface accumulation
$ grep -c '#include' */*.c
Whoa! big.c includes everything and the kitchen sink. What's up?
(30 points)
+ ... and of code collecting fat: optional debug code in an #ifdef maze.
It should be moved out to separate object files, linked in when needed.
(20 points)
+ doxygen dependency graph much simpler. It's a document for
the customer.
(20 points)
+ removing unneeded includes is easier
(20 points)
+ constant reminders of all dependencies of each .c file
(10 points)
+ no cycles in include graph
(10 points)
+ giving developers a fast rule in what file the include goes.
(10 points)
+ reduced processing time by all the tools that operate on source
(10 points)
+ no lint warnings about repeated headers
(10 points)
+ easier to generate dependency makefile
(7 points)
+ no need for include guards
(5 points)
+ simpler compiler diagnostics
(5 points)
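The two grep checks in the list above can be sketched as a small script. This is only an illustration, not part of our tooling: the names include_stats and suspects, the 90% threshold, and the choice of Python are my assumptions, and only quoted (project) includes are counted.

```python
import re
from collections import Counter

# Matches only quoted (project) includes, like the first grep above.
INCLUDE = re.compile(r'^[ \t]*#[ \t]*include[ \t]*"([^"]+)"', re.M)


def include_stats(sources):
    """sources: dict mapping .c file name -> file text (hypothetical input).

    Returns (per_header, per_file): how many files include each header,
    and how many quoted includes each file has.
    """
    per_header = Counter()
    per_file = {}
    for name, text in sources.items():
        included = INCLUDE.findall(text)
        per_file[name] = len(included)
        per_header.update(set(included))  # count each file once per header
    return per_header, per_file


def suspects(sources, high=0.9):
    """Headers included almost everywhere, and headers included only once.

    The threshold `high` is an arbitrary assumption; tune it per project.
    """
    per_header, _ = include_stats(sources)
    total = len(sources)
    widely = sorted(h for h, n in per_header.items() if n >= high * total)
    single = sorted(h for h, n in per_header.items() if n == 1)
    return widely, single
```

Reading the files from disk is left to the caller. A header in the first list prompts the "why is this everywhere?" question; one in the second list is a candidate for incorporation into its only user.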
The overall goal is to make emerging complexity stand out the moment it
emerges, opening developers' eyes. The reality in any random project is
that not all developers are stellar C programmers (the set of
participants in this newsgroup, then and now, looks like an accurate
statistical sample. From Tanmoy to Bill...).
Unfortunately, the C preprocessor is a deceptive tool (apologies to dmr,
may his soul rest in peace, I know why it was needed back then) and is
frequently abused. Taming it is probably what I'm after.
The only reason cpp has survived is the include guard kluge.
Making interfaces stand out, both in number and circumference, should
help, I hope.
[...]
#> Can you think of a lightweight way to solve this? Maybe using
#> perl, the unix tool box, make, gmake, gcc, a C lexer?
#
# I may have some suggestions here, but first I would like to read
# through responses to the questions asked above, to make sure I'm
# going in a good direction.
This is certainly incomplete:
One would need to find the identifiers of macro definitions (easy)
and typedefs (harder). In prototypes one must distinguish between
types and optional parameter names.
In other declarations one needs to determine the declared identifier.
This is a little more involved for enums and aggregates.
Build the "needed by" pairs and pipe to tsort(1). Voilà!
Version 7 came with all the goodies built in, didn't it?
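A rough sketch of the steps above, under loud assumptions: it recognizes only #define names and simple typedefs that end in "name;", treats every other identifier in a header as a possible use, does not strip comments, and ignores enums, aggregates and function-pointer typedefs entirely. The function names are hypothetical, and Python's graphlib (3.9+) stands in for tsort(1).

```python
import re
from graphlib import TopologicalSorter  # Python 3.9+

IDENT = re.compile(r"[A-Za-z_]\w*")
# Crude: macro names after #define.
DEFINE = re.compile(r"^[ \t]*#[ \t]*define[ \t]+([A-Za-z_]\w*)", re.M)
# Crude: typedefs whose new name is the last identifier before ';'.
TYPEDEF = re.compile(r"\btypedef\b[^;{}]*?\b([A-Za-z_]\w*)\s*;")


def declared(text):
    """Identifiers a header appears to declare (macros, simple typedefs)."""
    return set(DEFINE.findall(text)) | set(TYPEDEF.findall(text))


def include_order(headers):
    """headers: dict mapping header name -> header text (hypothetical input).

    Returns the header names in an order where every header comes after
    the headers that declare identifiers it uses.
    """
    decl = {name: declared(text) for name, text in headers.items()}
    owner = {}
    for name, idents in decl.items():
        for ident in idents:
            owner.setdefault(ident, name)  # first declaring header wins

    ts = TopologicalSorter()
    for name, text in headers.items():
        ts.add(name)
        for ident in set(IDENT.findall(text)) - decl[name]:
            provider = owner.get(ident)
            if provider is not None and provider != name:
                ts.add(name, provider)  # provider must precede name
    return list(ts.static_order())


if __name__ == "__main__":
    demo = {
        "bar.h": "foo_t *bar_create(int n);\n",
        "foo.h": "typedef struct foo foo_t;\n",
    }
    for header in include_order(demo):
        print(f'#include "{header}"')
```

If two headers need each other, static_order() raises graphlib.CycleError, which is the same situation in which tsort(1) complains about a loop in its input.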
Regards,
Jens