It is certainly not something I would ever recommend writing (and
indeed, I would and do recommend against it). But:
> "extern" is a reference to a variable with static storage class
> and external linkage.
Unfortunately, this is not the case (though it would be nice if it
were).
The "extern" keyword has two effects, one on "linkage" and one on
"definition-ness".
In order to explain this, we have to define the terms "scope" and
"linkage", and describe what "definition-ness" is. We might as
well also mention "duration".
There are three durations for "objects" (objects being regions of
memory that hold bit patterns): static, automatic, and allocated.
A static-duration object -- a variable, or the contents of a string,
or in C99, the contents of some (but not all) compound literals --
is created at program startup (or possibly even before then, but
the C standards are little concerned with anything before or after
a program runs). It persists in memory until the program exits
(and, again, possibly even afterward).
An automatic-duration object is created by entering the block that
defines it, and destroyed (at least potentially) upon exiting that
block.
An allocated-duration object is created by calling malloc(), and
destroyed by calling free() with the value malloc() returned. (I
prefer to gloss over realloc() at this point.) Since pointers
are required in order to make any sensible use of allocated-duration
objects, we can ignore them here.
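As a rough sketch of the three durations side by side (the names "calls", "count_calls" and "make_boxed_int" are made up for illustration, not taken from anyone's code):

```c
#include <stdlib.h>

static int calls;            /* static duration: exists for the whole run, starts at 0 */

/* Returns how many times it has been called so far. */
int count_calls(void)
{
    int scratch = 1;         /* automatic duration: created anew on each entry */
    calls += scratch;
    return calls;
}

/* Returns a pointer to an allocated-duration int, or NULL on failure.
   The object lives until the caller passes the pointer to free(). */
int *make_boxed_int(int value)
{
    int *p = malloc(sizeof *p);
    if (p != NULL)
        *p = value;
    return p;
}
```

The value in "calls" survives between calls, "scratch" does not, and the object behind the returned pointer lasts exactly as long as the caller chooses.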
The "scope" of an "identifier" (a variable or function name, etc.;
in this case we are only really interested in variables) describes
the locations in which the identifier is visible, within a single
translation unit (roughly, "C source file after expanding #includes").
Given something like:
    void f(void) {
        int i;
        ... code section 1 ...
        { double d; ... code section 2 ... }
        ... code section 3 ...
    }
the scope of "i" is most[%] of the body of "f", including all three
code sections. We can refer to "i" in any of those areas and get
the variable "i" that is local to f(). (This assumes there are no
new definitions of an inner-scope "i"; in C89 those would require
braces, while in C99 we could define additional "i"s in "for" loops.)
The scope of "d", however, is restricted to code section 2.
[% I say "most of" rather than "all of" because i does not come
into scope until partway through the declaration. If we were to
declare another variable before "i", we could not refer to "i"
yet:
    void f(void) { int *ip = &i; int i = 42; ... } /* ERROR */
so i is not visible in *all* of the body of f().]
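The parenthetical remark about a new inner-scope "i" can be made concrete with a small sketch (the function name is hypothetical):

```c
/* Returns the value of the outer "i" after an inner block has
   declared and modified its own, unrelated "i". */
int outer_i_after_inner_block(void)
{
    int i = 1;               /* outer i: in scope to the end of the function */
    {
        int i = 2;           /* inner i hides the outer one inside these braces */
        i += 10;             /* changes only the inner i */
        (void)i;             /* quiet "set but not used" warnings */
    }
    return i;                /* outer i is visible again, and unchanged */
}
```

Inside the inner braces the name "i" means the inner variable; once the braces close, the same name refers to the outer one again.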
Scopes are, in effect, numbered: open braces (and in C99, "for"
loops with declarations) increment a number, while close braces
(and the ends of those C99 for loops) decrement it, and a variable
"goes out of scope" when its number is used up:
    int a0;                     /* scope number = 0 */
    void f(void)
    {                           /* now scope # = 1 */
        int a1;                 /* a1 at scope 1 */
        {
            int a2;             /* a2 at scope 2 */
            for (int a3 = 0; a3 < N; a3++)  /* a3 at scope 3 */
                ...;
            /* scope 3 terminated, a3 vanishes */
        }                       /* scope 2 terminated, a2 vanishes */
        ...
    }                           /* scope 1 terminated, a1 vanishes */
Parameters named in a function definition's prototype are "smuggled down"
into scope 1. (Some compilers actually reserve scope level 1 for
"goto" labels, making function-level variables, including formal
parameters, occupy scope 2. In effect, the counter goes 0, 2, 3,
4, ..., 3, 2, 0. But this is just an implementation gimmick that
achieves what the standard refers to as "function scope" for goto
labels.)
The "linkage" of an identifier has to do with whether the identifier
is visible *outside* a given translation unit. File-scope variables
always have some sort of linkage. There are only two real linkages,
"internal" and "external". Block-scope variables *can* have linkage,
but usually do not (or as the standard says, the linkage they have
is "no linkage", which is kind of Zen-ish I guess -- the text
of the standard says three linkages, internal, external, and none).
We can get block-scope variables that have linkage when we use
"extern":
    void f(void) {
        extern int a;
        ...
    }
though I generally recommend that any identifiers with linkage be
named at file scope. (I find this less confusing, in general,
though there are a reasonable number of toss-up cases as exceptions,
i.e., the code is equally unclear whether the declaration is block
or file. External-linkage variables tend to make code harder to
read, so are best avoided unless doing so also makes the code harder
to read. "... All courses may run ill.")
At file scope, the "static" keyword *always* means "internal linkage"
(because file scope variables invariably have static duration
already, so there is no need for "static" to mean "static duration").
At block scope, the "static" keyword means "static duration" instead,
and has no effect on linkage (which remains "no linkage"). Hence,
in:
    static int a0;
    void f(void) {
        static int a1;
        ...
    }
a0 has file scope, static duration, and internal linkage; a1 has
block scope (numerically "level 1" or "level 2" inside a typical
compiler), static duration, and no linkage.
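A small sketch of the two meanings of "static" at work, using invented names ("file_counter", "next_id", "last_id"):

```c
static int file_counter;       /* file scope: "static" gives internal linkage */

/* Returns the number of calls made to this function so far. */
int next_id(void)
{
    static int block_counter;  /* block scope: "static" gives static duration */
    block_counter++;           /* the value persists from one call to the next */
    file_counter = block_counter;
    return block_counter;
}

/* Other functions in this file can see file_counter, but not block_counter. */
int last_id(void)
{
    return file_counter;
}
```

Neither counter is visible to any other translation unit; the difference between them is only in which parts of this one file can name them.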
The "extern" keyword is more problematic. It would make sense if
it always meant "external linkage" -- but file scope variables
already get external linkage unless we suppress it with "static".
So from a linkage standpoint, writing the "extern" on b1 here is
redundant:
    int b0;         /* file scope, static duration, external linkage */
    extern int b1;  /* file scope, static duration, external linkage */
In this case, the real use for extern is to suppress "definition-ness".
In Standard C, a declaration of a variable that includes an
initializer is always a definition of that variable:
    int c0 = 3;
Here c0 has the same scope, duration, and linkage as usual (file,
static, and external), but we have definitely defined c0 and given
it an initial value (3). No other translation unit should attempt
to define c0. As Joe Wright notes, we can use "extern" -- preferably
in a header file -- thus:
    extern int c0;
to tell the compiler "there is a c0 out there, defined in one
translation unit; use that one."
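Squeezed into a single file for the sake of a compilable sketch (in real code the first declaration would sit in a header, and the helper "c0_plus" is a name I made up), the arrangement looks something like this:

```c
/* What the header would contain: */
extern int c0;               /* declaration only: does not define c0 */

/* Any translation unit that includes the header can use c0. */
int c0_plus(int n)
{
    return c0 + n;
}

/* Exactly one translation unit supplies the definition: */
int c0 = 3;
```

The function compiles against the "extern" declaration alone; the definition can appear later in the same file, as here, or in a different translation unit entirely.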
Now, also in Standard C, if we declare a variable at file scope,
but without the "extern" keyword *and* without an initializer, the
Standard refers to this as a "tentative definition". That is, this
declaration is a definition if and only if the same translation
unit does not define the same variable later, using an actual
initializer:
    int c1;         /* tentative definition of c1 */
    int c2;         /* tentative definition of c2 */
    ...
    int c2 = 42;    /* actual definition of c2 */
At the end of any given translation unit, the compiler is required
to "go back" to all the tentative definitions that have not been
supplanted by actual definitions, and initialize those variables to
zero (0, '\0', 0.0, NULL, whatever is appropriate given the type
of the variable). (Many compilers for Unix-like systems achieve
this without actually "going back", by using a linker trick in
which an uninitialized variable is marked as being in "BSS" space.
BSS stands for Block Started by Symbol, and dates back to ancient
IBM assembler. The linker combines bss symbols with data symbols
as needed, making things particularly easy for the compiler. But
again, this is just an implementation trick -- the C standard says
that tentative definitions become actual definitions, initialized
to zero.)
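A sketch of what that guarantee buys you, with made-up variable names; none of the first three variables is ever given an initializer:

```c
#include <stddef.h>

int    t_count;      /* tentative definition: ends up as 0 */
double t_ratio;      /* tentative definition: ends up as 0.0 */
char  *t_name;       /* tentative definition: ends up as a null pointer */

int t_limit;         /* tentative definition ... */
int t_limit = 42;    /* ... supplanted by this actual definition */

/* Returns 1 if every never-initialized variable above holds its zero value. */
int tentatives_are_zero(void)
{
    return t_count == 0 && t_ratio == 0.0 && t_name == NULL;
}
```

Whether the compiler "goes back" or lets the linker do the work, a conforming implementation has to make this function return 1.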
Tentative definitions work with internal-linkage identifiers too:
    static int c3;
    static int c3 = 6 * 9;
The first declaration is a tentative definition, which is replaced
by the second declaration, which is an explicit definition.
[Begin sidebar]
One might wonder why C should have tentative definitions at all.
The answer is, they allow you to create circular data structures
at compile time. Consider the following:
    struct queue {
        struct queue *forw, *back;  /* doubly linked queue */
        ... additional data ...
    };
    static struct queue qelem1, qelem2; /* tentative definitions */
    /* distinguished dummy queue head (queue element #0) */
    static struct queue qhead = { &qelem1, &qelem2 };
    static struct queue qelem1 = { &qelem2, &qhead, ... };
    static struct queue qelem2 = { &qhead, &qelem1, ... };
Here, queue element 1 points forward to element 2 and backward to
the dummy queue head; element 2 points forward to the head and
backward to element 1; and the head points forward to element 1
and backward to element 2. We have a classic doubly-linked queue
with two elements, all created at compile time, rather than by
some runtime initialization.
Try to write the above without using tentative definitions. Note
that if we were to drop the "static"s and use external linkage
instead, we *could* do it, using the "extern" keyword!
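For the curious, here is roughly what that external-linkage variant might look like. The "value" member and the "queue_sum" function are my own additions, standing in for the elided "additional data" so that the sketch compiles:

```c
struct queue {
    struct queue *forw, *back;   /* doubly linked queue */
    int value;                   /* hypothetical payload */
};

/* Non-defining declarations stand in for the tentative definitions. */
extern struct queue qelem1, qelem2;

/* distinguished dummy queue head (queue element #0) */
struct queue qhead  = { &qelem1, &qelem2, 0 };
struct queue qelem1 = { &qelem2, &qhead,  1 };
struct queue qelem2 = { &qhead,  &qelem1, 2 };

/* Walks forward from the head and sums the payloads of the real elements. */
int queue_sum(void)
{
    int sum = 0;
    const struct queue *q;

    for (q = qhead.forw; q != &qhead; q = q->forw)
        sum += q->value;
    return sum;
}
```

The price is that all three names now have external linkage, which is exactly what the "static" version was avoiding.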
[End sidebar]
Thus, at file scope, C generally uses the "extern" keyword to mean
"suppress even tentative-definition-ness of this external-linkage
identifier". C does not need "extern" to give the identifier
external linkage, because that is the default: we have to use
"static" to give it internal linkage.
But now take a look at the following wording in the C standard:
    [#4] For an identifier declared with the storage-class
    specifier extern in a scope in which a prior declaration of
    that identifier is visible, if the prior declaration
    specifies internal or external linkage, the linkage of the
    identifier at the later declaration becomes the linkage
    specified at the prior declaration. If no prior declaration
    is visible, or if the prior declaration specifies no
    linkage, then the identifier has external linkage.
Although this is written in Standard-ese, it really says: "If the
programmer uses the extern keyword, the compiler has to check to
see if the variable is already declared using the static keyword
so as to give it internal linkage. If this is the case, the compiler
should pretend that the extern keyword is not there, and the static
keyword is there instead. Otherwise -- when no prior declaration
is visible or the prior declaration specifies no linkage -- the
compiler should go ahead and do external linkage."
Hence:
    static int d0;
    extern int d0;
gives d0 internal linkage.
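A compilable sketch of this well-defined case (the initializer and the "read_d0" function are mine, added only so there is something to call):

```c
static int d0 = 5;   /* internal linkage */
extern int d0;       /* prior declaration is visible: linkage stays internal */

/* Returns the value of the file's own, internal-linkage d0. */
int read_d0(void)
{
    extern int d0;   /* the file-scope d0 is visible here, so still internal */
    return d0;
}
```

All three declarations name the same object, and nothing outside this translation unit can reach it by name.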
However, consider the following translation unit:
    static int d1;
    void f(void)
    {
        int d1;
        {
            extern int d1; /* ERROR -- DO NOT DO THIS! */
        }
    }
Here, we use a block-scope d1 to hide the internal-linkage d1. This
block-scope d1 has no linkage (and automatic duration, and block
scope of course). So the "visible" "prior declaration" that the
Standard-ese above talks about has "no linkage", and the third "d1"
above has external linkage. This, according to the same text in
the standard (three paragraphs on), has undefined behavior:
    [#7] If, within a translation unit, the same identifier
    appears with both internal and external linkage, the
    behavior is undefined.
To avoid the problem, I strongly recommend against ever allowing
any identifier to be declared once with "static" and then again
later with "extern", in the same translation unit. The well-defined
case, illustrated above with d0, can be handled by using the "static"
keyword (at file scope). The undefined case, illustrated above
with d1, occurs when "extern" is used in block scope. Avoiding
using "extern" in block scope is also possible, and often sensible;
if you do both, you will certainly be safe from this particular
pitfall ("belt and suspenders", or -- for the British -- "belt and
braces", as it were).