Tim Rentsch
The Semantics of 'volatile'
===========================
I've been meaning to get to this for a while; finally there's a
suitable chunk of free time available to do so.
To explain the semantics of 'volatile', we consider several
questions about the concept and how volatile variables behave,
etc. The questions are:
1. What does volatile do?
2. What guarantees does using volatile provide? (What memory
regimes must be affected by using volatile?)
3. What limits does the Standard set on how using volatile
can affect program behavior?
4. When is it necessary to use volatile?
We will take up each question in the order above. The comments
are intended to address both developers (those who write C code)
and implementors (those who write C compilers and libraries).
What does volatile do?
----------------------
This question is easy to answer if we're willing to accept an
answer that may seem somewhat nebulous. Volatile allows contact
between execution internals, which are completely under control
of the implementation, and external regimes (processes or other
agents) not under control of the implementation. To provide such
contact, and provide it in a well-defined way, using volatile
must ensure a common model for how memory is accessed by the
implementation and by the external regime(s) in question.
Subsequent answers will fill in the details behind this rather
high-level one.
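To make that concrete, here is a minimal sketch of the classic
case: a memory-mapped device status register that the program
polls and that hardware outside the implementation's control
updates. The address, the register name, and the READY bit are
invented for illustration (a real platform would supply its own),
and converting an integer to a pointer is itself
implementation-defined.

    #include <stdint.h>

    /* Hypothetical memory-mapped status register updated by an
       external device; address and READY bit invented for this
       sketch. */
    #define DEV_STATUS_ADDR   0x4000A000u
    #define DEV_STATUS_READY  0x01u

    static volatile uint32_t * const dev_status =
        (volatile uint32_t *) DEV_STATUS_ADDR;

    /* Spin until the device reports ready.  Because dev_status
       points to a volatile-qualified object, each iteration
       performs a fresh load; without volatile the compiler could
       hoist the load out of the loop, since nothing in the
       abstract machine ever modifies the object. */
    void wait_for_device( void )
    {
        while ( (*dev_status & DEV_STATUS_READY) == 0 )
            ;  /* spin */
    }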
What guarantees does using volatile provide?
--------------------------------------------
The short answer is "None." That deserves some elaboration.
Another way of asking this question is, "What memory regimes must
be affected by using volatile?" Let's consider some possibilities.
1. Accesses occur not just to registers but to process virtual
   memory (which might be just cache); threads running in the same
   process affect and are affected by these accesses.
2. Accesses occur not just to cache but are forced out into the
   inter-process memory (or "RAM"); other processes running on the
   same CPU core affect and are affected by these accesses.
3. Accesses occur not just to memory belonging to the one core but
   to memory shared by all the cores on a die; other processes
   running on the same CPU (but not necessarily the same core)
   affect and are affected by these accesses.
4. Accesses occur not just to memory belonging to one CPU but to
   memory shared by all the CPUs on the motherboard; processes
   running on the same motherboard (even if on another CPU on that
   motherboard) affect and are affected by these accesses.
5. Accesses occur not just to fast memory but also to some slower,
   more permanent memory (such as a "swap file"); other agents
   that access the "swap file" affect and are affected by these
   accesses.
The different examples are intended informally, and in many cases
there is no distinction between several of the different layers.
The point is that different choices of regime are possible (and
I'm sure many readers can provide others, such as not only which
memory is affected but what ordering guarantees are provided).
Now the question again: which (if any) of these different
regimes are /guaranteed/ to be included by a 'volatile' access?
The answer is none of the above. More specifically, the Standard
leaves the choice completely up to the implementation. This
specification is given in one sentence in 6.7.3 p 6, namely:
What constitutes an access to an object that has
volatile-qualified type is implementation-defined.
So a volatile access could be defined as coordinating with any of
the different memory regime alternatives listed above, or other,
more exotic, memory regimes, or even (in the claims of some ISO
committee participants) no particular other memory regimes at all
(so a compiler would be free to ignore volatile completely)[*].
How extreme this range is may be open to debate, but I note that
Larry Jones, for one, has stated unequivocally that the possibility
of ignoring volatile completely is allowed under the proviso given
above. The key point is that the Standard does not identify which
memory regimes must be affected by using volatile, but leaves that
decision to the implementation.
A corollary to the above is that any volatile-qualified access
automatically introduces an implementation-defined aspect into a
program.
[*] Possibly not counting the specific uses of 'volatile' as it
pertains to setjmp/longjmp and signals that the Standard
identifies, but these are side issues.
What limits are there on how volatile access can affect program behavior?
-------------------------------------------------------------------------
More properly this question is "What limits does the Standard
impose on how volatile access can affect program behavior?".
Again the short answer is None. The first sentence in 6.7.3 p 6
says:
An object that has volatile-qualified type may be modified
in ways unknown to the implementation or have other unknown
side effects.
Nowhere in the Standard are any limitations stated as to what
such side effects might be. Since they aren't defined, the
rules of the Standard identify the consequences as "undefined
behavior". Any volatile-qualified access results in undefined
behavior (in the sense that the Standard uses the term).
Some people are bothered by the idea that using volatile produces
undefined behavior, but there really isn't any reason to be. At
some level any C statement (or variable access) might behave in
ways we don't expect or want. Program execution can always be
affected by peculiar hardware, or a buggy OS, or cosmic rays, or
anything else outside the realm of what the implementation knows
about. It's always possible that there will be unexpected
changes or side effects, in the sense that they are unexpected by
the implementation, whether volatile is used or not. The
difference is, using volatile interacts with these external
forces in a more well-defined way; if volatile is omitted, there
is no guarantee as to how external forces on particular parts
of the physical machine might affect (or be affected by) changes
in the abstract machine.
Somewhat more succinctly: using volatile doesn't affect the
semantics of the abstract machine; it admits undefined behavior
by unknown external forces, which isn't any different from the
non-volatile case, except that using volatile adds some
(implementation-defined) requirements about how the abstract
machine maps onto the physical machine in the external forces'
universe. However, since the Standard mentions unknown side
effects explicitly, such things seem more "expectable" when
volatile is used. (volatile == Expect the unexpected?)
When is it necessary to use volatile?
-------------------------------------
In terms of pragmatics this question is the most interesting of
the four. Of course, as phrased it is more of a developer
question; for implementors, the phrasing would be
something more like "What requirements must my implementation
meet to satisfy developers who are using 'volatile' as the
Standard expects?"
To get some details out of the way, there are two specific cases
where it's necessary to use volatile, called out explicitly in
the Standard, namely setjmp/longjmp (in 7.13.2.1 p 3) and
accessing static objects in a signal handler (in 7.14.1.1 p 5).
If you're a developer writing code for one of these situations,
either use volatile, code around it so volatile isn't needed
(this can be done for setjmp), or be sure that the particular
code you're writing is covered by some implementation-defined
guarantees (extensions or whatever). Similarly, if you're an
implementor, be sure that using volatile in the specific cases
mentioned produces code that works; what this means is that the
volatile-using code should behave just like it would under
regular, non-exotic control structures. Of course, it's even
better if the implementation can do more than the minimum, such
as: define and document some additional cases for signal
handling code; make variable access in setjmp functions work
without having to use volatile, or give warnings for potential
transgressions (or both).
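For concreteness, here is a minimal sketch covering both of those
cases. The names and the handler are invented for illustration;
the setjmp case could equally be coded around by simply not
modifying any locals between the setjmp and the longjmp.

    #include <setjmp.h>
    #include <signal.h>
    #include <stdio.h>

    static jmp_buf env;

    /* 7.14.1.1 p 5 case: a signal handler may store into a static
       object only if it is declared volatile sig_atomic_t. */
    static volatile sig_atomic_t got_signal = 0;

    static void on_signal( int sig )
    {
        (void) sig;
        got_signal = 1;
    }

    int main( void )
    {
        /* 7.13.2.1 p 3 case: a local modified between setjmp and
           longjmp must be volatile-qualified to keep its value. */
        volatile int attempts = 0;

        signal( SIGINT, on_signal );

        if ( setjmp( env ) != 0 )
            printf( "back via longjmp, attempts = %d, signal = %d\n",
                    attempts, (int) got_signal );

        attempts = attempts + 1;
        if ( attempts < 3 )
            longjmp( env, 1 );

        return 0;
    }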
The two specific cases are easy to identify, but of course the
interesting cases are everything else! This area is one of the
murkiest in C programming, and it's useful to take a moment to
understand why. For implementors, there is a tension between
code generation and what semantic interpretation the Standard
requires, mostly because of optimization concerns. Nowhere is
this tension felt more keenly than in translating 'volatile'
references faithfully, because volatile exists to make actions in
the abstract machine align with those occurring in the physical
machine, and such alignment prevents many kinds of optimization.
To appreciate the delicacy of the question, let's look at some
different models for how implementations might behave.
The first model is given as an Example in 5.1.2.3 p 8:
EXAMPLE 1 An implementation might define a one-to-one
correspondence between abstract and actual semantics: at
every sequence point, the values of the actual objects would
agree with those specified by the abstract semantics.
We call this the "White Box model". When using implementations
that follow the White Box model, it's never necessary to use
volatile (as the Standard itself points out: "The keyword
volatile would then be redundant.").
At the other end of the spectrum, a "Black Box model" can be
inferred based on the statements in 5.1.2.3 p 5. Consider an
implementation that secretly maintains "shadow memory" for all
objects in a program execution. Regular memory addresses are
used for address-taking or index calculation, but any actual
memory accesses would access only the shadow memory (which is at
a different location), except for volatile-qualified accesses
which would load or store objects in the regular object memory
(ie, at the machine addresses produced by pointer arithmetic or
the & operator, etc). Only the implementation would know how to
turn a regular address into a "shadow" object access. Under the
Black Box model, volatile objects, and only volatile objects, are
usable in any useful way by any activity outside of or not under
control of the implementation.
At this point we might stop and say, well, let's just make a
conservative assumption that the implementation is following the
Black Box model, and that way we'll always be safe. The problem
with this assumption is that it's too conservative; no sensible
implementation would behave this way. Consider some of the
ramifications:
1. Couldn't use a debugger to examine variables (except
volatile variables);
2. Couldn't call an externally defined function written
in assembly or another language, unless the function
is declared with a prototype having volatile-qualified
parameters (and even that case isn't completely clear,
because of the rule at the end of 6.7.5.3 p 15 about
how function types are compared and composited);
3. Couldn't call ordinary OS functions like read() and
write() unless the memory buffers were accessed
using volatile-qualified expressions.
These "impossible" conditions never happen because no
implementation is silly enough to take the Black Box model
literally. Technically, it would be allowed, but no one would
use it because it breaks too many deep assumptions about how a C
runtime interacts with its environment.
A more realistic model is one of many "Gray Box models" such
as the example implementation mentioned in 5.1.2.3 p 9:
Alternatively, an implementation might perform various
optimizations within each translation unit, such that the
actual semantics would agree with the abstract semantics
only when making function calls across translation unit
boundaries. In such an implementation, at the time of each
function entry and function return where the calling
function and the called function are in different
translation units, the values of all externally linked
objects and of all objects accessible via pointers therein
would agree with the abstract semantics. Furthermore, at
the time of each such function entry the values of the
parameters of the called function and of all objects
accessible via pointers therein would agree with the
abstract semantics. In this type of implementation, objects
referred to by interrupt service routines activated by the
signal function would require explicit specification of
volatile storage, as well as other implementation-defined
restrictions.
Here the implementation has made a design choice that makes
volatile superfluous in many cases. To get variable values to
store-synchronize, we need only call an appropriate function:
extern void okey_dokey( void );   // defined in another translation unit
extern void foo( int );           // likewise defined in another translation unit
extern int v;
...
v = 49;          // storing into v is a "volatile" access
okey_dokey();    // cross-translation-unit call: values must agree
                 //   with the abstract semantics here
foo( v );        // this access is also "volatile"
Note that these "volatile" accesses work the way an actual
volatile access does because of an implementation choice about
calling functions defined in other translation units; obviously
that's implementation dependent.
Let's look at one more model, of interest because it comes up in
operating systems, which are especially prone to want to do
things that won't work without 'volatile'. In our hypothetical
kernel code, we access common blocks by surrounding the access
code with mutexes, which for simplicity are granted with spin
locks. Access code might look like this:
while( block_was_locked() ) { /*spin*/ }
// getting here means we have the lock
// access common block elements here
// ... and access some more
// ... and access some more
// ... and access some more
unlock_block();
Here it's understood that locking ('block_was_locked()') and
unlocking ('unlock_block()') will be done using volatile, but the
accesses inside the critical region of the mutex just use regular
variable access, since the block access code is protected by the
mutex.
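A sketch of what those two primitives might look like appears
below. The lock word is volatile because agents outside this flow
of control (other cores, interrupt handlers) modify it; note,
though, that the test-then-claim sequence in block_was_locked()
is shown schematically and is not atomic, so a real kernel would
use a hardware test-and-set or compare-and-swap instruction there.

    /* Hypothetical lock word shared with agents outside this flow
       of control (other cores, interrupt handlers), hence
       volatile. */
    static volatile int block_lock = 0;

    /* Returns nonzero while someone else holds the lock.
       Schematic only: the test and the claiming store are not one
       atomic operation; a real implementation would use a
       hardware test-and-set or compare-and-swap here. */
    int block_was_locked( void )
    {
        if ( block_lock != 0 )
            return 1;        /* still locked; caller keeps spinning */
        block_lock = 1;      /* volatile store: claim the lock */
        return 0;
    }

    void unlock_block( void )
    {
        block_lock = 0;      /* volatile store: release the lock */
    }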
If one is implementing a compiler to be used on operating system
kernels, this model (only partially described, but I think the
salient aspects are clear enough) is one worth considering. Of
course, the discussion here is very much simplified; there are
lots more considerations when designing actual operating system
locking mechanisms, but the basic scheme should be evident.
Looking at a broader perspective, is it safe to assume this model
holds in some unknown implementation(s) on our platforms of
choice? No, of course it isn't. The behavior of volatile is
implementation dependent. The model here is relevant because
many kernel developers unconsciously expect their assumptions
about locks and critical regions, etc., to be satisfied by using
volatile in this way. Any sensible implementation would be
foolish to ignore such assumptions, especially if kernel
developers were known to be in the target audience.
Returning to the original question, what answers can we give?
If you're an implementor, know that the Standard offers great
latitude in what volatile is required to do, but choosing any of
the extreme points is likely to be a losing strategy no matter what
your target audience is. Think about what other execution
regime(s) your target audience wants/needs to interact with;
choose an appropriate model that allows volatile to interact with
those regimes in a convenient way; document that model (as 6.7.3p6
requires for this implementation-defined aspect) and follow it
faithfully in producing code for volatile access. Remember that
you're implementing volatile to provide access to alternative
execution regimes, not just because the Standard requires it, and
it should work to provide that access, conveniently and without
undue mental contortions. Depending on the extent of the regimes
or the size of the target audience, several different models might
be given under different compiler options (if so it would help to
record which model is being followed in each object file, since the
different models are likely not to intermix in a constructive way).
If you're a developer, and are intent on being absolutely
portable across all implementations, the only safe assumption is
the Black Box model, so just make every single variable and
object access be volatile-qualified, and you'll be safe. More
practically, however, a Gray Box model like one of the two
described above probably holds for the implementation(s) you're
using. Look for a description of what the safe assumptions are
in the implementations' documentation, and follow that; and, it
would be good to let the implementors know if a suitable
description isn't there or doesn't describe the requirements
adequately.