It's pretty easy to come up with examples of functions that are
trivial if you don't check the arguments and nontrivial if you
do. An excerpt from a library of mine is below. Each of these
functions is only one or two machine instructions when inlined.
If the functions were modified to check for null pointers and for
invalid pointers, then they would at least double in code size
(making them less suitable as inline functions) and presumably in
execution time also.
Elsewhere in this thread, Jacob Navia suggested using 64-bit
magic numbers to mark memory regions of a particular type. That
would also increase the size of this data structure by 40% with
the C implementation that I most commonly use. Perhaps not
fatal, but definitely significant.
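(Assuming, for concreteness, a 32-bit implementation with 4-byte
pointers and longs, and a struct bt_node holding three pointers, both
of which are assumptions on my part, the arithmetic would come out as
follows.)
\code snippet
/* Hypothetical accounting of the 40% figure, assuming a 32-bit
   implementation (4-byte pointers and longs) and a three-pointer
   struct bt_node:
     sizeof (struct range_set_node) = 3*4 + 4 + 4 = 20 bytes
     plus an 8-byte magic number    = 20 + 8      = 28 bytes
     increase                       = 8 / 20      = 40% */
\endcode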
\code
/* A node in the range set. */
struct range_set_node
  {
    struct bt_node bt_node;     /* Binary tree node. */
    unsigned long int start;    /* Start of region. */
    unsigned long int end;      /* One past end of region. */
  };

/* Returns the position of the first 1-bit in NODE. */
static inline unsigned long int
range_set_node_get_start (const struct range_set_node *node)
{
  return node->start;
}

/* Returns one past the position of the last 1-bit in NODE. */
static inline unsigned long int
range_set_node_get_end (const struct range_set_node *node)
{
  return node->end;
}

/* Returns the number of contiguous 1-bits in NODE. */
static inline unsigned long int
range_set_node_get_width (const struct range_set_node *node)
{
  return node->end - node->start;
}
\endcode
The only pointer check I consider useful at the library level is
whether a NULL pointer is semantically valid as an argument. In your
functions above, passing a NULL pointer as 'node' will cause a memory
access violation. One could consider inserting a check at the module
(library) level to verify that the pointer argument is semantically
valid.
\code
/* Returns the position of the first 1-bit in NODE. */
static inline unsigned long int
range_set_node_get_start (const struct range_set_node *node)
{
  check_pointer_semantics (node != NULL);
  return node->start;
}
\endcode
Unfortunately, inserting checking at the library level enforces
behavior on the client that is not satisfactory in all scenarios. In
development and debugging, one would prefer 'check_pointer_semantics'
to behave like 'assert'. In production, crashing the application
because a NULL pointer was passed to 'range_set_node_get_start' may
not be the preferred option: it might be nicer to save the program's
progress if it is partway through a long algorithm, or to inform the
user in a more graceful fashion.
The best option I can come up with is to abstract the outcome of a
precondition violation. The check of a precondition can be split into
three phases: detection, reporting, and response. Detection is
evaluation of the precondition expression 'node != NULL'. Reporting
may be a display of the file and line of the error, like
'Precondition violation: node != NULL, file lib_source.c, line 123'.
The response can be to abort like 'assert', to early-return with an
error value, to invoke a global procedure that saves state and shuts
down, or to pop up a dialog with an option to report the error. The
standard library 'assert' can be represented as a combination of the
three phases, where the report prints file and line information and
the response is to 'abort'. The advantage is that when the constraint
is enforced at the library level, every call to
'range_set_node_get_start' is validated regardless of the knowledge
and competency of the developers calling it.
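As a sketch, 'assert' recast in those three phases might look like
this (my reconstruction for illustration, not the real
implementation):
\code snippet
#include <stdio.h>
#include <stdlib.h>

#define my_assert( expr )                                            \
    do                                                               \
    {                                                                \
        if ( !(expr) )                            /* detection */    \
        {                                                            \
            fprintf( stderr,                                         \
                     "Assertion failed: %s, file %s, line %d\n",     \
                     #expr, __FILE__, __LINE__ ); /* report */       \
            abort();                              /* response */     \
        }                                                            \
    } while (0)
\endcode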
The key to acceptance is providing an architecture that allows the
client (the application developer using the module) to sculpt the
report and response to these precondition violations in a manner that
is convenient. To help satisfy these requirements, I use macros that
reference a global function table that contains function pointers to
'report' and 'response' functions.
\code snippet
/* Table of handlers invoked on a constraint violation. */
struct c_constraint_violation_ftable
{
    void (*report)( const char* expr, const char* file, int line );
    void (*response)( void );
};

static struct c_constraint_violation_ftable
gc_private_constraint_violation_ftable =
{
    gc_default_report_handler,
    NULL
};

void c_constraint_violation_set_report_handler( void (*report_fn)
    ( const char*, const char*, int ) );
void c_constraint_violation_set_response_handler( void (*response_fn)
    ( void ) );
\endcode
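The setters themselves just assign into the table, along these lines:
\code snippet
void c_constraint_violation_set_report_handler( void (*report_fn)
    ( const char*, const char*, int ) )
{
    gc_private_constraint_violation_ftable.report = report_fn;
}

void c_constraint_violation_set_response_handler( void (*response_fn)
    ( void ) )
{
    gc_private_constraint_violation_ftable.response = response_fn;
}
\endcode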
The report handler 'gc_default_report_handler' would look like the
following.
\code snippet
#include <stdio.h>

void gc_default_report_handler( const char* expr, const char* file,
                                int line )
{
    fprintf( stderr, "%s, file %s, line %d\n", expr, file, line );
    fflush( stderr );
}
\endcode
The response and report handlers are called from a generic constraint
violation handler.
\code
void gc_constraint_violation( const char* expr, const char* file,
                              int line )
{
    if ( gc_private_constraint_violation_ftable.report ) {
        (*gc_private_constraint_violation_ftable.report)( expr, file,
                                                          line );
    }
    if ( gc_private_constraint_violation_ftable.response ) {
        (*gc_private_constraint_violation_ftable.response)();
    }
}
\endcode
Finally, I have a macro to evaluate the constraint and add any
additional response that cannot be encapsulated in the 'response'
function pointer.
\code snippet
#define c_return_value_if_fail( expr, val )                          \
    do                                                               \
    {                                                                \
        if ( expr ) {}                                               \
        else                                                         \
        {                                                            \
            gc_constraint_violation( "Constraint violation: " #expr, \
                                     __FILE__, __LINE__ );           \
            return (val);                                            \
        }                                                            \
    } while (0)
\endcode
There is also a 'c_return_if_fail' that replaces 'return (val);' with
a plain 'return;' statement, and a 'c_assert_if_fail' that does not
return at all, acting as a pass-through for cases when an early-exit
response isn't required. (An example would be a constraint that
informs the user when strlcat truncates its result: truncation is
likely a serious error, but not fatal to the program, since
truncation is defined as valid semantics for the function.) Both are
sketched below.
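They follow the same pattern as 'c_return_value_if_fail'; roughly (a
sketch, details may differ):
\code snippet
/* Early-return without a value, for functions returning void. */
#define c_return_if_fail( expr )                                     \
    do                                                               \
    {                                                                \
        if ( expr ) {}                                               \
        else                                                         \
        {                                                            \
            gc_constraint_violation( "Constraint violation: " #expr, \
                                     __FILE__, __LINE__ );           \
            return;                                                  \
        }                                                            \
    } while (0)

/* Report only; execution falls through past the violation. */
#define c_assert_if_fail( expr )                                     \
    do                                                               \
    {                                                                \
        if ( expr ) {}                                               \
        else                                                         \
        {                                                            \
            gc_constraint_violation( "Constraint violation: " #expr, \
                                     __FILE__, __LINE__ );           \
        }                                                            \
    } while (0)
\endcode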
Just as 'assert' is disabled by NDEBUG, one can define a preprocessor
symbol to disable evaluation of these macros at the library level, or
at the source level with a granularity that depends on how many
source files one uses to implement a module. If
'range_set_node_get_start', 'range_set_node_get_end', and
'range_set_node_get_width' are defined in different .c source files,
one can use the build system to control which functions get compiled
with or without constraint checking (in my case, by defining
C_NO_CONSTRAINTS). It's even possible to decide at run time if the
'report' or 'response' functions check a global status value.
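One plausible way to wire the compile-time switch (a sketch):
\code snippet
#ifdef C_NO_CONSTRAINTS
/* Checking compiled out; note that 'expr' is then never evaluated,
   so it must not contain needed side effects. */
#define c_return_value_if_fail( expr, val )  ((void) 0)
#define c_return_if_fail( expr )             ((void) 0)
#define c_assert_if_fail( expr )             ((void) 0)
#else
/* ... the full definitions shown earlier ... */
#endif
\endcode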
The result is that more of the library's assumptions can be enforced
within the framework of a single architecture, with more flexibility
than 'assert' or manual defensive programming techniques offer. If
one wants to use this system in a design-by-contract style, simply
wire the 'response' function pointer to 'abort', which ensures that a
failing 'c_return_value_if_fail' stops the program in its tracks. Or
one can place a breakpoint in an empty response function to stop at
each constraint violation while still keeping the defensive style of
'c_return_value_if_fail' for production environments.
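For instance, both styles are one call away ('gc_break_on_violation'
is a hypothetical name for illustration):
\code snippet
#include <stdlib.h>

/* Defensive/debug style: an empty response function makes a handy
   place for a debugger breakpoint. */
static void gc_break_on_violation( void )
{
    /* set a breakpoint here */
}

static void configure_constraint_handling( int contract_style )
{
    if ( contract_style )
        /* Contract style: any violation aborts on the spot. */
        c_constraint_violation_set_response_handler( abort );
    else
        c_constraint_violation_set_response_handler( gc_break_on_violation );
}
\endcode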
For example, here is the version of strlcat that I use. It validates
that the source and destination strings reference memory, and warns
if truncation occurs.
\code
#include <string.h>

size_t c_strlcat( char* dst, const char* src, size_t dst_size )
{
    size_t dst_length;
    char* d = dst;
    const char* s = src;
    size_t n = dst_size;
    size_t dlen;

    c_return_value_if_fail( dst != NULL, 0 );
    c_return_value_if_fail( src != NULL, 0 );

    /* Find the end of dst and adjust bytes left, but don't go past
       the end of the buffer. */
    while ( n-- != 0 && *d != '\0' ) {
        ++d;
    }
    dlen = d - dst;
    n = dst_size - dlen;

    if ( n == 0 ) {
        return dlen + strlen( s );
    }
    while ( *s != '\0' )
    {
        if ( n != 1 )
        {
            *d++ = *s;
            --n;
        }
        ++s;
    }
    *d = '\0';

    /* count does not include NUL character */
    dst_length = dlen + (s - src);
    c_assert_if_fail( dst_length < dst_size );

    return dst_length;
}
\endcode
If I get a constraint violation, I get a message like the following.
\result
Constraint violation: dst_length < dst_size, file strops.c, line 326
\endresult
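For instance, a call that truncates would trip the final constraint
in 'c_strlcat':
\code snippet
char buf[4] = "abc";

/* The concatenated result "abcdefgh" needs 9 bytes including the
   NUL, but the buffer holds only 4, so the copy is truncated and
   c_assert_if_fail reports a violation. */
c_strlcat( buf, "defgh", sizeof buf );
\endcode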
If one wants to annotate these constraints, one can simply add a '&&
"constraint annotation"' to the expression.
\code snippet
c_assert_if_fail( dst_length < dst_size && "buffer truncation" );
\endcode
The point is that it's possible to create an architecture that
enforces many of the module's constraints in a *consistent* manner
that is useful (since, as the library writer, you're the one with the
best knowledge and the most control). Even if it doesn't satisfy all
the people all the time, one can still write wrappers around these
functions and hard-define C_NO_CONSTRAINTS to reduce the library's
error checking to nothing, which gives people the freedom to
completely customize their error handling absent any constraint
checking the library provides (a wrapper is sketched below). In the
ideal case, it should reduce the support needed to help the general
user base interface with the module.
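For illustration, a client that builds the library with
C_NO_CONSTRAINTS might wrap an accessor with its own policy
('my_log_error' is a hypothetical client routine):
\code snippet
/* Client-side wrapper: the library's checks are compiled out, so the
   wrapper enforces and reports the client's own policy. */
unsigned long int
my_range_set_node_get_start( const struct range_set_node *node )
{
    if ( node == NULL )
    {
        my_log_error( "null range_set_node" );  /* hypothetical */
        return 0;
    }
    return range_set_node_get_start( node );
}
\endcode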
So for your functions, I would have no problem writing them like the
following (the value '0' in the examples could be a named constant if
that is more appropriate).
\code snippet
/* A node in the range set. */
struct range_set_node
  {
    struct bt_node bt_node;     /* Binary tree node. */
    unsigned long int start;    /* Start of region. */
    unsigned long int end;      /* One past end of region. */
  };

/* Returns the position of the first 1-bit in NODE. */
static inline unsigned long int
range_set_node_get_start (const struct range_set_node *node)
{
  c_return_value_if_fail( node != NULL, 0 );
  return node->start;
}

/* Returns one past the position of the last 1-bit in NODE. */
static inline unsigned long int
range_set_node_get_end (const struct range_set_node *node)
{
  c_return_value_if_fail( node != NULL, 0 );
  return node->end;
}

/* Returns the number of contiguous 1-bits in NODE. */
static inline unsigned long int
range_set_node_get_width (const struct range_set_node *node)
{
  c_return_value_if_fail( node != NULL, 0 );
  return node->end - node->start;
}
\endcode
Even though there may be some conflict where the value '0' is also a
valid semantic value, the report portion of a constraint violation is
often enough to prompt the developer to quickly fix the error and
move on, or at least to recognize that something may be going wrong
and that the result may be invalid. It's more informative to the
developer and the user than just crashing the program. And if one
wants blistering speed, define C_NO_CONSTRAINTS to disable constraint
checking and throw caution to the wind. It's also better than pure
defensive programming, especially in scenarios where an error return
value shares the same domain as valid return values. For example, one
can return 0 as the result of a NULL pointer constraint violation
detected in strlcat, but that return value alone doesn't distinguish
the error from the valid value of 0 obtained when copying an empty
string "".
It's important to recognize that these macros can often verify only a
partial view of a module's preconditions. That's why it's important
to document explicit preconditions (depending on the formality of the
code) as well as any constraints one can verify in code. Even if one
cannot validate all properties of a module's preconditions, checking
a subset of those conditions can still be very powerful. Most of the
time, I place constraint checking on all arguments of public module
functions when applicable, but not on private functions. The more
complicated a private function is, though, the more likely I am to
add constraints to it (typically of the 'c_assert_if_fail' variety).
I've been "simulated annealing" on this for quite a while, but I
recognize that there is no one best solution. And I'm sure there are
rocks one could throw at this setup (like the lack of __func__ to
report function names; it's still in the annealing phase). All I can
say is that, from my personal experience, this scheme's value exceeds
that of 'assert' and the GLib g_return[_val]_if_fail macros. It
allows one to switch quite easily from the defensive programming
style to the contract programming style with a minimal change to the
'response' function pointer. It's as close to the best of both worlds
as I have come up with.
In Navia's case, for any pointer validation beyond semantic checking
for NULL, I would highly recommend not placing it within any library
API other than an allocator that centers on creating "heavy
pointers". If one wants to decorate pointers to dynamically allocated
memory blocks with their allocated size (to track how much allocated
memory the application currently uses) or with magic-number fences
(to try to detect memory corruption), determine ALIGN_MAX to properly
align and stack the decorations on the pointer, and use custom
allocator routines. Or rely on a memory debugger framework like
valgrind or dmalloc; life's too short to reinvent this wheel.
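(If one does roll their own, here is a rough sketch; the names and
fence value are hypothetical, and a union stands in crudely for
ALIGN_MAX.)
\code snippet
#include <stdlib.h>

#define C_HEAVY_MAGIC 0xFEEDFACECAFEBEEFULL  /* hypothetical fence value */

union c_heavy_header
{
    struct
    {
        size_t size;               /* requested allocation size */
        unsigned long long magic;  /* corruption fence */
    } h;
    long double align;             /* crude stand-in for ALIGN_MAX */
};

void *c_heavy_malloc( size_t size )
{
    union c_heavy_header *hdr = malloc( sizeof *hdr + size );
    if ( hdr == NULL )
        return NULL;
    hdr->h.size = size;
    hdr->h.magic = C_HEAVY_MAGIC;
    return hdr + 1;                /* user data follows the header */
}

void c_heavy_free( void *p )
{
    if ( p != NULL )
    {
        union c_heavy_header *hdr = (union c_heavy_header *) p - 1;
        /* A real version would route this through the constraint
           violation handler instead of aborting directly. */
        if ( hdr->h.magic != C_HEAVY_MAGIC )
            abort();               /* corrupted or foreign pointer */
        free( hdr );
    }
}
\endcode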
Best regards,
John D.