strlcpy and strlcat just transform the way in which a buffer overflow
can happen. They don't address the cause (human error w.r.t. length
calculations). The way I avoid buffer overflows in strings is to use
a string ADT which takes memory length and string length into
account automatically with each operation:
http://bstring.sf.net/
This can bring the safety level of string operations essentially up
to that found in higher level languages.
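
To make the idea concrete, here is a minimal sketch of a string type
that carries its own capacity and length. The names (struct lstr,
lstr_init, lstr_append, lstr_free) are made up for illustration and
are NOT the bstrlib API; the real library does a great deal more.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical minimal string ADT for illustration; not bstrlib. */
struct lstr {
    size_t cap;   /* bytes allocated for data */
    size_t len;   /* bytes in use, not counting the terminating '\0' */
    char  *data;
};

/* Start with an empty string. Returns 0 on success, -1 on failure. */
int lstr_init(struct lstr *s)
{
    s->data = malloc(16);
    if (s->data == NULL) {
        s->cap = 0;
        s->len = 0;
        return -1;
    }
    s->cap = 16;
    s->len = 0;
    s->data[0] = '\0';
    return 0;
}

/* Append a C string, growing the buffer when needed. The caller never
   computes a length. text must not point into s->data (this sketch
   does not handle aliasing). Returns 0 on success, -1 on failure. */
int lstr_append(struct lstr *s, const char *text)
{
    size_t add = strlen(text);
    size_t need;

    if (s->data == NULL)
        return -1;
    if (add > (size_t)-1 - s->len - 1)
        return -1;                      /* size computation would wrap */
    need = s->len + add + 1;
    if (need > s->cap) {
        size_t ncap = s->cap;
        char *p;
        while (ncap < need) {
            if (ncap > (size_t)-1 / 2) {
                ncap = need;
                break;
            }
            ncap *= 2;
        }
        p = realloc(s->data, ncap);
        if (p == NULL)
            return -1;                  /* old buffer is still valid */
        s->data = p;
        s->cap = ncap;
    }
    memcpy(s->data + s->len, text, add + 1);  /* copies the '\0' too */
    s->len += add;
    return 0;
}

/* Release the buffer and leave the object in a harmless state. */
void lstr_free(struct lstr *s)
{
    free(s->data);
    s->data = NULL;
    s->cap = 0;
    s->len = 0;
}
```

All the length arithmetic lives in one place, so it only has to be
gotten right once.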
I don't know of any specific resources, but here are some personal
guidelines in no particular order (note that I don't necessarily
follow these *all* the time, depending on the situation and how
deluded^H^H^H^H^H^H^Hconfident I am in my own abilities that day):
1. Initialize all variables to a known value.
Hmm ... well, so for pointers do you initialize them to NULL? That's
fine, but it's not much of a safety parachute if you have an accidental
"use before proper initialization" error.
2. Check all return values from library functions.
Well, except in bstrlib, where it's semantically optional. You can
usually just check dependent return values at the end and still know
that an error has occurred without suffering from UB.
Better yet define gets() to emit an error or do something like stop
the program in its tracks.
4. During development, set the warning level on the compiler to its
highest setting. Review and eliminate each warning.
Right. The point is to recognize that even if you don't agree with a
warning, the effort you put into eliminating it is worth it for all
the other hints the compiler gives you through its warnings.
While not practical for everyone, these days I also try to ensure that
my code compiles with multiple compilers. I have found that different
compilers have vastly different safety coverage with their warnings --
complying with all of them helps make code truly bulletproof and
maintainable.
5. Don't cast an expression *just* to eliminate a warning.
I'm not sure what scenario you are talking about here. I would rather
say that you should cast *correctly*. I.e., clearly there are ways in
which casting numerics incorrectly can get the right type but the
wrong/inaccurate result.
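
A small sketch of what I mean by getting the right type but the wrong
result. The function names are made up for the example.

```c
/* Cast applied too late: the integer division has already thrown
   away the fractional part before the conversion happens. */
double ratio_wrong(int num, int den)
{
    return (double)(num / den);
}

/* Cast applied to an operand: the division is performed in double. */
double ratio_right(int num, int den)
{
    return (double)num / den;
}
```

Both return a double and both compile cleanly, but only the second
one computes the ratio.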
6. When comparing against a constant expression for equality, put the
constant on the LHS (i.e., if (SOME_CONSTANT == x)); this will catch
any problems where you typed "=" when you meant "==".
Of course. People who have problems with this are somehow letting
some neurosis in their brain dominate over recognizing the law of
commutativity for this operator. The safety benefit for doing this is
obvious.
7. Abstract out tedious, repetitive, and/or low-level tasks. IOW,
don't call malloc() directly from your application code, but wrap it
in a function that performs error checking and initialization of the
memory being returned.
Well, the new/delete paradigm of Pascal or C++ is usually a lot safer
and more readable than C's crazy mallocing. So for ADTs, I usually
have creation functions that I name with a "New" in them, and
destruction functions that I name with a "Destroy" in them.
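
As a rough sketch of that naming pattern, with a made-up ADT (struct
int_vec, IntVecNew, and IntVecDestroy are names invented for this
example): the allocation, the error checking, and the initialization
all happen in one place, and application code never calls malloc()
or free() for this type itself.

```c
#include <stdlib.h>

/* Hypothetical ADT used only to illustrate the New/Destroy pattern. */
struct int_vec {
    size_t count;
    int   *items;
};

/* Create a vector of count zero-initialized ints.
   Returns NULL if the object cannot be constructed. */
struct int_vec *IntVecNew(size_t count)
{
    struct int_vec *v = calloc(1, sizeof *v);
    if (v == NULL)
        return NULL;
    if (count > 0) {
        v->items = calloc(count, sizeof *v->items);
        if (v->items == NULL) {
            free(v);            /* don't leak the half-built object */
            return NULL;
        }
    }
    v->count = count;
    return v;
}

/* Release everything IntVecNew allocated. Accepts NULL, like free(). */
void IntVecDestroy(struct int_vec *v)
{
    if (v == NULL)
        return;
    free(v->items);
    free(v);
}
```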
These are all reasonable ideas. But certainly I would add to them:
8. Use const *maximally*, and never cast away or work around const
semantics. Using const will typically make it obvious what parameters
to a function are inputs and what are outputs.
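
For instance, in a prototype like the following (sum_into is a
made-up function for illustration), you can tell which parameter is
the output without reading any documentation:

```c
#include <stddef.h>

/* values is an input (const); total is the output.
   Returns 0 on success, a negative value on bad arguments. */
int sum_into(long *total, const int *values, size_t n)
{
    size_t i;
    long acc = 0;

    if (total == NULL || (values == NULL && n > 0))
        return -1;
    for (i = 0; i < n; i++)
        acc += values[i];
    *total = acc;
    return 0;
}
```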
9. Always include error paths out of every function. (This goes with
2. above.) Without exception handling in C, your choices are either
to exit immediately (not recommended) or return with some kind of
erroneous return status. For ADTs that cannot be constructed, I
usually return NULL, and for just general errors, I return some
negative value under the assumption that normal operations always
return with 0 or a positive number. For debugging purposes -__LINE__
is a typical value that I return as an error.
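
A minimal sketch of the -__LINE__ convention, using a made-up
function (parse_digit) as the example. Each failure path returns a
different negative number, so during debugging the return value
tells you which check failed.

```c
#include <stddef.h>

/* Parse a string consisting of exactly one decimal digit.
   Returns 0 on success and stores the value in *out.
   Returns a negative value (the negated source line) on error. */
int parse_digit(int *out, const char *s)
{
    if (out == NULL || s == NULL)
        return -__LINE__;
    if (s[0] < '0' || s[0] > '9')
        return -__LINE__;
    if (s[1] != '\0')
        return -__LINE__;
    *out = s[0] - '0';
    return 0;
}
```

Callers only need to test for a result less than 0; the exact value
is there for whoever ends up in the debugger.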
10. Program for thread safety and reentrancy. strtok() is an example
of how *NOT* to design a function. Modifying what should obviously be
a source parameter, and then storing away the result in some single
focus, static way makes strtok non-reentrant. There is nothing in the
desired functionality of such a function that demands such bad
properties. Think of trying to do a simple thing like iterating
through substrings of a string in an outer loop, and then doing the
same on each substring in the inner loop -- strtok cannot be used in
something even as simple as that. If you are on Linux, look at
strtok_r() in the man pages for an example of a superior design which
has essentially the same functionality as strtok without its
weaknesses.
Ok, although ANSI C says nothing about multithreading, there is hardly
any modern implementation that does not expose platform-specific or
POSIX multithreading functionality. Statics, globals, and
side-effects are the kinds of things that work against race condition
safety, and so it pays to minimize them in *all* of your code.
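
Here is a sketch of the nested iteration described above, done with
strtok_r(). It assumes a POSIX system; count_fields and the ';' and
',' separators are made up for the example. Each loop keeps its own
state pointer, which is exactly what strtok() cannot do.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* ask for the strtok_r() declaration */
#endif
#include <string.h>

/* Count the fields in text such as "a,b;c,d,e": records are separated
   by ';' and fields within a record by ','. Like strtok_r() itself,
   this modifies text in place. */
int count_fields(char *text)
{
    char *outer_state;
    char *inner_state;
    char *record;
    char *field;
    int total = 0;

    for (record = strtok_r(text, ";", &outer_state);
         record != NULL;
         record = strtok_r(NULL, ";", &outer_state)) {
        for (field = strtok_r(record, ",", &inner_state);
             field != NULL;
             field = strtok_r(NULL, ",", &inner_state)) {
            total++;
        }
    }
    return total;
}
```

Note that strtok_r() still writes into its source string, so the
argument has to be a modifiable array, not a string literal.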
11. If you have algorithms that only make sense for certain modes of
some parameters, try to implement them in functions with static
declaration. External interfaces should accept any combination and
modes of parameters so long as they are legal with respect to their
own type. The idea is that a developer should be able to read a .h
file, read the function names, and already have a good idea of how to
use the module. Typically what prevents this is that functions have
non-obvious parameter restrictions, which require developers to read
through documentation (which may or may not exist, may be of poor
quality, have errors in it, etc.) to figure out what is going on.
There is some controversy here though. *Personally* I insist on
*supporting* aliased parameters to the maximum degree possible.
However, I have basically seen almost no libraries that are
implemented with this in mind (GMP is an example of a library which
takes my point of view, for functionality reasons, and you can see
how supporting aliasing can be very well motivated). The assumption
of no aliasing is usually implicit or specifically required, even
though this is rarely enforced by "restrict" (which is not in
widespread use since C99 has not been adopted by any mainstream
compiler vendor).
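
To show what supporting aliasing can look like in practice, here is
a small sketch. The function (join) is made up for illustration and
is not taken from any library. It only supports the case where an
input is exactly the same pointer as the output, not arbitrary
partial overlap.

```c
#include <string.h>

/* Write a followed by b into out, which has room for outsize bytes.
   a and/or b may be the same pointer as out. Partial overlap (an
   input pointing into the middle of out) is not supported.
   Returns 0 on success, a negative value if the result does not fit,
   in which case out is left untouched. */
int join(char *out, size_t outsize, const char *a, const char *b)
{
    /* Measure both inputs before writing anything, since writing
       to out may destroy them. */
    size_t la = strlen(a);
    size_t lb = strlen(b);

    if (la >= outsize || lb >= outsize - la)
        return -1;

    /* Move b into its final position first. memmove() copes with
       the overlap when b is out itself, and this step never touches
       the first la bytes, which is where a lives if a is out. */
    memmove(out + la, b, lb);
    memmove(out, a, la);
    out[la + lb] = '\0';
    return 0;
}
```

With this, join(buf, sizeof buf, buf, "cd") appends in place and
join(buf, sizeof buf, "xy", buf) prepends in place, with no temporary
copy needed in the caller.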
12. Avoid the C library for string manipulation. Use
http://bstrlib.sf.net/ or something in which memory and length
semantics are automatically managed.