Safer and Better C


Paul Hsieh

Chris Torek said:
While I agree that, in general, weaker constraints are "better"
for exposed interfaces than stronger ones, sometimes strong(ish)
constraints seem to make sense. An example we had earlier (though
I have no idea whether it was in this same thread) occurs with
strlen(NULL): while the C standards could require that strlen()
return 0 in this case, and perhaps that strcpy() do nothing if
either of its operands is NULL, and so on, NULL is not actually a
string, and claiming that it *is* a string of length 0 is clearly
not entirely correct either. I would not object to strlen(NULL)
returning 0, but I do not object to its being considered a
dreadful mistake either (as C works today). I find neither one
"clearly superior" to the other: there are tradeoffs either way.

Ok, but that's because you are assuming the limits of size_t as output
and are just hacking in this extra condition. You see? You haven't
*DESIGNED* the right answer, you are just seeing if hacking in this
NULL <-> "" equivalence would work and are evaluating it from there.

Look, what happens if you change the definition of strlen like this:

int strlen (const char * s);

This returns -1 if s is NULL, and otherwise behaves just like strlen.
But now we've limited the size of our strings to INT_MAX in length,
rather than the maximum of size_t. So if we think this is acceptable,
or can ignore it for a second, then we have the return value telling
us about a typical error condition or else giving us the length in one
shot. This lets the user treat it as the strlen they are used to, or
at least gives them some low-level sanity check to lean on. More
sophisticated platform-specific debug libraries could put more work
into determining whether the pointer s is really a valid readable
memory location or not and pile onto the -1 error condition.
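
A minimal sketch of what I mean (the name xstrlen is mine, purely for
illustration, since we can't actually redefine the library's strlen):

    #include <limits.h>
    #include <string.h>

    /* Sketch only: like strlen, but folds the NULL case into the return
       value as -1, at the cost of capping reportable lengths at INT_MAX. */
    int xstrlen (const char * s) {
        size_t len;
        if (s == NULL) return -1;              /* detectable error, not UB */
        len = strlen (s);
        if (len > (size_t) INT_MAX) return -1; /* length an int can't represent */
        return (int) len;
    }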

Then of course we could change strcpy() to return NULL if one of the
parameters is NULL, as another indication of error.
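
Again just as a sketch, with a made-up name:

    #include <string.h>

    /* Sketch only: behaves like strcpy, except that a NULL in either
       parameter produces a NULL return instead of undefined behaviour. */
    char * xstrcpy (char * dst, const char * src) {
        if (dst == NULL || src == NULL) return NULL;
        return strcpy (dst, src);
    }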

Now, giving up half your integer range might be a difficult pill to
swallow, so as an alternative you can return UINT_MAX (or whatever
size_t's maximum value is -- why the hell isn't there a SIZE_T_MAX in
limits.h?!?! -- anyone who continues to put K&R&T or any member of the
C standards committee up on a pedestal needs a serious lobotomy) as
the error value (thus eliminating only one possible length). If this
is unsatisfactory then we always have:

int strlen (size_t * sz, const char * s);

but then it can't be used in an arithmetic expression.
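
For what it's worth, that two-output form would look something like
this (the name is made up again):

    #include <string.h>

    /* Sketch only: the length comes back through *sz, and the return value
       is 0 on success or -1 if s (or sz) is NULL. */
    int xstrlen2 (size_t * sz, const char * s) {
        if (sz == NULL || s == NULL) return -1;
        *sz = strlen (s);
        return 0;
    }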

If you haven't seen the punchline coming from a mile away yet, I'll be
very disappointed. Go look at http://bstring.sf.net/ to see how I
dealt with this exact issue. Strings in the bstring library can only
legally have lengths between 0 and INT_MAX. So my blength() function
returns an integer and happily returns -1 if you pass in NULL, or
other easily detectable flawed input that doesn't otherwise lead to
UB. So I give up the possibility of using monster long strings beyond
the size of INT_MAX -- and I don't have to worry about closure issues,
because it's not feasible to construct a bstring that is longer than that, in the
same way that a malloc'ed char * could be. I don't even consider that
a trade off -- for all the functionality, safety and speed I obtain, I
don't miss the possibility of generating certain incredibly long
strings that I have never encountered in my life of programming so
far.
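
For example, a caller can do something like this (I'm going from
memory on the exact bstrlib calls -- bfromcstr, blength, bdestroy --
so check the header for the precise definitions):

    #include "bstrlib.h"

    void demo (const char * cs) {
        bstring b = bfromcstr (cs);   /* NULL if cs is NULL or allocation fails */

        if (b == NULL || blength (b) < 0) {
            /* detectable failure -- no invalid memory was touched to find out */
            return;
        }
        /* ... use b, whose length is always in [0, INT_MAX] ... */
        bdestroy (b);
    }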

Chris Torek also said:
Languages with exceptions (Ada, C++, and Eiffel all come to mind)
can handle this by rejecting the attempt at runtime with an error;
and indeed, in Eiffel one can even express many constraints directly
in a function interface, so that compilers can catch some of these
errors at compile time. Once one buys into the "exceptions" model,
it becomes clear what to do with constraint violations: instead
of allowing all possible inputs (and generating "the least garbagey
possible output" for garbage input), just throw an exception.

Dude -- you know the only reason C++ has exceptions is because there
is no program control for constructors or destructors (and therefore
nowhere for them to return an error code). Anyhow, yes, other
languages have other ways of obtaining closure. But now tell me who's
being off-topic in c.l.c? So can I go back to posting about how I
think C should be able to assume two's complement arithmetic, have a
widening multiply, bit-scan, coroutines, and a seriously
programmatically enabled preprocessor?

Anyhow, if you view putting in code to handle NULL pointers as being
"garbage moderation" then I think you've missed the point. As you
point out, C doesn't have exception handling. So the next best thing
you've got is to just return error codes out of everything that might
screw up.

And you should be viewing this as a means of obtaining operational
closure, not just some interesting feature-add. Look, malloc can
return NULL, so pointers can be filled with NULL instead of pointing
to well formed blocks of memory. If you claim that you should always
test malloc for returning NULL and deal with failure cases, then why
not just stick with the typical malloc wrapper that exits the program
whenever NULL is returned? The easy answer: malloc() failing is not
necessarily fatal to the rest of your application. This is of
particular importance to Bstrlib, which tries to allocate memory in
powers of 2; if it finds it cannot, then instead of failing
immediately, it tries to malloc the tightest possible fitting size.
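
Roughly the idea is this (a sketch of the strategy only, not the
actual Bstrlib allocation code):

    #include <stdlib.h>

    /* Sketch of the strategy only: prefer a power-of-2 sized block, but
       fall back to the tight fit before giving up and returning NULL. */
    static void * alloc_pow2_or_tight (size_t needed) {
        size_t rounded = 1;
        void * p;

        while (rounded < needed) {
            if (rounded > ((size_t) -1) / 2) { rounded = needed; break; }
            rounded *= 2;
        }
        p = malloc (rounded);
        if (p == NULL && rounded != needed)
            p = malloc (needed);   /* tightest possible fit as a fallback */
        return p;                  /* NULL only if even that fails */
    }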

Just as malloc doesn't set policy, bstrlib doesn't set policy about
failed bstring construction -- if it fails (because there is no
memory) then it doesn't try to exit your program. It just returns
NULL then lets the programmer notice this or not. Then by having
closure (i.e., allowing parameters to be NULL, but returning an error
in such cases, so that the error continues to propagate) it gives the
programmer a myriad of options for dealing with such cases.
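
Concretely, an early failure just flows through a chain of calls and
can be checked once at the end, something like this (again, I'm going
from memory on the exact return conventions of bcatcstr and friends):

    #include "bstrlib.h"

    int build_greeting (bstring * out, const char * name) {
        bstring b = bfromcstr ("Hello, ");   /* may be NULL if allocation fails */

        /* A NULL b is tolerated by the later calls, which simply report
           BSTR_ERR, so one check at the end covers the whole chain. */
        if (bcatcstr (b, name ? name : "world") == BSTR_ERR ||
            bcatcstr (b, "!") == BSTR_ERR) {
            bdestroy (b);                    /* bdestroy (NULL) is harmless */
            return -1;
        }
        *out = b;
        return 0;
    }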

And of course to complete the picture, I also support parameter
aliasing, for a fairly generous definition of what aliasing could
legally mean.
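
By that I mean things like passing the same bstring as both operands
of a call; if memory serves, Bstrlib detects this and does the
sensible thing rather than corrupting the buffer:

    #include "bstrlib.h"

    void double_it (bstring b) {
        /* Aliasing: destination and source are the same object.  As I
           recall, this turns "abc" into "abcabc" instead of invoking the
           usual overlapping-copy undefined behaviour of strcat. */
        bconcat (b, b);
    }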

This isn't just a "nice to have" -- it's a powerful concept that takes
less code than you think and has scarcely any performance impact. The
reason why proponents of other languages pooh-pooh C programmers is
that we don't have built-in safety nets in our code to keep us from
killing ourselves on UB at every turn. But my claim is that this is
more a matter of culture than any real technical limitation of C.

People don't try to write such safe code in C because they think C is
supposed to be lean and mean and that doing so will bloat their code
or something. The minimal Bstrlib module is like 8K of footprint and
the C library has no chance of matching up to it in terms of
performance. People are just wrong. You can make libraries or just
code in general in C that are safe, fast, and as functional as any
higher level language.

You have to stop letting the short-sightedness of Ritchie, Kernighan,
Thompson and the C standards committee keep you from seeing C as a
language in which you can design sophisticated structures. Bstrlib
is not just some weird case where everything seems to work out. Last
month I posted some abstracted linked list code which (while probably
not very fast) has a lot of the same safety, power, and simplicity of
Bstrlib. C is not just a fancy assembly language. At least I try not
to treat it so simply.
 
