I've been utilising C for lots of small and a few medium-sized personal
projects over the course of the past decade, and I've realised lately
just how little progress it's made since then. I've increasingly been
using scripting languages (especially Python and Bourne shell) which
offer the same speed and yet are far more simple and safe to use. I can
no longer understand why anyone would willingly use C to program
anything but the lowest of the low-level stuff. Even system utilities,
text editors and the like could be trivially written with no loss of
functionality or efficiency in Python. Anyway, here's my reasons. I'd
be interested to hear some intelligent advantages (not
rationalisations) for using C.
No string type
--------------
C has no string type. Huh? Most sane programming languages have a
string type which allows one to just say "this is a string" and let the
compiler take care of the rest. Not so with C. It's so stubborn and
dumb that it only has three types of variable; everything is either a
number, a bigger number, a pointer or a combination of those three.
Thus, we don't have proper strings but "arrays of unsigned integers".
"char" is basically just a really small number. And now we have to
start using unsigned ints to represent multibyte characters.
What. A. Crock. An ugly hack.
Functions for insignificant operations
--------------------------------------
Copying one string from another requires including <string.h> in your
source code, and there are two functions for copying a string. One
could even conceivably copy strings using other functions (if one
wanted to, though I can't imagine why). Why does any normal language
need two functions just for copying a string? Why can't we use the
assignment operator ('=') like for the other types? Oh, I forgot.
There's no such thing as strings in C; just a big continuous stick of
memory. Great! Better still, there's no syntax for:
* string concatenation
* string comparison
* substrings
Ditto for converting numbers to strings, or vice versa. You have to use
something like atol(), or strtod(), or a variant on printf(). Three
families of functions for variable type conversion. Hello? Flexible
casting? Hello?
And don't even get me started on the lack of an exponentiation
operator.
No string type: the redux
-------------------------
Because there's no real string type, we have two options: arrays or
pointers. Array sizes can only be constants. This means we run the risk
of buffer overflow since we have to try (in vain) to guess in advance
how many characters we need. Pathetic. The only alternative is to use
malloc(), which is just filled with pitfalls. The whole concept of
pointers is an accident waiting to happen. You can't free the same
pointer twice. You have to always check the return value of malloc()
and you mustn't cast it. There's no builtin way of telling if a spot of
memory is in use, or if a pointer's been freed, and so on and so forth.
Having to resort to low-level memory operations just to be able to
store a line of text is asking for...
The encouragement of buffer overflows
-------------------------------------
Buffer overflows abound in virtually any substantial piece of C code.
This is caused by programmers accidentally putting too much data in one
space or leaving a pointer pointing somewhere because a returning
function ballsed up somewhere along the line. C includes no way of
telling when the end of an array or allocated block of memory is
overrun. The only way of telling is to run, test, and wait for a
segfault. Or a spectacular crash. Or a slow, steady leakage of memory
from a program, agonisingly 'bleeding' it to death.
Functions which encourage buffer overflows
------------------------------------------
* gets()
* strcat()
* strcpy()
* sprintf()
* vsprintf()
* bcopy()
* scanf()
* fscanf()
* sscanf()
* getwd()
* getopt()
* realpath()
* getpass()
The list goes on and on and on. Need I say more? Well, yes I do.
You see, even if you're not writing any memory you can still access
memory you're not supposed to. C can't be bothered to keep track of the
ends of strings; the end of a string is indicated by a null '\0'
character. All fine, right? Well, some functions in your C library,
such as strlen(), perhaps, will just run off the end of a 'string' if
it doesn't have a null in it. What if you're using a binary string?
Careless programming this may be, but we all make mistakes and so the
language authors have to take some responsibility for being so
intolerant.
No builtin boolean type
-----------------------
If you don't believe me, just watch:
$ cat > test.c
int main(void)
{
bool b;
return 0;
}
$ gcc -ansi -pedantic -Wall -W test.c
test.c: In function 'main':
test.c:3: 'bool' undeclared (first use in this function)
Not until the 1999 ISO C standard were we finally able to use 'bool' as
a data type. But guess what? It's implemented as a macro and one
actually has to include a header file to be able to use it!
High-level or low-level?
------------------------
On the one hand, we have the fact that there is no string type and
little automatic memory management, implying a low-level language. On
the other hand, we have a mass of library functions, a preprocessor and
a plethora of other things which imply a high-level language. C tries
to be both, and as a result spreads itself too thinly.
The great thing about this is that when C is lacking a genuinely useful
feature, such as reasonably strong data typing, the excuse "C's a
low-level language" can always be used, functioning as a perfect
'reason' for C to remain unhelpfully and fatally sparse.
The original intention for C was for it to be a portable assembly
language for writing UNIX. Unfortunately, from its very inception C has
had extra things packed into it which make it fail as an assembly
language. Its kludgy strings are a good example. If it were at least
portable these failings might be forgivable, but C is not portable.
Integer overflow without warning
--------------------------------
Self explanatory. One minute you have a fifteen digit number, then try
to double or triple it and - boom - its value is suddenly
-234891234890892 or something similar. Stupid, stupid, stupid. How hard
would it have been to give a warning or overflow error or even just
reset the variable to zero?
This is widely known as bad practice. Most competent developers
acknowledge that silently ignoring an error is a bad attitude to have;
this is especially true for such a commonly used language as C.
Portability?!
-------------
Please. There are at least four official specifications of C I could
name from the top of my head and no compiler has properly implemented
all of them. They conflict, and they grow and grow. The problem isn't
subsiding; it's increasing each day. New compilers and libraries are
developed and proprietary extensions are being developed. GNU C isn't
the same as ANSI C isn't the same as K&R C isn't the same as Microsoft
C isn't the same as POSIX C. C isn't portable; all kinds of machine
architectures are totally different, and C can't properly adapt because
it's so muttonheaded. It's trapped in The Unix Paradigm.
If it weren't for the C preprocessor, then it would be virtually
impossible to get C to run on multiple families of processor hardware,
or even just slightly differing operating systems. A programming
language should not require a C preprocessor so that it can run on both
FreeBSD, Linux or Windows without failing to compile.
C is unable to adapt to new conditions for the sake of "backward
compatibility", throwing away the opportunity to get rid of stupid,
utterly useless and downright dangerous functions for a nonexistent
goal. And yet C is growing new tentacles and unnecessary features
because of idiots who think adding seven new functions to their C
library will make life easier. It does not.
Even the C89 and C99 standards conflict with each other in ridiculous
ways. Can you use the long long type or can't you? Is a certain
constant defined by a preprocessor macro hidden deep, deep inside my C
library? Is using a function in this particular way going to be
undefined, or acceptable? What do you mean, getch() isn't a proper
function but getc() and getchar() are?
The implications of this false 'portability'
--------------------------------------------
Because C pretends to be portable, even professional C programmers can
be caught out by hardware and an unforgiving programming language;
almost anything like comparisons, character assignments, arithmetic, or
string output can blow up spectacularly for no apparent reason because
of endianness or because your particular processor treats all chars as
unsigned or silly, subtle, deadly traps like that.
Archaic, unexplained conventions
--------------------------------
In addition to the aforementioned problems, C also has various
idiosyncracies (invariably unreported) which not even some teachers of
C are aware of:
* "Don't use fflush(stdin)."
* "gets() is evil."
* "main() must return an integer."
* "main() can only take one of three sets of arguments."
* "main() can only return either EXIT_SUCCESS or EXIT_FAILURE."
* "You musn't cast the return value of malloc()."
* "fileno() isn't an ANSI compliant function."
* "A preprocessor macro oughtn't use any of its arguments more than
once."
...all these unnecessary and unmentioned quirks mean buggy code. Death
by a thousand cuts. Ironic when you consider that Kernighan thinks of
Pascal in the same way when C has just as many little gotchas that
bleed you to death gradually and painfully.
Blaming The Progammer
---------------------
Due to the fact that C is pretty difficult to learn and even harder to
actually use without breaking something in a subtle yet horrific way
it's assumed that anything which goes wrong is the programmer's fault.
If your program segfaults, it's your fault. If it crashes, mysteriously
returning 184 with no error message, it's your fault. When one single
condition you'd just happened to have forgotten about whilst coding
screws up, it's your fault.
Obviously the programmer has to shoulder most of the responsibility for
a broken program. But as we've already seen, C positively tries to make
the programmer fail. This increases the failure rate and yet for some
reason we don't blame the language when yet another buffer overflow is
discovered. C programmers try to cover up C's inconsistencies and
inadequacies by creating a culture of 'tua culpa'; if something's
wrong, it's your fault, not that of the compiler, linker, assembler,
specification, documentation, or hardware.
Compilers have to take some of the blame. Two reasons. The first is
that most compilers have proprietary extensions built into them. Let me
remind you that half of the point of using C is that it should be
portable and compile anywhere. Adding extensions violates the original
spirit of C and removes one of its advantages (albeit an already
diminished advantage).
The other (and perhaps more pressing) reason is the lack of anything
beyond minimal error checking which C compilers do. For every ten types
of errors your compiler catches, another fifty will slip through.
Beyond variable type and syntax checking the compiler does not look for
anything else. All it can do is give warnings on unusual behaviour,
though these warnings are often spurious. On the other hand, a single
error can cause a ridiculous cascade, or make the compiler fall over
and die because of a misplaced semicolon, or, more accurately and
incriminatingly, a badly constructed parser and grammar. And yet,
despite this, it's your fault.
To quote The Unix Haters' Handbook:
"If you make even a small omission, like a single semicolon, a C
compiler tends to get so confused and annoyed that it bursts into tears
and complains that it just can't compile the rest of the file since one
missing semicolon has thrown it off so much."
So C compilers may well give literally hundreds of errors stating that
half of your code is wrong if you miss out a single semicolon. Can it
get worse? Of course it can! This is C!
You see, a compiler will often not deluge you with error information
when compiling. Sometimes it will give you no warning whatsoever even
if you write totally foolish code like this:
#include <stdio.h>
int main()
{
char *p;
puts(p);
return 0;
}
When we compile this with our 'trusty' compiler gcc, we get no errors
or warnings at all. Even when using the '-W' and '-Wall' flags to make
it watch out for dangerous code it says nothing.
$ gcc -W -Wall stupid.c
$
In fact, no warning is given ever unless you try to optimise the
program with a '-O' flag. But what if you never optimise your program?
Well, you now have a dangerous program. And unless you check the code
again you may well never notice that error.
What this section (and entire document) is really about is the sheer
unfriendliness of C and how it is as if it takes great pains to be as
difficult to use as possible. It is flexible in the wrong way; it can
do many, many different things, but this makes it impossible to do any
single thing with it.
Trapped in the 1970s
--------------------
C is over thirty years old, and it shows. It lacks features that modern
languages have such as exception handling, many useful data types,
function overloading, optional function arguments and garbage
collection. This is hardly surprising considering that it was
constructed from an assembler language with just one data type on a
computer from 1970.
C was designed for the computer and programmer of the 1970s,
sacrificing stability and programmer time for the sake of memory.
Despite the fact that the most recent standard is just half a decade
old, C has not been updated to take advantage of increased memory and
processor power to implement such things as automatic memory
management. What for? The illusion of backward compatibility and
portability.
Yet more missing data types
---------------------------
Hash tables. Why was this so difficult to implement? C is intended for
the programming of things like kernels and system utilities, which
frequently use hash tables. And yet it didn't occur to C's creators
that maybe including hash tables as a type of array might be a good
idea when writing UNIX? Perl has them. PHP has them. With C you have to
fake hash tables, and even then it doesn't really work at all.
Multidimensional arrays. Before you tell me that you can do stuff like
int multiarray[50][50][50] I think that I should point out that that's
an array of arrays of arrays. Different thing. Especially when you
consider that you can also use it as a bunch of pointers. C programmers
call this "flexibility". Others call it "redundancy", or, more
accurately, "mess".
Complex numbers. They may be in C99, but how many compilers support
that? It's not exactly difficult to get your head round the concept of
complex numbers, so why weren't they included in the first place? Were
complex numbers not discovered back in 1989?
Binary strings. It wouldn't have been that hard just to make a
compulsory struct with a mere two members: a char * for the string of
bytes and a size_t for the length of the string. Binary strings have
always been around on Unix, so why wasn't C more accommodating?
Library size
------------
The actual core of C is admirably small, even if some of the syntax
isn't the most efficient or readable (case in point: the combined '? :'
statement). One thing that is bloated is the C library. The number of
functions in a full C library which complies with all significant
standards runs into four digit figures. There's a great deal of
redundancy, and code which really shouldn't be there.
This has knock-on effects, such as the large number of configuration
constants which are defined by the preprocessor (which shouldn't be
necessary), the size of libraries (the GNU C library almost fills a
floppy disk and its documentation, three) and inconsistently named
groups of functions in addition to duplication.
For example, a function for converting a string to a long integer is
atol(). One can also use strtol() for exactly the same thing. Boom -
instant redundancy. Worse still, both functions are included in the
C99, POSIX and SUSv3 standards!
Can it get worse? Of course it can! This is C!
As a result it's only logical that there's an equivalent pair of atod()
and strtod() functions for converting a string to a double. As you've
probably guessed, this isn't true. They are called atof() and strtod().
This is very foolish. There are yet more examples scattered through the
standard C library like a dog's smelly surprises in a park.
The Single Unix Specification version three specifies 1,123 functions
which must be available to the C programmer of the compliant system. We
already know about the redundancies and unnecessary functions, but
across how many header files are these 1,123 functions spread out? 62.
That's right, on average a C library header will define approximately
eighteen functions. Even if you only need to use maybe one function
from each of, say, five libraries (a common occurrence) you may well
wind up including 90, 100 or even 150 function definitions you will
never need. Bloat, bloat, bloat. Python has the right idea; its import
statement allows you to define exactly the functions (and global
variables!) you need from each library if you prefer. But C? Oh, no.
Specifying structure members
----------------------------
Why does this need two operators? Why do I have to pick between '.' and
'->' for a ridiculous, arbitrary reason? Oh, I forgot; it's just yet
another of C's gotchas.
Limited syntax
--------------
A couple of examples should illustrate what I mean quite nicely. If
you've ever programmed in PHP for a substantial period of time, you're
probably aware of the 'break' keyword. You can use it to break out from
nested loops of arbitrary depth by using an integer, like so:
for ($i = 0; $i < 10; $i++) {
for ($j = 0; $j < 10; $j++) {
for ($k = 0; $k < 10; $k++) {
break 2;
}
}
/* breaks out to here */
}
There is no way of doing this in C. If you want to break out from a
series of nested for or while loops then you have to use a goto. This
is what is known as a crude hack.
In addition to this, there is no way to compare any non-numerical data
type using a switch statement. Not even strings. In the programming
language D, one can do:
char s[];
switch (s) {
case "hello":
/* something */
break;
case "goodbye":
/* something else */
break;
case "maybe":
/* another action */
break;
default:
/* something */
break;
}
C does not allow you to use switch and case statements for strings. One
must use several variables to iterate through an array of case strings
and compare them to the given string with strcmp(). This reduces
performance and is just yet another hack.
In fact, this is an example of gratuitous library functions running
wild once again. Even comparing one string to another requires use of
the strcmp() function:
char string[] = "Blah, blah, blah\n";
if (strcmp(string, "something") == 0) {
/* do something */
}
Flushing standard I/O
---------------------
A simple microcosm of the "you can do this, but not that" philosophy of
C; one has to do two different things to flush standard input and
standard output.
To flush the standard output stream, the fflush() function is used
(defined by <stdio.h>). One doesn't usually need to do this after every
bit of text is printed, but it's nice to know it's there, right?
Unfortunately, fflush() can't be used to flush the contents of standard
input. Some C standards explicitly define it as having undefined
behaviour, but this is so illogical that even textbook authors
sometimes mistakenly use fflush(stdin) in examples and some compilers
won't bother to warn you about it. One shouldn't even have to flush
standard input; you ask for a character with getchar(), and the program
should just read in the first character given and disregard the rest.
But I digress...
There is no 'real' way to flush standard input up to, say, the end of a
line. Instead one has to use a kludge like so:
int c;
do {
errno = 0;
c = getchar();
if (errno) {
fprintf(stderr,
"Error flushing standard input buffer: %s\n",
strerror(errno));
}
} while ((c != '\n') && (!feof(stdin)));
That's right; you need to use a variable, a looping construct, two
library functions and several lines of exception handling code to flush
the standard
input buffer.
Inconsistent error handling
---------------------------
A seasoned C programmer will be able to tell what I'm talking about
just by reading the title of this section. There are many incompatible
ways in which a C library function indicates that an error has
occurred:
* Returning zero.
* Returning nonzero.
* Returning EOF.
* Returning a NULL pointer.
* Setting errno.
* Requiring a call to another function.
* Outputting a diagnostic message to the user.
* Triggering an assertion failure.
* Crashing.
Some functions may actually use up to three of these methods. (For
instance, fread().) But the thing is that none of these are compatible
with each other and error handling does not occur automatically; every
time a C programmer uses a library function they must check manually
for an error. This bloats code which would otherwise be perfectly
readable without if-blocks for error handling and variables to keep
track of errors. In a large software project one must write a section
of code for error handling hundreds of times. If you forget, something
can go horribly wrong. For example, if you don't check the return value
of malloc() you may accidentally try to use a null pointer. Oops...
Commutative array subscripting
------------------------------
"Hey, Thompson, how can I make C's syntax even more obfuscated and
difficult to understand?"
"How about you allow 5[var] to mean the same as var[5]?"
"Wow; unnecessary and confusing syntactic idiocy! Thanks!"
"You're welcome, Dennis."
Yes, I understand that array subscription is just a form of addition
and so it should be commutative, but doesn't it seem just a bit foolish
to say that 5[var] is the same as var[5]? How on earth do you take the
var'th value of 5?
Variadic anonymous macros
-------------------------
In case you don't understand what variadic anonymous macros are,
they're macros (i.e. pseudofunctions defined by the preprocessor) which
can take a variable number of arguments. Sounds like a simple thing to
implement. I mean, it's all done by the preprocessor, right? And
besides, you can define proper functions with variable numbers of
arguments even in the original K&R C, right?
In that case, why can't I do:
#define error(...) fprintf(stderr, ...)
without getting a warning from GCC?
warning: anonymous variadic macros were introduced in C99
That's right, folks. Not until late 1999, 30 years after development on
the C programming language began, have we been allowed to do such a
simple task with the preprocessor.
The C standards don't make sense
--------------------------------
Only one simple quote from the ANSI C standard - nay, a single footnote
- is needed to demonstrate the immense idiocy of the whole thing.
Ladies, gentlemen, and everyone else, I present to you...footnote 82:
All whitespace is equivalent except in certain situations.
I'd make a cutting remark about this, but it'd be too easy.
Too much preprocessor power
---------------------------
Rather foolishly, half of the actual C language is reimplemented in the
preprocessor. (This should be a concern from the start; redundancy
usually indicates an underlying problem.) We can #define fake
variables, fake conditions with #ifdef and #ifndef, and look, there's
even #if, #endif and the rest of the crew! How useful!
Erm, sorry, no.
Preprocessors are a good idea for a language like C. As has been
iterated, C is not portable. Preprocessors are vital to bridging the
gap between different computer architectures and libraries and allowing
a program to compile on multiple machines without having to rely on
external programs. The #define statement, in this case, can be used
perfectly validly to set 'flags' that can be used by a program to
determine all sorts of things: which C standard is being used, which
library, who wrote it, and so on and so forth.
Now, the situation isn't as bad as for C++. In C++, the preprocessor is
so packed with unnecessary rubbish that one can actually use it to
calculate an arbitrary series of Fibonacci numbers at compile-time.
However, C comes dangerously close; it allows the programmer to define
fake global variables with wacky values which would not otherwise be
proper code, and then compare values of these variables. Why? It's not
needed; the C language of the Plan 9 operating system doesn't let you
play around with preprocessor definitions like this. It's all just
bloat.
"But what about when we want to use a constant throughout a program? We
don't want to have to go through the program changing the value each
time we want to change the constant!" some may complain. Well, there's
these things called global variables. And there's this keyword, const.
It makes a constant variable. Do you see where I'm going with this?
You can do search and replace without the preprocessor, too. In fact,
they were able to do it back in the seventies on the very first
versions of Unix. They called it sed. Need something more like cpp? Use
m4 and stop complaining. It's the Unix way.