jacob navia
Why C?
With the development of the C++ language, C was destroyed. Destroyed in
the sense that all language development ceased, and "naturally" all
developers changed horses to the "new and improved" C++, leaving C as a
bad memory or at best a curiosity to be used in embedded systems or
similar environments where C++ doesn't cut it.
Many C++ books (especially some older ones) start with a chapter telling
people how bad C is, and why C++ solves all the problems of C. Of course
this is not very difficult to do, since C has remained essentially as it
was in 1989.
Still, I think that C can be a very good language precisely because of
its simplicity. It is the only high-level language without the
"object oriented" framework that has been built into most current
languages, from Java to COBOL.
What are the main issues with C?
(1) The obsolete C library.
===========================
The C library is part of the language, and it was one of the first
standardized, portable implementations of a language across a number of
computers.
In the 21st century, however, it shows its age.
Its main problems are:
1.1: No support for standard containers like lists or hash tables
-----------------------------------------------------------------
The lack of a standard way to implement containers like lists,
flexible arrays, hash tables, and many others is one of the worst
features of the language. This state of affairs means that each C
programmer must implement the same routines again and again, and debug
the new code each time. It also means that code is hard to share, since
each routine will use a slightly different interface.
When it was proposed in the discussion group comp.std.c that the
standards committee standardize a common interface, the "speaker" of
the committee, Mr Gwyn, answered:
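To make the point concrete, here is a minimal sketch of the kind of
interface every project ends up rewriting: a singly linked list of int
values. All names (IntList, intlist_push_front, intlist_length,
intlist_free) are hypothetical; they come from no standard and from no
existing library, and a real proposal would need to be generic over the
element type.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical names throughout: nothing here is part of the C standard. */
typedef struct IntNode {
    int value;
    struct IntNode *next;
} IntNode;

typedef struct {
    IntNode *head;
    size_t count;
} IntList;

/* Insert a value at the front. Returns 0 on success, -1 if the list
   pointer is NULL or memory is exhausted. */
int intlist_push_front(IntList *list, int value)
{
    IntNode *node;

    if (list == NULL)
        return -1;
    node = malloc(sizeof *node);
    if (node == NULL)
        return -1;
    node->value = value;
    node->next = list->head;
    list->head = node;
    list->count++;
    return 0;
}

/* Number of elements; 0 for a NULL list. */
size_t intlist_length(const IntList *list)
{
    return list ? list->count : 0;
}

/* Release every node and leave the list empty and reusable. */
void intlist_free(IntList *list)
{
    IntNode *node, *next;

    if (list == NULL)
        return;
    for (node = list->head; node != NULL; node = next) {
        next = node->next;
        free(node);
    }
    list->head = NULL;
    list->count = 0;
}
```

A caller starts from an empty list (IntList list = { NULL, 0 };), pushes
values, and calls intlist_free() when done. Another project would likely
pick different names, argument orders and error conventions for the same
three operations, which is exactly the sharing problem described above.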
(Msg ID:<[email protected]> thread "C needs a BOOST",
10/4/2007)
--------------------------------<quote>
The very age of C might be partly responsible, in that the vast
amount of existing C applications already embed some solutions
to the requirements for lists, etc. The maintenance programmer
(a)is unlikely to rework the existing app just to use some new
standardized interface for the same thing; and (b) has to
continue to maintain whatever libraries he has been using.
The only real use for such a library would be for new program
development, once the learning hurdle has been overcome. Much
new development really ought to use higher-level languages in
the first place.
------------------------------<end quote>
The argument of the committee speaker is then:
(A) Old programs already have lists and container code. They
do not need any new specification.
(B) New programs should not be written in C but in "higher-level"
languages.
This is the attitude of many people in the standards committee and in
the C/C++ programming community in general. In this view, C is dead and
the best thing to do is to keep it that way.
I disagree profoundly with this attitude.
1.2: A completely obsolete representation of strings.
----------------------------------------------------
The "C strings" are represented by a sequence of characters in memory
followed by a zero byte. Since they have to be maintained manually (it
is the responsibility of the programmer to maintain that terminating
zero), this representation offers ample opportunities for errors. These
errors are a direct consequence of the bad data structure, but somehow
this representation has been standardized into the language itself, so
that there is no other way to work with strings and still use the
standard library.
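A small sketch of what "maintaining the terminating zero" means in
practice. The standard strncpy() does not write a terminator when the
source is as long as or longer than the count it is given, so the caller
has to add it. The helper below (copy_terminated is a hypothetical name,
not a standard function) does that bookkeeping explicitly.

```c
#include <string.h>

/* Copy src into dst (capacity dst_size bytes) and always terminate.
   Returns 1 if the whole string fit, 0 if it was truncated or the
   arguments were unusable. Hypothetical helper, not a standard function. */
int copy_terminated(char *dst, size_t dst_size, const char *src)
{
    size_t len;

    if (dst == NULL || src == NULL || dst_size == 0)
        return 0;
    len = strlen(src);
    if (len >= dst_size) {
        memcpy(dst, src, dst_size - 1);
        dst[dst_size - 1] = '\0';   /* the step strncpy leaves to the caller */
        return 0;
    }
    memcpy(dst, src, len + 1);      /* len + 1 copies the terminating zero too */
    return 1;
}
```

Note that even this helper must call strlen() to rediscover a length that
a counted representation would simply have stored.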
1.3: Wrongly specified and buggy library functions.
---------------------------------------------------
One of the evergreens of people criticizing the C language is the
function "gets". This function is supposed to read a line from standard
input into a buffer, but it takes no buffer size and so provides no
means to test whether the end of the buffer has been reached. It is
therefore impossible by design to use it in a manner that ensures the
absence of a buffer overflow.
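For comparison, the standard fgets() does take the buffer size. The
sketch below wraps it in a hypothetical helper, read_line (the name and
the return convention are assumptions made for this example), that can
never write past the end of the buffer.

```c
#include <limits.h>
#include <stdio.h>
#include <string.h>

/* Read one line from `in` into buf (capacity size bytes), stripping the
   newline. Returns 1 on success, 0 on end of file, error, or bad
   arguments. Hypothetical helper built on the standard fgets(). */
int read_line(FILE *in, char *buf, size_t size)
{
    if (in == NULL || buf == NULL || size < 2 || size > INT_MAX)
        return 0;
    if (fgets(buf, (int)size, in) == NULL)
        return 0;
    buf[strcspn(buf, "\n")] = '\0';   /* drop the newline if one was read */
    return 1;
}
```

A line longer than the buffer is delivered in pieces over several calls
instead of overflowing. With gets() there is no argument through which
the caller could even state the size.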
I have been discussing this function (and others) for several years in
the discussion group comp.std.c. Nobody among the many people present in
those discussions has ever tried to justify gets(), except the
committee representative Mr Gwyn, who defended it against all odds.
The committee has accepted that this function be declared obsolete, but
apparently it has still not gotten around to eliminating it from the
language: it still appears in the drafts for the new standard as of
March 2009.
Other functions are less obviously wrong but nevertheless far from what
would be correct specifications for a language to be used in the 21st
century. Most of the functions in the C library fail to do the most
elementary error analysis, reflecting a time when error analysis and
bullet-proof software were not needed, since software was used in
controlled and secure environments. The occasional crash because of bad
inputs was a necessary evil.
The problem is that (again) the standards committee doesn't see this as
a problem at all. For instance, the asctime() function is specified in
the standard as a code listing. The problem is that if you give this
function a date whose tm_year field is beyond 8099 (a calendar year
beyond 9999), it will exhibit a buffer overflow. So we have the
incredible situation where a function exhibiting a buffer overflow is
printed in the text of the standard, and the committee has refused to
acknowledge this fact for at least 10 years.
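The listing in the standard formats the date with sprintf() into a
26-byte static buffer. A four-digit year yields 25 characters plus the
terminating zero, so a five-digit year needs one byte too many. The
sketch below uses the same format string but takes the buffer size and
reports failure instead of overflowing. The function name
format_time_checked and its return convention are assumptions made for
this example; this is not a proposal text.

```c
#include <limits.h>
#include <stdio.h>
#include <time.h>

/* Format *tp like the asctime() listing in the standard, but into a
   caller-supplied buffer with an explicit size. Returns the number of
   characters written, or -1 if a field is out of range or the result
   does not fit. Hypothetical name; not a standard function. */
int format_time_checked(char *buf, size_t size, const struct tm *tp)
{
    static const char wday_name[7][4] = {
        "Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"
    };
    static const char mon_name[12][4] = {
        "Jan", "Feb", "Mar", "Apr", "May", "Jun",
        "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
    };
    int n;

    if (buf == NULL || size == 0 || tp == NULL)
        return -1;
    if (tp->tm_wday < 0 || tp->tm_wday > 6 ||
        tp->tm_mon < 0 || tp->tm_mon > 11)
        return -1;              /* would index outside the name tables */
    if (tp->tm_year > INT_MAX - 1900)
        return -1;              /* 1900 + tm_year would overflow an int */

    n = snprintf(buf, size, "%.3s %.3s%3d %.2d:%.2d:%.2d %d\n",
                 wday_name[tp->tm_wday], mon_name[tp->tm_mon],
                 tp->tm_mday, tp->tm_hour, tp->tm_min, tp->tm_sec,
                 1900 + tp->tm_year);
    if (n < 0 || (size_t)n >= size) {
        buf[0] = '\0';
        return -1;              /* truncated: report it, do not overflow */
    }
    return n;
}
```

With a 26-byte buffer, tm_year = 8099 succeeds and tm_year = 8100 is
rejected, which is the boundary described above.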
2: Language issues
==================
(2.1) It is impossible to create new numeric types
------------------------------------------------
C already features an incredible number of numeric types. We have:
o _Bool (holds only 0 or 1)
o char (small int) signed+unsigned
o short signed+unsigned
o int signed+unsigned
o long signed+unsigned
o long long signed+unsigned
o float
o double
o long double
o _Complex float
o _Complex double
o _Complex long double
Not all of these are needed at the same time in every application,
especially the _Complex types. Besides those, there are technical
reports already presented to the committee that propose decimal based
numbers, fixed point numbers, and possibly others.
It is clear that if we are going to support all possible types, the
language will grow beyond what is reasonable to expect. The solution
for this "problem" is simple, and will be discussed in the "proposals"
section.
(2.2) It is impossible to create another type of string
------------------------------------------------------
Many libraries exist that implement counted strings, but they all
require the user to give up the natural notation str[5] = 'a' and adopt
a cumbersome notation in the style of:
asgnstr(str,5,'a');
Luckily, the solution to this problem is the same as in (2.1). See below.
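As an illustration, here is a minimal sketch of what sits behind such a
call. The CountedString layout and the signature and return value of
asgnstr are assumptions made for this example; existing counted string
libraries differ in all of these details.

```c
#include <stddef.h>

/* Hypothetical counted string: the length is stored, not inferred
   from a terminating zero. */
typedef struct {
    size_t length;      /* characters in use */
    size_t capacity;    /* bytes available in data */
    char *data;
} CountedString;

/* Bounds-checked replacement for str[index] = ch.
   Returns 0 on success, -1 if index is outside the string. */
int asgnstr(CountedString *str, size_t index, char ch)
{
    if (str == NULL || str->data == NULL || index >= str->length)
        return -1;
    str->data[index] = ch;
    return 0;
}
```

The bounds check is the benefit; the function call notation is the price
that standard C makes the user pay for it.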
3: The proposed solutions
=========================
3.1 The solutions to the language problems.
-------------------------------------------
In the lcc-win compiler I have developed a solution to the language
problems that simplifies the language and at the same time opens it up
to the user.
Operator overloading is an accepted technique in many languages, from
the venerable FORTRAN to C#. lcc-win proposes this solution to:
o Develop new kinds of numbers.
To prove the concept (and debug it), lcc-win has used operator
overloading to implement 352-bit floats, boasting 105 decimal digits
of precision.
o Develop a counted string library using operator overloading to access
the strings with [ and ] instead of the more cumbersome function call
notation.
This single solution solves both problems and, at the same time, allows
complex numbers to be taken OUT of the language, making it smaller. The
language could also accommodate the proposed decimal and fixed point
numbers, using the same interface.
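To show what is at stake, here is a sketch in standard C of a
user-defined fixed point type written without operator overloading. The
type Fixed and the fixed_* functions are hypothetical names invented for
this example, and overflow is not checked, for brevity. The lcc-win
overloading syntax itself is not given in this text, so it is not
reproduced here.

```c
#include <stdint.h>

/* Hypothetical Q16.16 fixed point type: 16 integer bits and
   16 fraction bits. No overflow checking, for brevity. */
typedef struct { int32_t raw; } Fixed;

Fixed fixed_from_int(int n)
{
    Fixed f;
    f.raw = (int32_t)n * 65536;
    return f;
}

Fixed fixed_add(Fixed a, Fixed b)
{
    Fixed f;
    f.raw = a.raw + b.raw;
    return f;
}

Fixed fixed_mul(Fixed a, Fixed b)
{
    Fixed f;
    /* Widen to 64 bits so the intermediate product does not overflow. */
    f.raw = (int32_t)(((int64_t)a.raw * b.raw) / 65536);
    return f;
}

int fixed_to_int(Fixed a)
{
    return (int)(a.raw / 65536);
}
```

Without overloading, the expression a * b + c has to be written as
fixed_add(fixed_mul(a, b), c). With overloading, the same functions
could be attached to the ordinary operators, and the type would read
like a built-in one without being part of the language.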
3.2 The solutions to the library problems.
------------------------------------------
Microsoft has proposed a set of more secure functions that should
replace the insecure ones. The particular definitions of those library
functions are not of much interest in this context. What *IS*
relevant is that the error analysis is an integral part of the
specifications. The allowed value ranges of all arguments are described,
and the possible errors enumerated. The language should take this way of
specifying the functions in the library as a guideline to be used in the
specifications of ALL functions in the library.
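A sketch of what this style of specification looks like, applied to
integer parsing. The standard atoi() leaves the behavior undefined when
the result cannot be represented; the hypothetical function below
(parse_int_checked and the ParseStatus values are names invented for
this example, not part of Microsoft's proposal) states the allowed
arguments and enumerates every error.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Every possible outcome is enumerated (hypothetical names). */
typedef enum {
    PARSE_OK = 0,
    PARSE_ERR_NULL,       /* text or out is NULL */
    PARSE_ERR_NO_DIGITS,  /* no conversion could be performed */
    PARSE_ERR_TRAILING,   /* characters left over after the number */
    PARSE_ERR_RANGE       /* value does not fit in an int */
} ParseStatus;

/* Allowed arguments: text is a non-NULL, zero-terminated decimal
   number; out is non-NULL. *out is written only on PARSE_OK. */
ParseStatus parse_int_checked(const char *text, int *out)
{
    char *end;
    long value;

    if (text == NULL || out == NULL)
        return PARSE_ERR_NULL;
    errno = 0;
    value = strtol(text, &end, 10);
    if (end == text)
        return PARSE_ERR_NO_DIGITS;
    if (*end != '\0')
        return PARSE_ERR_TRAILING;
    if (errno == ERANGE || value < INT_MIN || value > INT_MAX)
        return PARSE_ERR_RANGE;
    *out = (int)value;
    return PARSE_OK;
}
```

The code itself is short. The difference lies in the contract: the
caller can tell from the comment and the enumeration exactly which
inputs are accepted and what happens for all the others.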
The new C standard should specify an interface for containers like lists
or hash tables. Again, I have developed basic examples in the lcc-win
compiler. I have published the code in the discussion group comp.lang.c.
Conclusion
==========
The proposed changes do not fundamentally alter the nature of C. They
are very small and maintain the essential simplicity of the language.
True, C is simple.
But, as Einstein said, things should be as simple as possible but not
simpler.