> I disagree about undefined behaviour causing a large proportion of
> security holes.

I didn't actually specify "large proportion"; those are your words. But
since you mention crashes:

> Maybe it produces some, but it's more likely to produce crashes or
> inoperative code.

*Every* crash is a potential security hole. Not only is it a denial of
service, but a fatal exception[1] is a sign that arbitrary memory has
been executed as if it were code, or an illegal instruction has been
executed.
Every such crash is a potential opportunity for an attacker to run
arbitrary code. There are only two sorts of bugs: bugs with exploits, and
bugs that haven't been exploited *yet*.
I think you are severely under-estimating the role that undefined
behaviour in C plays in security vulnerabilities. I quote from "Silent
Elimination of Bounds Checks":
"Most of the security vulnerabilities described in my book, Secure Coding
in C and C++, Second Edition, are the result of exploiting undefined
behavior in code."
http://www.informit.com/articles/article.aspx?p=2086870
Undefined behaviour interferes with the ability of the programmer to
understand causality with respect to his source code. That makes bugs of
all sorts more likely, including buffer overflows.
Earlier this year, four researchers at MIT analysed how undefined
behaviour is affecting software, and they found that C compilers are
becoming increasingly aggressive at optimizing such code, resulting in
more bugs and vulnerabilities. They found 32 previously unknown bugs in
the Linux kernel, 9 in Postgres and 5 in Python.
http://www.itworld.com/security/380406/how-your-compiler-may-be-compromising-application-security
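
To make the failure mode concrete, here is a minimal sketch of the kind
of pattern those checkers look for. This is my own illustrative example,
not one of the bugs the researchers reported: the dereference happens
before the NULL test, which is undefined behaviour if the pointer is
NULL, so an optimizing compiler is entitled to assume the pointer is
never NULL and throw the test away.

    #include <stdio.h>

    struct packet { int len; };   /* made-up type, for illustration only */

    int packet_length(struct packet *p)
    {
        int len = p->len;   /* undefined behaviour if p is NULL...      */
        if (p == NULL)      /* ...so the compiler may delete this check */
            return -1;
        return len;
    }

    int main(void)
    {
        struct packet pkt = { 42 };
        printf("%d\n", packet_length(&pkt));   /* prints 42 */
        return 0;
    }

Whether any particular compiler actually removes the test depends on the
optimization level, but nothing in the standard obliges it to keep it.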
I believe that the sheer number of buffer overflows in C is due more to
the language semantics than to the (lack of) skill of the programmers. C the
language pushes responsibility for safety onto the developer. Even expert
C programmers cannot always tell what their own code will do. Why else do
you think there are so many applications for checking C code for buffer
overflows, memory leaks, buggy code, and so forth? Because even expert C
programmers cannot detect these things without help, and they don't get
that help from the language or the compiler.
[...]

> Apart from the last one (file system atomicity, not a C issue at all),
> every single issue on that page comes back to one thing: fixed-size
> buffers and functions that treat a char pointer as if it were a string.
> In fact, that one fundamental issue - the buffer overrun - comes up
> directly when I search Google for 'most common security holes in c code'.

I think that you have missed the point that buffer overflows are often a
direct consequence of the language. For example:
http://www.kb.cert.org/vuls/id/162289
Quote:
"Some C compilers optimize away pointer arithmetic overflow tests that
depend on undefined behavior without providing a diagnostic (a warning).
Applications containing these tests may be vulnerable to buffer overflows
if compiled with these compilers."
The truly frightening thing about this is that even if the programmer
tries to write safe code that checks the buffer length, the C compiler is
*allowed to silently optimize that check away*.
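
Here is a sketch of the sort of check the advisory is describing; the
function and variable names below are mine, not CERT's. The second test
is meant to detect the pointer wrapping around, but overflowing pointer
arithmetic is undefined behaviour, so the compiler may assume it cannot
happen and silently drop the test.

    #include <stddef.h>

    void fill(char *buf, char *buf_end, size_t len)
    {
        if (buf + len > buf_end)
            return;              /* len too large for the buffer */
        if (buf + len < buf)     /* wrap-around test: relies on undefined */
            return;              /* overflow, so it may be optimized away */
        for (size_t i = 0; i < len; i++)
            buf[i] = 'x';        /* write into the buffer */
    }

    int main(void)
    {
        char buf[16];
        fill(buf, buf + sizeof buf, 8);   /* well within bounds */
        return 0;
    }

The robust way to write that test is to compare lengths rather than
pointers (say, len against buf_end - buf), so that nothing relies on
overflow happening at all.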

> Python is actually *worse* than C in this respect.

You've got to be joking.

> I know this particular one is reasonably well known now, but how likely
> is it that you'll still see code like this:
>
>     def create_file():
>         f = open(".....", "w")
>         f.write(".......")
>         f.write(".......")
>         f.write(".......")
>
> Looks fine, is nice and simple, does exactly what it should. And in
> (current versions of) CPython, this will close the file before the
> function returns, so it'd be perfectly safe to then immediately read
> from that file. But that's undefined behaviour.

No it isn't. I got chastised for (allegedly) conflating undefined and
implementation-specific behaviour. In this case, whether the file is
closed or not is clearly implementation-specific behaviour, not
undefined. An implementation is permitted to delay closing the file. It's
not permitted to erase your hard drive.
Python doesn't have an ISO standard like C, so where the documentation
doesn't define the semantics of something, CPython serves as the
reference implementation. CPython allows you to simultaneously open the
same file for reading and writing, in which case what subsequent reads
see will depend on the precise timing of when earlier writes are flushed
to disk. That's not something which the language can control,
given the expected semantics of file I/O. The behaviour is defined, but
it's defined in such a way that what you'll get is deterministic but
unpredictable -- a bit like dict order, or pseudo-random numbers.
A Python implementation is not permitted to optimize away subsequent
reads, erase your hard drive, or download a copy of Wikipedia from the
Internet. A C compiler is permitted to do any of these.
(Of course, no competent C compiler would actually download all of
Wikipedia, since that would be slow. Instead, they would probably only
download the HTTP headers for the main page.)
[1] I'm talking about low-level exceptions or errors, not Python exceptions.