David Masover
This is hardly necessary. Pointer arithmetic can certainly be done safely.
Can be. However, the fact that it exists opens the door to a whole class of
weird and hard-to-pin-down crashes (and possible vulnerabilities) that simply
don't happen if you don't (or can't) do it.
But that wasn't my point. My point was that if you consider lack of pointer
arithmetic, garbage collection, and other features to be a selling point of
higher-level languages, you can do all that in C, and you can make it _almost_
automatic in C++.
I'm not sold on this. I don't think I've had any buffer overflows in my
code in years. It's pretty easy -- if I'm about to use a buffer, I make
sure I know what I'm using it for and that I cap any copies and/or report
failure if there's not enough space.
My favorite example is here:
http://joelonsoftware.com/articles/fog0000000319.html
I like this both for the ludicrous example, when he finally decides to figure
out how much to allocate:
char* bigString;
int i = 0;
i = strlen("John, ")
  + strlen("Paul, ")
  + strlen("George, ")
  + strlen("Joel ");
bigString = (char*) malloc (i + 1);
...and for the ludicrous inefficiency. He's going to scan through each string
at least twice, and that's with a customized strcat -- it gets much worse with
the real strcat.
And remember, his next step is:
char *p = bigString;
bigString[0] = '\0';
p = mystrcat(p,"John, ");
p = mystrcat(p,"Paul, ");
p = mystrcat(p,"George, ");
p = mystrcat(p,"Joel ");
It's still a bit sloppy -- that initial null assignment makes me cringe -- but
think about this. Even if you ignore the fact that we've got each string
duplicated here -- let's say they're variables:
int i = 0;
i = strlen(a) + strlen(b) + strlen(c) + strlen(d);
bigString = (char*) malloc (i+1);
char *p = bigString;
bigString[0] = '\0';
p = mystrcat(p,a);
p = mystrcat(p,b);
p = mystrcat(p,c);
p = mystrcat(p,d);
Now suppose you add a string to that, or remove it. If you add it to one place
and not the other, or remove it from one place and not the other, you're
either wasting RAM or hitting a buffer overrun every time.
Then again, this kind of malloc is probably inefficient, as the article points
out. Instead, you probably want to allocate some power of two -- at which
point, you want to make sure you've always allocated a power of two that's
more than you need, not less than you need.
Are you sure you never make a mistake here?
Because this is the kind of thing that I don't have to think about. Yes, it's
less efficient, but if I have a bunch of strings in Ruby, I can just do this:
big_string = a + b + c + d
There are other, more efficient ways, like:
big_string = a.dup << b << c << d
or
big_string = "#{a}#{b}#{c}#{d}"
The point is, though, while these have varying degrees of efficiency, none of
them leaves room for me to forget something and open myself up to a
vulnerability or a crash. Worst case, I waste a bit of RAM, and 100% of the
RAM I waste here can be garbage-collected later, whereas in C, if I waste it,
it's wasted, possibly even leaked.
So not only is it ridiculously easier, it's also safer.
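For what it's worth, here's a quick sketch showing that the three spellings
above give the same string and that none of them touches the inputs. The
values of a through d and the method name are made up for illustration:

```ruby
# Made-up sample values; any strings would do.
a, b, c, d = 'John, ', 'Paul, ', 'George, ', 'Joel '

# Hypothetical helper that just runs the three variants side by side.
def concat_three_ways(a, b, c, d)
  plus   = a + b + c + d          # each + returns a new String
  append = a.dup << b << c << d   # copies a once, then appends in place
  interp = "#{a}#{b}#{c}#{d}"     # a single interpolated literal
  [plus, append, interp]
end

results = concat_three_ways(a, b, c, d)
puts results.uniq.length   # 1, since all three are equal
puts a                     # still "John, ", because we appended to a.dup
```

Note the .dup: without it, << would modify a itself, which is a different
kind of surprise, but still not a buffer overrun.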
It's also possibly faster: since it's a higher-level abstraction, the runtime
might (in theory; I bet Ruby doesn't) notice that these are all strings and
that you're just concatenating them, so it could use some sort of
StringBuilder automatically.
Even if it doesn't, it still has the option of storing the length of a string
separately, rather than using null-terminated strings -- so it never has to
scan for the terminator to find the end, which saves at least half your time
in an operation like ("a" + "b").
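A tiny illustration that Ruby strings carry their own length instead of
relying on a terminator (as far as I know that's how MRI stores them; the
sample string is made up):

```ruby
s = "abc\0def"        # the NUL in the middle is just another byte
puts s.length         # 7, not 3: nothing stops at the NUL

joined = s + "ghi"    # concatenation keeps everything after the NUL too
puts joined.length    # 10
```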
Am I being unrealistic? Is this the kind of thing you'd never do?
I agree that it requires actual effort, as opposed to being implicit.
The point here is that the implicit version also implicitly handles all the
safety for you. Another example might be SQL manipulation. To keep myself
sane, let's do this with Ruby:
execute "select hashed_password from users where username = '#{name}'"
The problem with that code should be blindingly obvious. Of course, I should
probably be doing something like this:
execute "select hashed_password from users where username = '#{escape name}'"
The problem is, this requires me to always, always remember to do it. This is
how a lot of PHP stuff is written, though I'm told it's changing, and those in
the know use libraries that allow you to do it the Right Way. How would the
Right Way look?
execute 'select hashed_password from users where username = ?', name
Can you see why that's safer? I can develop a habit that's much easier to
maintain: using only single-quoted strings as my queries. Single-quoted
strings don't interpolate, so the actual values always have to be passed
separately, and then they are always escaped -- I don't have to remember
anything special to make that work.
So I can develop a very, very simple habit (use single-quoted strings) that I
can almost unconsciously apply everywhere, and I will never be subject to a
SQL injection attack.
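To make the idea concrete, here's a rough sketch of what a placeholder-style
call might do under the hood. quote and bind are hypothetical stand-ins I made
up, not any real driver's API, and doubling single quotes is only an
assumption about the database's quoting rules. Real libraries often go
further and send the values to the server separately from the SQL text.

```ruby
# Hypothetical helper: wrap a value in single quotes, doubling any
# single quotes inside it (assumes standard SQL quoting rules).
def quote(value)
  "'" + value.to_s.gsub("'", "''") + "'"
end

# Hypothetical helper: fill each ? placeholder, in order, with a quoted value.
def bind(sql, *values)
  remaining = values.dup
  filled = sql.gsub('?') do
    raise ArgumentError, 'too few values' if remaining.empty?
    quote(remaining.shift)
  end
  raise ArgumentError, 'too many values' unless remaining.empty?
  filled
end

name  = "x' OR '1'='1"
query = 'select hashed_password from users where username = ?'

# Interpolated: the attacker's quotes become part of the SQL.
puts "select hashed_password from users where username = '#{name}'"
# Bound: the whole value stays inside one quoted literal.
puts bind(query, name)
```

The escaping lives in exactly one place, so there's nothing to remember at
each call site, which is the whole point.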
Or I can try to develop a habit of manually escaping -- the problem is that
sooner or later, mistakes WILL happen. Best case, I develop such muscle memory
of doing it this way that I end up accidentally doing this:
puts "Hello, #{escape name}!"
That way, worst case, it goes unnoticed for months until someone named
O'Harris signs up and wonders why the system thinks their name is O''Harris or
O\'Harris.
The point is that higher levels of abstraction do allow us to abstract away
opportunities to screw things up. This is true in the language itself, and in
the libraries.
And if I've convinced you of that, don't worry, low-level skill is still
needed. Another of my favorite articles:
http://joelonsoftware.com/articles/LeakyAbstractions.html
It helps to understand what's going on at the C level, even if I never want to
actually touch it, because that might give me some insight as to why
"Hello, #{name}!"
is more efficient than
'Hello, '+name+'!'
Try it yourself:
require 'benchmark'
name = 'steve'
Benchmark.bm do |x|
  x.report { 10000000.times { "Hello, #{name}!" }}
  x.report { 10000000.times { 'Hello, '+name+'!' }}
end
My results:
      user     system      total        real
  6.010000   0.020000   6.030000 (  6.104799)
  7.500000   0.010000   7.510000 (  7.505193)
It only gets better, the more interpolated values you have. a+b is more
efficient than "#{a}#{b}", but a+b+c+d is less efficient than
"#{a}#{b}#{c}#{d}".
This was very surprising to me. Then I went back and read that article, and
thought a bit about the concept of a string builder. Now it makes sense, even
though it's still a bit counterintuitive.
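Here's roughly what I mean by the string builder idea, spelled out by hand.
I'm not claiming this is what Ruby actually does internally for
interpolation; it's only a sketch of the concept, with invented sample
values:

```ruby
# Sample values, invented for the example.
a, b, c, d = 'John, ', 'Paul, ', 'George, ', 'Joel '

# a + b + c + d: each + returns a brand-new String, so the early pieces
# get copied again at every step.
step1   = a + b        # copies a and b
step2   = step1 + c    # copies a and b again, plus c
chained = step2 + d    # copies a, b and c again, plus d

# Builder style: size one buffer up front, then copy each piece once.
parts  = [a, b, c, d]
buffer = String.new(capacity: parts.sum(&:bytesize),
                    encoding: Encoding::UTF_8)
parts.each { |part| buffer << part }

puts chained == buffer              # true
puts buffer == "#{a}#{b}#{c}#{d}"   # true
```

With only two strings there are no intermediate copies to save, which would
fit with a+b holding its own while a+b+c+d falls behind.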
So I'm glad I sort of know C, and I'm just as glad I don't have to use it
much.
The killer for me was
discovering that there was a thing like a function pointer which could be
used only for user-defined functions, not built-in functions.
I could live with that, but I'm guessing it might've been the last straw...
For me, I'm spoiled by blocks now. I can fake them in JavaScript, and even
(though less effectively) in Java, but not in PHP, as far as I know.