Wordpress Port

David Masover

This is hardly necessary. Pointer arithmetic can certainly be done safely.

Can be. However, the fact that it exists opens the door to a whole class of
weird and hard-to-pin-down crashes (and possible vulnerabilities) that simply
don't happen if you don't (or can't) do it.

But that wasn't my point. My point was that if you consider lack of pointer
arithmetic, garbage collection, and other features to be a selling point of
higher-level languages, you can do all that in C, and you can make it _almost_
automatic in C++.

I'm not sold on this. I don't think I've had any buffer overflows in my
code in years. It's pretty easy -- if I'm about to use a buffer, I make
sure I know what I'm using it for and that I cap any copies and/or report
failure if there's not enough space.

My favorite example is here:

http://joelonsoftware.com/articles/fog0000000319.html

I like this both for the ludicrous example, when he finally decides to figure
out how much to allocate:

char* bigString;
int i = 0;
i = strlen("John, ")
    + strlen("Paul, ")
    + strlen("George, ")
    + strlen("Joel ");
bigString = (char*) malloc (i + 1);

...and for the ludicrous inefficiency. He's going to scan through each string
at least twice, and that's with a customized strcat -- it gets much worse with
the real strcat.

And remember, his next step is:

char *p = bigString;
bigString[0] = '\0';
p = mystrcat(p,"John, ");
p = mystrcat(p,"Paul, ");
p = mystrcat(p,"George, ");
p = mystrcat(p,"Joel ");

It's still a bit sloppy -- that initial null assignment makes me cringe -- but
think about this. Even if you ignore the fact that we've got each string
duplicated here -- let's say they're variables:

int i = 0;
i = strlen(a) + strlen(b) + strlen(c) + strlen(d);
bigString = (char*) malloc (i+1);

char *p = bigString;
bigString[0] = '\0';
p = mystrcat(p,a);
p = mystrcat(p,b);
p = mystrcat(p,c);
p = mystrcat(p,d);

Now suppose you add a string to that, or remove it. If you add it to one place
and not the other, or remove it from one place and not the other, you're
either wasting RAM or hitting a buffer overrun every time.

Then again, this kind of malloc is probably inefficient, as the article points
out. Instead, you probably want to allocate some power of 2 -- at which point,
you want to make sure you've always allocated a power of two that's more than
you need, not less than you need.
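
The "round up to a power of two" rule can be sketched in Ruby, just to make the arithmetic concrete. The helper name capacity_for is made up for this illustration; it is not from the article or from any library.

```ruby
# Hypothetical helper: smallest power of two that is >= needed.
def capacity_for(needed)
  raise ArgumentError, "needed must be positive" unless needed.positive?
  # (needed - 1).bit_length is the number of bits required to hold needed - 1,
  # so shifting 1 left by that amount never lands below needed.
  1 << (needed - 1).bit_length
end

pieces = ["John, ", "Paul, ", "George, ", "Joel "]
needed = pieces.sum(&:bytesize) + 1 # +1 for the terminating NUL a C string needs
capacity = capacity_for(needed)

puts "need #{needed} bytes, allocate #{capacity}"
# prints "need 26 bytes, allocate 32"
```

Rounding up rather than down is the whole point: an exact power of two such as 32 maps to itself, and anything in between goes to the next one.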

Are you sure you never make a mistake here?

Because this is the kind of thing that I don't have to think about. Yes, it's
less efficient, but if I have a bunch of strings in Ruby, I can just do this:

big_string = a + b + c + d

There are other, more efficient ways, like:

big_string = a.dup << b << c << d

or

big_string = "#{a}#{b}#{c}#{d}"

The point is, though, while these have varying degrees of efficiency, none of
them have the possibility that I'll forget something and open myself up to a
vulnerability or a crash. Worst case, I waste a bit of RAM, and 100% of the
RAM I waste here can be garbage-collected later, whereas in C, if I waste it,
it's wasted, possibly even leaked.

So not only is it ridiculously easier, it's also safer.
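
For anyone who wants to check, here are the three forms side by side. The values of a, b, c and d are just the strings from the C example, picked for illustration.

```ruby
a, b, c, d = "John, ", "Paul, ", "George, ", "Joel "

plus     = a + b + c + d      # builds a new string at each +
appended = a.dup << b << c << d # copies a once, then appends in place
interp   = "#{a}#{b}#{c}#{d}" # interpolation

puts plus == appended && appended == interp
puts a.inspect # a is untouched because of the dup
```

Without the dup, the << version would modify a itself, which is the one thing to watch for in that form.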

It's also possibly faster, because since it's a higher-level abstraction, the
runtime might (in theory; I bet Ruby doesn't) notice that these are all
strings and that you're just concatenating them, so it could use some sort of
StringBuilder automatically.
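
A rough sketch of what I mean by a string builder, done by hand. TinyStringBuilder is a made-up class for this post, not something in Ruby's standard library, and I'm not claiming the runtime does anything like this internally.

```ruby
# Hypothetical illustration: collect the pieces, then build the result once.
class TinyStringBuilder
  def initialize
    @parts = []
  end

  # Remember the piece; nothing is copied into a big string yet.
  def <<(piece)
    @parts << piece.to_s
    self # returning self lets calls be chained
  end

  # Array#join produces the final string in a single step.
  def to_s
    @parts.join
  end
end

builder = TinyStringBuilder.new
builder << "John, " << "Paul, " << "George, " << "Joel "
big_string = builder.to_s
puts big_string
```

The idea is that no intermediate "John, Paul, " string ever has to exist; only the final result is built.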

Even if it doesn't, it still has the option of storing the length of a string
separately, rather than using null-terminated strings -- thus saving you at
least half your time in an operation like ("a" + "b").

Am I being unrealistic? Is this the kind of thing you'd never do?

I agree that it requires actual effort, as opposed to being implicit.

The point here is that the implicit version also implicitly handles all the
safety for you. Another example might be SQL manipulation. To keep myself
sane, let's do this with Ruby:

execute "select hashed_password from users where username = '#{name}'"

The problem with that code should be blindingly obvious. Of course, I should
probably be doing something like this:

execute "select hashed_password from users where username = '#{escape name}'"

The problem is, this requires me to always, always remember to do it. This is
how a lot of PHP stuff is written, though I'm told it's changing, and those in
the know use libraries that allow you to do it the Right Way. How would the
Right Way look?

execute 'select hashed_password from users where username = ?', name

Can you see why that's safer? I can develop a much easier to maintain habit of
using only single-quoted strings as my queries. Since the actual values are
always passed separately, they are always escaped -- I don't have to remember
anything special to make that work.

So I can develop a very, very simple habit (use single-quoted strings) that I
can almost unconsciously apply everywhere, and I will never be subject to a
SQL injection attack.
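
To show why passing the values separately helps, here is a toy version of the binding step. The names bind and quote are invented for this sketch; a real database library has its own API and often sends the values to the server separately instead of pasting them into the text. Doubling single quotes is the standard SQL escape, but I'm assuming a database that doesn't also treat backslashes specially.

```ruby
# Hypothetical helper: turn a Ruby value into a SQL literal.
def quote(value)
  case value
  when nil     then "NULL"
  when Integer then value.to_s
  else "'" + value.to_s.gsub("'", "''") + "'" # double any embedded quotes
  end
end

# Hypothetical helper: fill each ? placeholder with the next quoted value.
def bind(sql, *values)
  unless sql.count("?") == values.length
    raise ArgumentError, "placeholder count does not match value count"
  end
  remaining = values.dup
  sql.gsub("?") { quote(remaining.shift) }
end

query = 'select hashed_password from users where username = ?'
puts bind(query, "O'Harris")
puts bind(query, "' OR '1'='1")
```

The query text never sees the raw value, so there is no code path where the escaping can be forgotten, and a mismatched number of values fails loudly instead of producing a broken query.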

Or I can try to develop a habit of manually escaping -- the problem is that
sooner or later, mistakes WILL happen. Best case, I develop such muscle memory
of doing it this way that I end up accidentally doing this:

puts "Hello, #{escape name}!"

That way, worst case, it goes unnoticed for months until someone named
O'Harris signs up and wonders why the system thinks their name is O''Harris or
O\'Harris.

The point is that higher levels of abstraction do allow us to abstract away
opportunities to screw things up. This is true in the language itself, and in
the libraries.

And if I've convinced you of that, don't worry, low-level skill is still
needed. Another of my favorite articles:

http://joelonsoftware.com/articles/LeakyAbstractions.html

It helps to understand what's going on at the C level, even if I never want to
actually touch it, because that might give me some insight as to why

"Hello, #{name}!"

is more efficient than

'Hello, '+name+'!'

Try it yourself:

require 'benchmark'
name = 'steve'
Benchmark.bm do |x|
  x.report { 10000000.times { "Hello, #{name}!" }}
  x.report { 10000000.times { 'Hello, '+name+'!' }}
end

My results:

      user     system      total        real
  6.010000   0.020000   6.030000 (  6.104799)
  7.500000   0.010000   7.510000 (  7.505193)

It only gets better, the more interpolated values you have. a+b is more
efficient than "#{a}#{b}", but a+b+c+d is less efficient than
"#{a}#{b}#{c}#{d}".

This was very surprising to me. Then I went back and read that article, and
thought a bit about the concept of a string builder. Now it makes sense, even
though it's still a bit counterintuitive.
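
If you want to check the two-value and four-value cases yourself, something like this should do. The iteration count and the sample strings are arbitrary choices of mine, and the numbers will differ between Ruby versions and machines, so I'm not quoting any output here.

```ruby
require 'benchmark'

a, b, c, d = 'John, ', 'Paul, ', 'George, ', 'Joel '
n = 200_000 # small enough to finish quickly; raise it for steadier numbers

# Labels are single-quoted, so the #{...} in them is printed literally.
timings = {
  'a+b'              => Benchmark.realtime { n.times { a + b } },
  '"#{a}#{b}"'       => Benchmark.realtime { n.times { "#{a}#{b}" } },
  'a+b+c+d'          => Benchmark.realtime { n.times { a + b + c + d } },
  '"#{a}#{b}#{c}#{d}"' => Benchmark.realtime { n.times { "#{a}#{b}#{c}#{d}" } }
}

timings.each do |label, seconds|
  puts format('%-22s %.4f s', label, seconds)
end
```

Benchmark.realtime only measures wall-clock time, so run it a few times before drawing conclusions.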

So I'm glad I sort of know C, and I'm just as glad I don't have to use it
much.

The killer for me was discovering that there was a thing like a function
pointer which could be used only for user-defined functions, not built-in
functions.

I could live with that, but I'm guessing it might've been the last straw...

For me, I'm spoiled by blocks now. I can fake them in Javascript, and even
(though less effectively) in Java, but not in PHP, that I know of.
 
Seebs

I like this both for the ludicrous example, when he finally decides to figure
out how much to allocate:

char* bigString;
int i = 0;
i = strlen("John, ")
    + strlen("Paul, ")
    + strlen("George, ")
    + strlen("Joel ");
bigString = (char*) malloc (i + 1);

size_t len;
len = snprintf(NULL, 0, "%s, %s, %s, %s", "John", "Paul", "George", "Joel");
bigString = malloc(len + 1);

:)

Now suppose you add a string to that, or remove it. If you add it to one place
and not the other, or remove it from one place and not the other, you're
either wasting RAM or hitting a buffer overrun every time.

Which is why snprintf was added.

Are you sure you never make a mistake here?

Not totally sure, but I don't think I've found one in my code in at least
five years. I'm pretty careful.

Oh, wait! I did have one. It wasn't really in an entirely comparable
context, but there was a case where there was a plain error in the
attempt to count how many copies of a substring I needed space for.

Am I being unrealistic? Is this the kind of thing you'd never do?

Not really, it's just that it's the kind of thing I do by idiom, and the
idioms are safe and effective.

The problem is, this requires me to always, always remember to do it. This is
how a lot of PHP stuff is written, though I'm told it's changing, and those in
the know use libraries that allow you to do it the Right Way. How would the
Right Way look?

Heh. Actually, I reinvented this myself, pretty recently; I had to do some
database stuff in PHP (don't ask, it's horrible). So EVERY single usage
looks like:
$foo = do_query("SELECT a, b FROM table WHERE other_id = '%d';", $id);

(note: Yes, I'm passing a number in as a string. I inherited a database
where every foreign key was stored as a VARCHAR string holding the digits
of the number of the foreign table's plain old integer id column. A
chunk of this database made it to the front page of the Daily WTF once.)

Can you see why that's safer? I can develop a much easier to maintain habit of
using only single-quoted strings as my queries. Since the actual values are
always passed separately, they are always escaped -- I don't have to remember
anything special to make that work.

Yup!

The point is that higher levels of abstraction do allow us to abstract away
opportunities to screw things up. This is true in the language itself, and in
the libraries.

Yes.

I could live with that, but I'm guessing it might've been the last straw...

It was the point at which I told one of my friends that PHP had finally
achieved the dubious distinction of being the first language I actually
thought was uglier than perl. He told me maybe I should look at Ruby,
and that was probably one of the best bits of advice I've gotten since,
sometime in the tail end of the 80s, someone told me about the distinction
between "works in C" and "works in this particular compiler".

-s
 
pharrington

As long as we're not talking about Ruby, for years (ironically mostly
since I stopped coding PHP) I've wondered why people always forget
that PHP *does* have SQL prepared statements built into the standard
library (with mysqli). Granted, it's not exactly as slick as anything
in Ruby, but it's really not hard to just call prepare() and
bind_param(), or even to take a minute to hack out a function that does
this more concisely. But then I remember, the official guides don't stress
the use of this, and hardly any tutorials even mention it (I've never
read any PHP books, so I can't comment on that state of affairs). But
yeah, the use of a language just goes in the direction of what's
easiest to do with it :\
 
Seebs

As long as we're not talking about Ruby, for years (ironically mostly
since I stopped coding PHP) I've wondered why people always forget
that PHP *does* have SQL prepared statements built into the standard
library (with mysqli).

Neat.

Unfortunately, in my case, useless because I was using a fairly
lightly-documented extension to access Some Other Database.

-s
 
Michael Shigorin

Someone could

"hey it's linux -- you need it, you do it".

Of course it's also reasonable to ask before plunging,
but thinking that someone else will do what one wants
is not how things work: they work as "consider one's
options and work up to something worth showing others".
And _then_ learning (or improving a lot) to work with
those others, even if they're different (but really do
like what you do).

PS: regarding "rich set of libraries", is there at least
some (let alone reasonable) SOAP 1.2+ implementation?
soap4r seems to cover an obsolete spec, and I ran
into that already a few months ago. :( I'm aware of
WSO2 WSF/Ruby but have yet to package and try it.
 
Seebs


Not relevant.
And yes it works with Some Other Databases too:
http://lt2.php.net/manual/en/pdo.drivers.php

But not "REAL SQL Server (tm)", a network wrapper around sqlite, for which
the only way to communicate with it is to use a one-off PHP extension which
works only with a specific third-party build of PHP for OS X, and cannot be
used with, say, the standard prebuilt. (The PHP extension is distributed
as a binary, so no recompiling to match different options.)

Let's just say that, in this case, I do not blame PHP for that selection
of problems. PHP is also not to blame for the original developer's decision
to use VARCHAR for all fields except primary keys -- but including foreign
key references to primary keys in other tables.

I had about two weeks to get everything running, and a trivial wrapper
which did the quoting in a consistent way was Good Enough.

-s
 
Marc Heiler

Big discussion. :)

Just two comments from my side:

"Wordpress is enormous. Porting would be non-trivial."

I agree, and the same is valid for trac (python), mediawiki, wordpress,
phpBB (php), and so on.

But it is good applications which help make a language more widespread.
How many people came to Ruby via RoR?

As far as language design is concerned, I think ruby is a lot better
than PHP.

So why not tackle those areas? I do not believe in "use the right tool
for a job", because this effectively translates to "use the better
application", no matter what language. I can understand speed reasons,
like using C rather than ruby in case you really need the speed.

But I don't *want* to use php. As a user, I don't have to care - I can
use a php-forum, or mediawiki, and never meddle into the ugly php code.
However, if I build my projects with ruby, then I want to use ruby AS
MUCH as possible, simply because using ruby is a lot more fun than using
php (or perl for that matter - I don't really have the same problem with
python, from my point of view ruby and python are much closer these days
than, e.g., ruby and php).

No one has to do a 1:1 port. Just make the application useful enough
to get a start with it, and extend from there. The old phpbb wasn't that
sophisticated, but it got better and better later on.

Wordpress has one thing going, which is simplicity (for a USER).

So I think ruby needs some more cool apps, especially web-apps. Whether
these require one to use RoR or not is not so important, but they need
to exist.

"Perl does not encourage writing good code any more than PHP does.
And yes OOP in PHP5 looks much better than in Perl5."

Sure, we can argue about this. Ultimately it depends on the developer
writing code. But I think we can all agree that certain languages, let's
say after 3 years of heavy usage, make it significantly harder to
understand what's going on. Take the lisp guys. They reason that the use
of () is no problem for them since they don't even really see them.

Every language design has a big impact on the underlying code. Being
succinct is a huge plus of ruby here compared to php and perl. In my
opinion a good language helps people become better developers as well -
by giving them ways to express what they want, and making it as easy as
possible.

After 3 years of php, and comparing it to 3 years of ruby, I can state
without a shadow of a doubt that ruby makes it a lot easier to write
beautiful code than php does.
 
