Sascha Ebach
Hi there,
I plan to do a fairly large (how large depends on the commercial
success) project that will be programmed completely in Ruby. I can't
talk about the specifics right now, but to give you an idea it will be
something like David Heinemeier Hansson's Basecamp except that it will
not have anything to do with project management, so no direct
competition (it will be a completely German project anyway). If I
remember correctly, I read somewhere in his blog that Basecamp is a
medium-sized project, so when I say -large- project I actually mean a
project which will start out small and, depending on the commercial
success, could get a very large user base. At least that is what I am
hoping for. It could of course completely fail; only time will tell.
I would like to ask a couple of questions on the scalability of these
kinds of web projects. I was thinking of writing to David directly but I
was hoping that if he (and others with that kind of experience) answers
my questions on the list everybody could profit from that.
The following are my impressions from what I have heard or read about
Ruby. If I am wrong about any of these, don't hesitate to correct me.
In my understanding the biggest problem in scaling Ruby (CPU-wise) is
that it doesn't have native thread support yet. What this means in
terms of a web application is that if you only have, let's say, 30
concurrent users on a fairly new piece of hardware, this is not a
problem. But what happens if your site suddenly gets very popular and
you jump from 20 to 200 or even 2000 concurrent users? How do you scale
such a web app? If you were to program this web app in Java or any other
language which supports native threads, you could simply throw more CPUs
and RAM at it. I am thinking of a blade server here. As you get more
users, you simply stick another blade in your server and have your peace
of mind. As I understand it, you cannot do something like that with Ruby.
Enter Distributed Ruby (DRb).
As Martin Fowler states in his first law of distributed object design:
Don't distribute your objects!
<http://c2.com/cgi/wiki?FirstLawOfDistributedObjectDesign>
Things can really get hairy when you go distributed. All kinds of
things can go wrong. IMO it is kind of like the step from
single-threaded programming to multi-threaded programming, or worse. So
it is always a good thing if you can avoid it. I can't even start to
think about how I would (unit) test such a beast, but as of now I don't
see any real alternatives for scaling a Ruby app. Of course, when done
right it has many benefits. One is that you can buy the cheapest
hardware and plug it together like Google does, but Google seems to
have an armada of excellent programmers to handle the potentially very
difficult distributed stuff. I am only a single (maybe a little
overambitious) programmer. I have seen a couple of very nice and simple
examples in DRb (from Dave Thomas, for example), but would you really
advise using DRb for a big-time commercial web app?
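To make the question concrete, here is a minimal sketch of the kind of
simple DRb setup I mean. It is not taken from Dave Thomas's examples;
the HitCounter class and its hit method are hypothetical names made up
for illustration. Both ends run in one process here just to keep the
sketch self-contained, whereas in a real setup the client would run in
another process or on another machine and would be given the URI.

```ruby
require 'drb/drb'

# Hypothetical service object (name invented for this sketch).
# The mutex matters because DRb serves each request in its own thread.
class HitCounter
  def initialize
    @count = 0
    @lock = Mutex.new
  end

  # Increments the counter and returns the new value.
  def hit
    @lock.synchronize { @count += 1 }
  end
end

# "Server" side: publish one object as the front object.
# Port 0 asks the OS for any free port; DRb.uri reports the real one.
DRb.start_service('druby://localhost:0', HitCounter.new)
uri = DRb.uri

# "Client" side: build a proxy from the URI and call it like a local
# object. Normally this part would live in a different process.
counter = DRbObject.new_with_uri(uri)
first  = counter.hit
second = counter.hit

puts "served at #{uri}: #{first}, #{second}"

DRb.stop_service
```

The calls themselves look trivial, which is exactly why I wonder about
everything this hides: network failures, timeouts, and shared state
across machines.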
I would be very curious what kind of strategy 37signals has with their
Basecamp. Maybe David can elaborate on that if it is not a big secret. I
am very eager to hear about specific choices from anyone who has
similar experience.
- What kind of hardware are you using?
- Where are the biggest performance bottlenecks in your environment?
- What is your hardware upgrade path in case active user numbers go
through the roof?
- What kind of httpd do you use?
- What kind of framework are you using?
Thanks