Using fork to conserve memory

Daniel DeLorme

Lately I've been bothered by the large start-up time and memory consumption of
rails (although this could apply to any large framework). The solution I have in
mind is to load rails in one master process (slow, high memory) and then fork
child processes (fast, most mem shared with parent) to be used by apache. Are
there any projects out there that do something like this? Or am I gonna have to
make it myself?
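
In sketch form, something like this (serve_requests is hypothetical; the
point is just the load-once-then-fork shape):

  require "/path/to/rails/app/config/environment.rb"  # slow, but done once

  8.times do
    if Process.fork.nil?    # child: code pages shared with parent via COW
      serve_requests        # hypothetical per-child accept/dispatch loop
      exit
    end
  end
  Process.waitall           # parent just supervises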

Daniel
 
Jan Svitok

Lately I've been bothered by the large start-up time and memory consumption of
rails (although this could apply to any large framework). The solution I have in
mind is to load rails in one master process (slow, high memory) and then fork
child processes (fast, most mem shared with parent) to be used by apache. Are
there any projects out there that do something like this? Or am I gonna have to
make it myself?

I'd say it should be possible to modify mongrel_cluster to do this
(maybe it even already does). I suppose that if you fork *after* some
requests have been processed, your memory gain will be better - as rails
uses lazy class loading, you want to do the fork after most of the
classes have been loaded. OTOH, I guess that most of the memory is the
cache for objects, HTML snippets etc., and forking won't help you much
there - that data cannot be shared.
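
In sketch form (warm_up here is hypothetical; the point is only where the
fork happens relative to the lazy loading):

  require "/path/to/rails/app/config/environment.rb"
  warm_up   # hypothetical: run a few requests so lazily-loaded
            # classes get pulled into the parent before forking
  # ... then fork the children as planned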

Anyway, just try and see if it helps and what works best.
 
ara.t.howard

Lately I've been bothered by the large start-up time and memory consumption
of rails (although this could apply to any large framework). The solution I
have in mind is to load rails in one master process (slow, high memory) and
then fork child processes (fast, most mem shared with parent) to be used by
apache. Are there any projects out there that do something like this? Or am I
gonna have to make it myself?

you realize that this is __exactly__ what running rails, or any cgi, under
fastcgi does, right?

-a
 
ara.t.howard

Lately I've been bothered by the large start-up time and memory consumption
of rails (although this could apply to any large framework). The solution I
have in mind is to load rails in one master process (slow, high memory) and
then fork child processes (fast, most mem shared with parent) to be used by
apache. Are there any projects out there that do something like this? Or am I
gonna have to make it myself?

Daniel

you may also be interested in this work i did some time ago

http://codeforpeople.com/lib/ruby/acgi/
http://codeforpeople.com/lib/ruby/acgi/acgi-0.1.0/README

quite similar in spirit.


-a
 
Daniel DeLorme

you realize that this is __exactly__ what running rails, or any cgi, under
fastcgi does right?

No, fastcgi creates a bunch of worker processes and loads the full rails
environment in EACH of them. That means a lot of memory (30+ MB for each
process) and a long initialization time (2+ seconds for each process).

What I'm talking about is loading the large environment once and THEN forking
off worker processes that don't need to go through the expensive initialization
sequence. It seems like an obvious idea and rails is not the only framework with
a large footprint, so *someone* must have done something like this already.

Daniel
 
M. Edward (Ed) Borasky

Daniel said:
No, fastcgi creates a bunch of worker processes and loads the full
rails environment in EACH of them. That means a lot of memory (30+ MB
for each process) and a long initialization time (2+ seconds for each
process).

What I'm talking about is loading the large environment once and THEN
forking off worker processes that don't need to go through the
expensive initialization sequence. It seems like an obvious idea and
rails is not the only framework with a large footprint, so *someone*
must have done something like this already.

Daniel

What about Mongrel? Isn't that the "fastest" web server for Ruby? How
does Mongrel's memory footprint compare with the others?

Incidentally, speaking of fast web servers, how much can be gained (on a
Linux platform, of course) in a Rails server with Mongrel and Apache by
using Tux? Zed, any idea?
 
Rob Sanheim

What about Mongrel? Isn't that the "fastest" web server for Ruby? How
does Mongrel's memory footprint compare with the others?

Incidentally, speaking of fast web servers, how much can be gained (on a
Linux platform, of course) in a Rails server with Mongrel and Apache by
using Tux? Zed, any idea?

Mongrel will use up plenty of memory, generally around 30 megs per
mongrel to start. That will grow with your app, of course. Most
people who have limited memory go with a much leaner web server than
apache, but mongrel is still the preferred choice for serving Rails.

I'm not sure how the OP imagines these child processes would work -
Rails doesn't really have any sort of threading model where it could
hand out work to the child processes. A lot of Rails isn't even
threadsafe, AFAIK. That's why all the current deployment recipes have
one full rails environment per mongrel instance/fastcgi process.

Rob
 
Daniel DeLorme

Rob said:
I'm not sure how the OP imagines these child processes would work -
Rails doesn't really have any sort of threading model where it could
hand out work to the child processes. A lot of Rails isn't even
threadsafe, AFAIK. That's why all the current deployment recipes have
one full rails environment per mongrel instance/fastcgi process.

I was talking about processes, not threads. The point is to use the
copy-on-write capabilities of the OS to share the footprint of all the
code that is loaded upfront while maintaining the independence of each
process. i.e. http://en.wikipedia.org/wiki/Copy-on-write
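
Copy-on-write in miniature (a conceptual sketch; the real savings depend on
how much of the shared data the child, or the interpreter, later writes to):

  big = "x" * 50_000_000   # ~50 MB allocated once, before the fork
  if Process.fork.nil?
    # the child starts out sharing those pages with the parent;
    # the kernel copies only the pages the child actually writes
    big[0, 1] = "y"
    exit
  end
  Process.wait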

Daniel
 
khaines

I was talking about processes, not threads. The point is to use the
copy-on-write capabilities of the OS to share the footprint of all the code
that is loaded upfront while maintaining the independence of each process.
i.e. http://en.wikipedia.org/wiki/Copy-on-write

I experimented with this idea a little bit, with Mongrel, just doing some
proof-of-concept work some months ago. One should be able to alter
the Rails handler to fork a process to handle a request. However, there
are a couple of caveats.

First, forking is not particularly fast.

Second, and more importantly, it presents difficulties when dealing with
database connections. The processes will share the same database handle.
That presents opportunities for simultaneous use of that handle in two
separate queries to possibly step on each other, and it also means that if
one process closes that database handle (such as when it exits), that will
affect the other's database handle as well.

Neither is necessarily a showstopper, and there are ways of working around
the db handle issues, so it may be worth some real experimentation.
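
The shape of the workaround is roughly this (a sketch only, not the actual
Mongrel handler; dispatch_rails is a made-up name):

  def process_with_fork(request, response)
    if Process.fork.nil?
      # child: discard the inherited handle and open a fresh connection
      # from the configured spec before touching the database
      ActiveRecord::Base.establish_connection
      dispatch_rails(request, response)   # hypothetical dispatch step
      exit!  # skip at_exit hooks so the parent's handles aren't closed
    else
      Process.wait   # or track child pids to allow concurrency
    end
  end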


Kirk Haines
 
ara.t.howard

No, fastcgi creates a bunch of worker processes and loads the full rails
environment in EACH of them. That means a lot of memory (30+ MB for each
process) and a long initialization time (2+ seconds for each process).

yes, i realize that. still, the concept is to start the workers before the
request comes in so startup time is minimized. it's true that memory is used
by each process but the number of processes can be configured.

What I'm talking about is loading the large environment once and THEN
forking off worker processes that don't need to go through the expensive
initialization sequence. It seems like an obvious idea and rails is not the
only framework with a large footprint, so *someone* must have done something
like this already.

that would be fairly difficult - consider that all the open file handles, db
connections, stdin, stdout, and stderr would be __shared__. for multiple
processes to all use them would be a disaster. in order to be able to fork a
rails process robustly one would need to track a huge number of resources and
de/re-allocate them in the child.

any decent kernel is going to share library pages in memory anyhow - they're
mmap'd in iff binary - and files share the page cache, so it's unclear to me
what advantage this would give? not that it's a bad idea, but it seems very
difficult to do in a generic way?

regards.

-a
 
ara.t.howard

I experimented with this idea a little bit, with Mongrel, just doing some
proof of concept type work, some months ago. One should be able to alter
the Rails handler to fork a process to handle a request. However, there are
a couple of caveats.

First, forking is not particularly fast.

Second, and more importantly, it presents difficulties when dealing with
database connections. The processes will share the same database handle.
That presents opportunities for simultaneous use of that handle in two
separate queries to possibly step on each other, and it also means that if
one process closes that database handle (such as when it exits), that will
affect the other's database handle, as well.

Neither is necessarily a showstopper, and there are ways of working around
the db handle issues, so it may be worth some real experimentation.

and you've just touched the tip of the iceberg: file locks are inherited, as
are stdio buffers, which can get flushed twice.
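
the double-flush in miniature:

  $stdout.sync = false
  print "half a line"   # sits in the stdio buffer, unflushed
  Process.fork          # the buffer is duplicated into the child
  # at exit, parent and child each flush their copy of the buffer,
  # so "half a line" comes out twice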

i wonder if people realize the size of a process is reported in virtual memory
and not in actual memory usage? in the case of something like rails tons of
the memory will be shared in the page cache.
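
on linux you can get an honest breakdown from /proc/PID/smaps (2.6.14+),
which the VSZ/RSS columns in top won't give you - a quick sketch:

  def mem_breakdown(pid)
    smaps = File.read("/proc/#{pid}/smaps")
    %w(Shared Private).each do |kind|
      kb = smaps.scan(/^#{kind}.*?(\d+) kB/).flatten.inject(0) { |sum, n| sum + n.to_i }
      puts "#{kind}: #{kb} kB"
    end
  end

  mem_breakdown(Process.pid)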

cheers.

-a
 
Neil Wilson

I would just launch the relevant number of mongrel processes ahead of
time and cluster them. I can't see the point in complicating matters
when memory is so cheap these days (Amazon EC2 instances give you
1.75 GB of RAM for 10 cents an hour).

NeilW
 
snacktime

why? can you elaborate?

Well there really aren't that many options to start with. Mongrel
works and is actively maintained. Most of the alternatives have bugs
that just never got fixed, even though some of them like fastcgi or
mod_ruby are far better choices design-wise. The main problem with
mongrel isn't mongrel itself, but how you have to use it. You need a
cluster of mongrel processes running with a proxy server in front.
You have issues with proxy servers being able to detect hung mongrels
or sending more than one request at a time to a mongrel. The proxy
servers that can handle those things don't support ssl, and in general
it's just more work to set up and maintain than it needs to be.

Threads aren't an option because even if rails were thread-safe, you
couldn't use C extensions because of blocking issues, which rules out most
of the database drivers. And rails makes heavy use of databases.

Chris
 
Daniel DeLorme

that would be fairly difficult - consider that all the open file handles, db
connections, stdin, stdout, and stderr would be __shared__. for multiple
processes to all use them would be a disaster. in order to be able to fork a
rails process robustly one would need to track a huge number of resources and
de/re-allocate them in the child.

Interesting. I had thought about db connections but hadn't followed the reasoning
through to file descriptors and other shared resources. True, this might be quite
tricky and problematic but, as khaines said, not necessarily a showstopper.

any decent kernel is going to share library pages in memory anyhow - they're
mmap'd in iff binary

Indeed, I imagine (hope) that the code of a .so file would be shared between
processes. But I very much doubt the same holds true for .rb files. And I doubt
that compiled modules are more than a small fraction of the code.

and files share the page cache, so it's unclear to me what advantage this
would give?

Page cache... isn't that an entirely different topic? Access to shared data is
an open and shut case. Here I'm mostly interested in the CPU & memory cost of
the initialization phase, i.e. loading the *code*, not the data.

not that it's a bad idea, but it seems very difficult to do in a generic way?

It may be difficult to do in a generic way, but the advantages seem obvious to
me. Hey, why tell when you can show? Please compare the behavior of:
  require "/path/to/rails/app/config/environment.rb"
  20.times do
    break if Process.fork.nil?
  end
  sleep 10

vs:

  20.times do
    break if Process.fork.nil?
  end
  require "/path/to/rails/app/config/environment.rb"
  sleep 10

and tell me which one you like better ;-)

Daniel
 
Daniel DeLorme

i wonder if people realize the size of a process is reported in virtual memory
and not in actual memory usage? in the case of something like rails tons of
the memory will be shared in the page cache.

Really? If I remember correctly, rails caching can work with a memory-store
which is not shared between processes, or a drb/memcached store which is not a
part of the process size.

But oh well, this isn't the rails list.

Daniel
 
Daniel DeLorme

First, forking is not particularly fast.

It's a LOT faster than loading the full rails environment though, and that's
what matters to me! :-D

Neither is necessarily a showstopper, and there are ways of working around
the db handle issues, so it may be worth some real experimentation.

Ok, so from the answers so far I gather that something like this hasn't really
been done before. Ara's acgi was quite similar to what I was hoping for, except
without the concurrent connections.

I guess I'll just have to scratch my own itch.

Daniel
 
Tom Pollard

Interesting. I had thought about db connections but hadn't followed
the reasoning through to file descriptors and other shared
resources. True this might be quite tricky and problematic but, as
khaines said, not necessarily a showstopper.

It seems like you need to figure out how easily a forked child can be
taught to release all of those shared handles and open its own handles.

Indeed, I imagine (hope) that the code of a .so file would be
shared between processes. But I very much doubt the same holds true
for .rb files. And I doubt that compiled modules are more than a
small fraction of the code.

The .rb files are just disk files - they probably only reside in
memory briefly, when they're initially parsed. What you're executing
is the binary ruby interpreter, running the parse tree generated from
those .rb files.

Tom
 
Thomas Hurst

* Daniel DeLorme ([email protected]) said:
No, fastcgi creates a bunch of worker processes and loads the full rails
environment in EACH of them. That means a lot of memory (30+ MB for each
process) and a long initialization time (2+ seconds for each process).

What I'm talking about is loading the large environment once and THEN
forking off worker processes that don't need to go through the expensive
initialization sequence.

I've been running simple Ruby FastCGIs with multiple forked processes
and external servers, using something along the lines of:

  require "socket"
  require "fcgi"

  shared_setup                # load libs; logs are normally ok here too
  STDIN.reopen(TCPServer.new(port))  # children inherit the listen socket
  trap("CHLD") { Process.wait(-1, Process::WNOHANG) }  # reap exited children
  process_count.times do
    fork do
      unshared_setup          # per-child state: db connection etc.
      FCGI.each_request { |r| process(r) }  # accept loop on shared socket
    end
  end
end

libfcgi has helper functions to do much of this; it would be nice if they
were added to fcgi.so at some point.
 
ara.t.howard

Well there really aren't that many options to start with. Mongrel works
and is actively maintained. Most of the alternatives have bugs that just
never got fixed, even though some of them like fastcgi or mod_ruby are far
better choices design-wise.

i thought most of those issues did not apply to linux: for instance the
fast_cgi max-fds issue was bsd-specific, wasn't it? the reason i ask is that
i've used fastcgi for years on linux servers and never had issues.
The main problem with mongrel isn't mongrel itself, but how you have to use
it. You need a cluster of mongrel processes running with a proxy server in
front. You have issues with proxy servers being able to detect hung
mongrels or sending more than one request at a time to a mongrel. The proxy
servers that can handle those things don't support ssl, and in general it's
just more work to set up and maintain than it needs to be.

ah.


Threads aren't an option because even if rails were thread-safe, you couldn't
use C extensions because of blocking issues, which rules out most of the
database drivers. And rails makes heavy use of databases.

yes of course.

thanks for the info.

-a
 
