Using fork to conserve memory

Daniel DeLorme

Lately I've been bothered by the large start-up time and memory consumption of
rails (although this could apply to any large framework). The solution I have in
mind is to load rails in one master process (slow, high memory) and then fork
child processes (fast, most mem shared with parent) to be used by apache. Are
there any projects out there that do something like this? Or am I gonna have to
make it myself?
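
In sketch form, something like this (serve_requests is hypothetical; the
point is just the load-once-then-fork shape):

  require "/path/to/rails/app/config/environment.rb"  # slow, but done once

  8.times do
    if Process.fork.nil?    # child: code pages shared with parent via COW
      serve_requests        # hypothetical per-child accept/dispatch loop
      exit
    end
  end
  Process.waitall           # parent just supervises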

Daniel
 
Jan Svitok

Lately I've been bothered by the large start-up time and memory consumption of
rails (although this could apply to any large framework). The solution I have in
mind is to load rails in one master process (slow, high memory) and then fork
child processes (fast, most mem shared with parent) to be used by apache. Are
there any projects out there that do something like this? Or am I gonna have to
make it myself?

I'd say it should be possible to modify mongrel_cluster to do this
(maybe it even already does). I suppose that if you fork *after* some
requests have been processed, your memory gain will be better - as rails
uses lazy class loading, you want to do the fork after most of the
classes have been loaded. OTOH, I guess that most of the memory is the
cache for objects, HTML snippets etc., and forking won't help you much
there - that data cannot be shared.
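
In sketch form (warm_up here is hypothetical; the point is only where the
fork happens relative to the lazy loading):

  require "/path/to/rails/app/config/environment.rb"
  warm_up   # hypothetical: run a few requests so lazily-loaded
            # classes get pulled into the parent before forking
  # ... then fork the children as planned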

Anyway, just try and see if it helps and what works best.
 
ara.t.howard

Lately I've been bothered by the large start-up time and memory consumption
of rails (although this could apply to any large framework). The solution I
have in mind is to load rails in one master process (slow, high memory) and
then fork child processes (fast, most mem shared with parent) to be used by
apache. Are there any projects out there that do something like this? Or am I
gonna have to make it myself?

you realize that this is __exactly__ what running rails, or any cgi, under
fastcgi does, right?

-a
 
ara.t.howard

Lately I've been bothered by the large start-up time and memory consumption
of rails (although this could apply to any large framework). The solution I
have in mind is to load rails in one master process (slow, high memory) and
then fork child processes (fast, most mem shared with parent) to be used by
apache. Are there any projects out there that do something like this? Or am I
gonna have to make it myself?

Daniel

you may also be interested in this work i did some time ago

http://codeforpeople.com/lib/ruby/acgi/
http://codeforpeople.com/lib/ruby/acgi/acgi-0.1.0/README

quite similar in spirit.


-a
 
Daniel DeLorme

you realize that this is __exactly__ what running rails, or any cgi, under
fastcgi does right?

No, fastcgi creates a bunch of worker processes and loads the full rails
environment in EACH of them. That means a lot of memory (30+ MB for each
process) and a long initialization time (2+ seconds for each process).

What I'm talking about is loading the large environment once and THEN forking
off worker processes that don't need to go through the expensive initialization
sequence. It seems like an obvious idea and rails is not the only framework with
a large footprint, so *someone* must have done something like this already.

Daniel
 
M. Edward (Ed) Borasky

Daniel said:
No, fastcgi creates a bunch of worker processes and loads the full
rails environment in EACH of them. That means a lot of memory (30+ MB
for each process) and a long initialization time (2+ seconds for each
process).

What I'm talking about is loading the large environment once and THEN
forking off worker processes that don't need to go through the
expensive initialization sequence. It seems like an obvious idea and
rails is not the only framework with a large footprint, so *someone*
must have done something like this already.

Daniel

What about Mongrel? Isn't that the "fastest" web server for Ruby? How
does Mongrel's memory footprint compare with the others?

Incidentally, speaking of fast web servers, how much can be gained (on a
Linux platform, of course) in a Rails server with Mongrel and Apache by
using Tux? Zed, any idea?
 
Rob Sanheim

What about Mongrel? Isn't that the "fastest" web server for Ruby? How
does Mongrel's memory footprint compare with the others?

Incidentally, speaking of fast web servers, how much can be gained (on a
Linux platform, of course) in a Rails server with Mongrel and Apache by
using Tux? Zed, any idea?

Mongrel will use up plenty of memory, generally around 30 megs per
mongrel to start. That will grow with your app, of course. Most
people who have limited memory go with a much leaner web server than
apache, but mongrel is still the preferred choice for serving Rails.

I'm not sure how the OP imagines these child processes would work -
Rails doesn't really have any sort of threading model where it could
hand out work to the child processes. A lot of Rails isn't even
threadsafe, AFAIK. That's why all the current deployment recipes have
one full rails environment per mongrel instance/fastcgi process.

Rob
 
Daniel DeLorme

Rob said:
I'm not sure how the OP imagines these child processes would work -
Rails doesn't really have any sort of threading model where it could
hand out work to the child processes. A lot of Rails isn't even
threadsafe, AFAIK. That's why all the current deployment recipes have
one full rails environment per mongrel instance/fastcgi process.

I was talking about processes, not threads. The point is to use the
copy-on-write capabilities of the OS to share the footprint of all the
code that is loaded upfront while maintaining the independence of each
process. i.e. http://en.wikipedia.org/wiki/Copy-on-write
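
Copy-on-write in miniature (a conceptual sketch; the real savings depend on
how much of the shared data the child, or the interpreter, later writes to):

  big = "x" * 50_000_000   # ~50 MB allocated once, before the fork
  if Process.fork.nil?
    # the child starts out sharing those pages with the parent;
    # the kernel copies only the pages the child actually writes
    big[0, 1] = "y"
    exit
  end
  Process.wait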

Daniel
 
khaines

I was talking about processes, not threads. The point is to use the
copy-on-write capabilities of the OS to share the footprint of all the code
that is loaded upfront while maintaining the independence of each process.
i.e. http://en.wikipedia.org/wiki/Copy-on-write

I experimented with this idea a little bit, with Mongrel, just doing some
proof-of-concept work some months ago. One should be able to alter
the Rails handler to fork a process to handle a request. However, there
are a couple of caveats.

First, forking is not particularly fast.

Second, and more importantly, it presents difficulties when dealing with
database connections. The processes will share the same database handle.
That presents opportunities for simultaneous use of that handle in two
separate queries to possibly step on each other, and it also means that if
one process closes that database handle (such as when it exits), that will
affect the other's database handle as well.

Neither is necessarily a showstopper, and there are ways of working around
the db handle issues, so it may be worth some real experimentation.
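
The shape of the workaround is roughly this (a sketch only, not the actual
Mongrel handler; dispatch_rails is a made-up name):

  def process_with_fork(request, response)
    if Process.fork.nil?
      # child: discard the inherited handle and open a fresh connection
      # from the configured spec before touching the database
      ActiveRecord::Base.establish_connection
      dispatch_rails(request, response)   # hypothetical dispatch step
      exit!  # skip at_exit hooks so the parent's handles aren't closed
    else
      Process.wait   # or track child pids to allow concurrency
    end
  end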


Kirk Haines
 
ara.t.howard

No, fastcgi creates a bunch of worker processes and loads the full rails
environment in EACH of them. That means a lot of memory (30+ MB for each
process) and a long initialization time (2+ seconds for each process).

yes, i realize that. still, the concept is to start the workers before the
request comes in so startup time is minimized. it's true that memory is used
by each process but the number of processes can be configured.

What I'm talking about is loading the large environment once and THEN
forking off worker processes that don't need to go through the expensive
initialization sequence. It seems like an obvious idea and rails is not the
only framework with a large footprint, so *someone* must have done something
like this already.

that would be fairly difficult - consider that all the open file handles, db
connections, stdin, stdout, and stderr would be __shared__. for multiple
processes to all use them would be a disaster. in order to be able to fork a
rails process robustly one would need to track a huge number of resources and
de/re-allocate them in the child.

any decent kernel is going to share library pages in memory anyhow - they're
mmap'd in iff binary - and files share the page cache, so it's unclear to me
what advantage this would give? not that it's a bad idea, but it seems very
difficult to do in a generic way?

regards.

-a
 
ara.t.howard

I experimented with this idea a little bit, with Mongrel, just doing some
proof of concept type work, some months ago. One should be able to alter
the Rails handler to fork a process to handle a request. However, there are
a couple of caveats.

First, forking is not particularly fast.

Second, and more importantly, it presents difficulties when dealing with
database connections. The processes will share the same database handle.
That presents opportunities for simultaneous use of that handle in two
separate queries to possibly step on each other, and it also means that if
one process closes that database handle (such as when it exits), that will
affect the other's database handle, as well.

Neither is necessarily a showstopper, and there are ways of working around
the db handle issues, so it may be worth some real experimentation.

and you've just touched the tip of the iceberg: file locks are inherited, as
are stdio buffers, which can get flushed twice.
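
the double-flush in miniature:

  $stdout.sync = false
  print "half a line"   # sits in the stdio buffer, unflushed
  Process.fork          # the buffer is duplicated into the child
  # at exit, parent and child each flush their copy of the buffer,
  # so "half a line" comes out twice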

i wonder if people realize the size of a process is reported in virtual memory
and not in actual memory usage? in the case of something like rails tons of
the memory will be shared in the page cache.
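
on linux you can get an honest breakdown from /proc/PID/smaps (2.6.14+),
which the VSZ/RSS columns in top won't give you - a quick sketch:

  def mem_breakdown(pid)
    smaps = File.read("/proc/#{pid}/smaps")
    %w(Shared Private).each do |kind|
      kb = smaps.scan(/^#{kind}.*?(\d+) kB/).flatten.inject(0) { |sum, n| sum + n.to_i }
      puts "#{kind}: #{kb} kB"
    end
  end

  mem_breakdown(Process.pid)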

cheers.

-a
 
Neil Wilson

I would just launch the relevant number of mongrel processes ahead of
time and cluster them. I can't see the point in complicating matters
when memory is so cheap these days (Amazon EC2 instances give you
1.75 GB of RAM for 10 cents an hour).

NeilW
 
snacktime

why? can you elaborate?

Well there really aren't that many options to start with. Mongrel
works and is actively maintained. Most of the alternatives have bugs
that just never got fixed, even though some of them like fastcgi or
mod_ruby are far better choices design-wise. The main problem with
mongrel isn't mongrel itself, but how you have to use it. You need a
cluster of mongrel processes running with a proxy server in front.
You have issues with proxy servers being able to detect hung mongrels
or sending more than one request at a time to a mongrel. The proxy
servers that can handle those things don't support ssl, and in general
it's just more work to set up and maintain than it needs to be.

Threads aren't an option because even if rails were thread-safe, you
couldn't use C extensions because of blocking issues, which rules out most
of the database drivers. And rails makes heavy use of databases.

Chris
 
Daniel DeLorme

that would be fairly difficult - consider that all the open file handles, db
connections, stdin, stdout, and stderr would be __shared__. for multiple
processes to all use them would be a disaster. in order to be able to fork a
rails process robustly one would need to track a huge number of resources and
de/re-allocate them in the child.

Interesting. I had thought about db connections but hadn't followed the reasoning
through to file descriptors and other shared resources. True, this might be quite
tricky and problematic but, as khaines said, not necessarily a showstopper.

any decent kernel is going to share library pages in memory anyhow - they're
mmap'd in iff binary

Indeed, I imagine (hope) that the code of a .so file would be shared between
processes. But I very much doubt the same holds true for .rb files. And I doubt
that compiled modules are more than a small fraction of the code.

and files share the page cache, so it's unclear to me what advantage this
would give?

Page cache... isn't that an entirely different topic? Access to shared data is
an open and shut case. Here I'm mostly interested in the CPU & memory cost of
the initialization phase, i.e. loading the *code*, not the data.

not that it's a bad idea, but it seems very difficult to do in a generic way?

It may be difficult to do in a generic way, but the advantages seem obvious to
me. Hey, why tell when you can show? Please compare the behavior of:
  require "/path/to/rails/app/config/environment.rb"
  20.times do
    break if Process.fork.nil?
  end
  sleep 10

vs:

  20.times do
    break if Process.fork.nil?
  end
  require "/path/to/rails/app/config/environment.rb"
  sleep 10

and tell me which one you like better ;-)

Daniel
 
Daniel DeLorme

i wonder if people realize the size of a process is reported in virtual memory
and not in actual memory usage? in the case of something like rails tons of
the memory will be shared in the page cache.

Really? If I remember correctly, rails caching can work with a memory-store
which is not shared between processes, or a drb/memcached store which is not a
part of the process size.

But oh well, this isn't the rails list.

Daniel
 
Daniel DeLorme

First, forking is not particularly fast.

It's a LOT faster than loading the full rails environment though, and that's
what matters to me! :-D

Neither is necessarily a showstopper, and there are ways of working around
the db handle issues, so it may be worth some real experimentation.

Ok, so from the answers so far I gather that something like this hasn't really
been done before. Ara's acgi was quite similar to what I was hoping for, except
without the concurrent connections.

I guess I'll just have to scratch my own itch.

Daniel
 
Tom Pollard

Interesting. I had thought about db connections but hadn't followed
the reasoning through to file descriptors and other shared
resources. True this might be quite tricky and problematic but, as
khaines said, not necessarily a showstopper.

It seems like you need to figure out how easily a forked child can be
taught to release all of those shared handles and open its own handles.

Indeed, I imagine (hope) that the code of a .so file would be
shared between processes. But I very much doubt the same holds true
for .rb files. And I doubt that compiled modules are more than a
small fraction of the code.

The .rb files are just disk files - they probably only reside in
memory briefly, when they're initially parsed. What you're executing
is the binary ruby interpreter, running the parse tree generated from
those .rb files.

Tom
 
Thomas Hurst

* Daniel DeLorme ([email protected]) said:
No, fastcgi creates a bunch of worker processes and loads the full rails
environment in EACH of them. That means a lot of memory (30+ MB for each
process) and a long initialization time (2+ seconds for each process).

What I'm talking about is loading the large environment once and THEN
forking off worker processes that don't need to go through the expensive
initialization sequence.

I've been running simple Ruby FastCGIs with multiple forked processes
and external servers, using something along the lines of:

  require "socket"
  require "fcgi"

  shared_setup                # load libs; logs are normally ok here too
  STDIN.reopen(TCPServer.new(port))  # children inherit the listen socket
  trap("CHLD") { Process.wait(-1, Process::WNOHANG) }  # reap exited children
  process_count.times do
    fork do
      unshared_setup          # per-child state: db connection etc.
      FCGI.each_request { |r| process(r) }  # accept loop on shared socket
    end
  end
end

libfcgi has helper functions to do much of this; it would be nice if they
were added to fcgi.so at some point.
 
ara.t.howard

Well there really aren't that many options to start with. Mongrel works
and is actively maintained. Most of the alternatives have bugs that just
never got fixed, even though some of them like fastcgi or mod_ruby are far
better choices design-wise.

i thought most of those issues did not apply to linux: for instance the
fast_cgi max-fds issue was bsd-specific, wasn't it? the reason i ask is that
i've used fastcgi for years on linux servers and never had issues.
The main problem with mongrel isn't mongrel itself, but how you have to use
it. You need a cluster of mongrel processes running with a proxy server in
front. You have issues with proxy servers being able to detect hung
mongrels or sending more than one request at a time to a mongrel. The proxy
servers that can handle those things don't support ssl, and in general it's
just more work to set up and maintain than it needs to be.

ah.


Threads aren't an option because even if rails were thread-safe, you couldn't
use C extensions because of blocking issues, which rules out most of the
database drivers. And rails makes heavy use of databases.

yes of course.

thanks for the info.

-a
 
