Ryan Tomayko
<https://github.com/rtomayko/posix-spawn>

    $ gem install posix-spawn

tmm1 and I are pleased to announce the initial release of posix-spawn,
a small extension library that implements a subset of Ruby 1.9's new
Process::spawn [1] in a way that takes advantage of fast process
spawning (IEEE Std 1003.1 posix_spawn(2) systems interfaces [2]) where
available and runs on all MRI Rubies >= 1.8.7.

  - Fast, constant time process spawning across a variety of platforms
  - A largish compatible subset of Ruby 1.9's Process::spawn interface
    as well as 1.9 enhancements to Kernel#system, Kernel#`, etc. under
    Ruby >= 1.8.7.
  - High level and hopefully portable POSIX::Spawn::Child class for
    quick and dirty (but correct!) non-streaming IPC scenarios.

See the README for usage and graphs of benchmark results on Linux and
Darwin, or run them yourself:

    $ uname -a
    Linux aux1 2.6.26-2-xen-amd64 #1 SMP Thu Aug 20 2009 x86_64 GNU/Linux
    $ ruby --version
    ruby 1.8.7 (2008-08-11 patchlevel 72) [x86_64-linux]
    $ gem install posix-spawn
    $ posix-spawn-benchmark
    benchmarking fork/exec vs. posix_spawn over 1000 runs at 100M res
                              user     system      total        real
    fspawn (fork/exec):   0.080000  14.920000  38.040000 ( 39.029493)
    pspawn (posix_spawn): 0.040000   0.010000   0.560000 (  0.939422)

Work on the library started when tmm1 found, through the use of his
brilliant rbtrace [3] program, a number of slow points in the GitHub
codebase where fork/exec is used heavily to spawn processes. In some
cases, a single fork() system call was using >30ms while in others
using only ~1ms. Our test suite fork()'d especially slowly. Hmmm.

On Linux, fork(2) slows down as the parent process uses more memory
due to the need to copy page tables for COW. In many common uses of
fork(), where it is followed by one of the exec family of functions to
spawn child processes (Kernel#system, IO::popen, Process::spawn,
etc.), this overhead can be removed by using posix_spawn() or vfork()
instead.

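
To make the pattern concrete, here is a minimal sketch of the two call
shapes being compared: an explicit fork followed immediately by exec, and
a single spawn call. It uses core Ruby's Process.spawn (Ruby 1.9 and
later) so it runs without the gem; the helper names fork_exec and
spawn_wait are hypothetical. How Process.spawn creates the child
internally varies by Ruby version and platform, so this illustrates only
the calling pattern, not a performance result.

```ruby
# Hypothetical helpers for illustration; neither is part of posix-spawn.

# The pattern discussed above: fork a copy of the parent (page tables
# included), then immediately replace the child image with exec.
def fork_exec(*argv)
  pid = fork { exec(*argv) }
  Process.wait2(pid).last   # => Process::Status
end

# The single-call shape: ask for a new process running argv directly.
def spawn_wait(*argv)
  pid = Process.spawn(*argv)
  Process.wait2(pid).last   # => Process::Status
end

fork_status  = fork_exec("true")
spawn_status = spawn_wait("true")
```

Both helpers return a Process::Status, so callers cannot tell which path
created the child; only the cost of creating it differs.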
After implementing a simple fast process spawner extension using
posix_spawn() and gaining some familiarity with the posix_spawn family
of C functions, we noticed that it could potentially be used to
implement a large subset of features provided by Ruby 1.9's
Process::spawn.

We love Process::spawn.

We love Process::spawn so much in fact that over the past few months,
even before surfacing any of the issues with Linux fork() slowness, an
effort had been underway at GitHub to move two key libraries (Grit,
the Ruby interface to Git, and Albino, a Ruby wrapper around the
excellent Pygments syntax highlighter) to use Process::spawn
compatible method invocations (implemented with fork/exec under
Ruby 1.8.7) so that we could take advantage of Process::spawn under
Ruby 1.9.

Once we had a basic Process::spawn interface implemented on top of
posix_spawn(), we were able to take some higher level utility classes
from this work on the Grit and Albino projects and include them in
posix-spawn as a nice POSIX::Spawn::Child class. It is:

  - Simple, requiring little code for simple stream input and capture
  - Internally non-blocking (uses select(2)), so it handles all pipe
    hang cases due to exceeding PIPE_BUF limits on one or more streams
  - Potentially portable, due to the abstraction over lower-level
    process and stream management APIs

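
The select(2) approach in the list above can be sketched roughly as
follows. This is not the library's Child class; it is a small
illustration written against core Ruby (Process.spawn, IO.pipe,
IO.select), and the helper name run_and_capture and its return shape are
assumptions made for the example. The point is that stdin is written and
stdout/stderr are read from one loop, so no single full pipe can stall
the exchange.

```ruby
# Hypothetical helper, not the library's implementation: feed `input` to a
# command and capture its stdout and stderr without blocking on any pipe.
def run_and_capture(*argv, input: "")
  in_r,  in_w  = IO.pipe
  out_r, out_w = IO.pipe
  err_r, err_w = IO.pipe

  pid = Process.spawn(*argv, :in => in_r, :out => out_w, :err => err_w)
  # The parent closes the pipe ends that now belong to the child.
  [in_r, out_w, err_w].each(&:close)

  out = String.new
  err = String.new
  pending = input.dup
  readers = [out_r, err_r]
  writers = [in_w]
  if pending.empty?
    in_w.close
    writers.clear
  end

  until readers.empty? && writers.empty?
    ready_r, ready_w = IO.select(readers, writers)

    ready_w.each do |io|
      begin
        written = io.write_nonblock(pending)
        pending = pending.byteslice(written..-1)
      rescue IO::WaitWritable
        next                      # pipe filled up again; retry after select
      rescue Errno::EPIPE
        pending = ""              # child closed its stdin early
      end
      if pending.empty?
        io.close                  # signals EOF to the child
        writers.delete(io)
      end
    end

    ready_r.each do |io|
      begin
        chunk = io.read_nonblock(4096)
        (io.equal?(out_r) ? out : err) << chunk
      rescue IO::WaitReadable
        next
      rescue EOFError
        io.close
        readers.delete(io)
      end
    end
  end

  _, status = Process.wait2(pid)
  [out, err, status]
end

out, err, status = run_and_capture("cat", input: "hello")
```

A naive version that writes all input first and reads afterwards hangs as
soon as the child produces more output than a pipe can hold; interleaving
the reads and writes through select avoids that.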
We hope to now remove large bodies of Ruby 1.8.7 spawn emulation code
and replace it with posix-spawn.

As the project continued to take shape, we noticed how much more
feature-rich the Kernel#system, IO.popen, etc. methods were in Ruby
1.9. Having been built on the foundation of the new Process::spawn,
they allow for setting up the child's environment, redirecting
arbitrary fds, and all the other great stuff in Process::spawn. We
were able to write Ruby 1.8.7 compatible subset implementations of
those as well and put them under the POSIX::Spawn module.
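
For readers who have not used the 1.9 forms, here is a short sketch of
what the richer Kernel#system and IO.popen signatures look like. It calls
the core Ruby methods (1.9 and later) so it runs without the gem; the
POSIX::Spawn equivalents are described above as a compatible subset, and
their exact coverage should be checked against the README. The variable
names and the GREETING environment variable are made up for the example.

```ruby
require "tempfile"

# Kernel#system with a leading env hash, an argv-style command (no shell
# parsing of the argument list), and stdout redirected to a file by path.
log = Tempfile.new("spawn-demo")
log.close
ok = system({ "GREETING" => "hello" }, "sh", "-c", 'echo "$GREETING"',
            :out => log.path)
system_output = File.read(log.path)

# IO.popen accepts the same leading env hash. With a block it returns the
# block's value once the pipe to the child has been closed.
popen_output = IO.popen({ "GREETING" => "hi" },
                        ["sh", "-c", 'echo "$GREETING"'], &:read)
```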

Now, about that subset. As of this initial release, we were able to
implement the following arguments and options to spawn:

    spawn([env,] command... [,options]) => pid

    env: hash
      name => val : set the environment variable
      name => nil : unset the environment variable
    command...:
      command                     : /bin/sh -c 'command'
      cmdname, arg1, ...          : exec argv (no shell)
      [cmdname, argv0], arg1, ... : exec argv (no shell)
    options: hash
      clearing environment variables:
        :unsetenv_others => true  : clear environment vars not in env
        :unsetenv_others => false : don't clear (default)
      redirection:
        key:
          FD            : single fd in child process
          [FD, FD, ...] : multiple fd in child process
        value:
          FD                        : redirect to fd in parent process
          :close                    : close the fd in child process
          string                    : redir w/ open(string, "r" or "w")
          [string]                  : redir w/ open(string, File::RDONLY)
          [string, open_mode]       : redir w/ open(string, open_mode, 0644)
          [string, open_mode, perm] : redir w/ open(string, open_mode, perm)
        FD is one of the following:
          :in     : the fd 0 which is the standard input
          :out    : the fd 1 which is the standard output
          :err    : the fd 2 which is the standard error
          integer : the fd specified by the integer
          io      : the fd specified as io.fileno
      current directory:
        :chdir => str

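
As a rough illustration of how several of the options listed above
combine in one call, the sketch below sets an environment variable,
clears the rest of the environment, changes directory, and sends both
stdout and stderr to one file using the multiple-fd key form. It calls
core Ruby's Process.spawn (1.9 and later) so it runs without the gem; the
helper name spawn_options_demo, the DEMO_VAR variable, and the log file
name are invented for the example.

```ruby
require "tmpdir"

# Hypothetical helper for illustration only.
def spawn_options_demo
  Dir.mktmpdir do |dir|
    log = File.join(dir, "child.log")
    pid = Process.spawn(
      { "DEMO_VAR" => "42" },               # env: name => val
      "/bin/sh", "-c",                      # cmdname, arg1, ... (no shell
      'echo "$DEMO_VAR"; pwd; echo oops >&2', # is added by spawn itself)
      :unsetenv_others => true,             # drop everything not in env
      :chdir => "/",                        # child's working directory
      [:out, :err] => [log, "w"]            # [FD, FD] => [string, open_mode]
    )
    Process.wait(pid)
    File.read(log)                          # contents written by the child
  end
end

captured = spawn_options_demo
```

The command is given by absolute path because :unsetenv_others removes
PATH from the child's environment.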
We have NOT yet implemented these options:

    options: hash
      process group:
        :pgroup => true or 0 : make a new process group
        :pgroup => pgid      : join to specified process group
        :pgroup => nil       : don't change the process group (default)
      resource limit: resourcename is core, cpu, data, etc.
        :rlimit_resourcename => limit
        :rlimit_resourcename => [cur_limit, max_limit]
      umask:
        :umask => int
      redirection:
        value:
          [:child, FD] : redirect to the redirected fd
      file descriptor inheritance: close non-redir non-standard fds >= 3
        :close_others => false : inherit fds (default for system and exec)
        :close_others => true  : no inherit (default for spawn and popen)

We have ideas for some of these (:pgroup, :umask, [:child, FD]) and
may implement them in future releases; others, like :rlimit, are not
supported by posix_spawn() and have no clear implementation strategy
outside of falling back to fork/exec when detected.

[0] https://github.com/rtomayko/posix-spawn
[1] http://www.ruby-doc.org/core-1.9/classes/Process.html#M002230
[2] http://pubs.opengroup.org/onlinepubs/009695399/functions/posix_spawn.html
[3] https://github.com/tmm1/rbtrace
Ryan Tomayko
Aman Gupta