Config::CONFIG['SHELL'] - windows/*nix/mac


Ara.T.Howard

i'm working on a clustering system that runs jobs submitted to an nfs mounted
queue from n feeding nodes. currently it's linux only, and i use the following
to ensure that a user's job is executed on the remote node with the environment
they are accustomed to:

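cmd = 'ls -ltar'

reader, writer = IO.pipe

cid = fork
if cid.nil?
  # child: a login shell that reads its commands from the pipe
  writer.close
  STDIN.reopen reader
  exec 'bash', '--login'
else
  # parent: send the command, close so bash sees EOF, then reap the child
  reader.close
  writer.puts cmd
  writer.close
  Process.wait cid
end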
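A hedged sketch of how the "which shell?" part of the subject line might be handled per platform. It uses RbConfig (the modern name for the Config module in the subject) to detect Windows. The helper name, the fallback to %COMSPEC%/cmd.exe on Windows (which has no login shell), and the use of $SHELL with -l elsewhere (as bash and zsh accept) are all assumptions, not anything from the original post:

```ruby
require 'rbconfig'

# Hypothetical helper: build an argv that runs +cmd+ through the user's
# usual shell. Assumptions: Windows (mswin/mingw builds) has no login
# shell, so fall back to %COMSPEC% (normally cmd.exe) with /c; elsewhere
# use $SHELL (default /bin/sh) with -l to ask for a login shell.
def login_shell_argv(cmd, host_os: RbConfig::CONFIG['host_os'], env: ENV)
  if host_os =~ /mswin|mingw/
    [env['COMSPEC'] || 'cmd.exe', '/c', cmd]
  else
    [env['SHELL'] || '/bin/sh', '-l', '-c', cmd]
  end
end
```

In rubies that have Process.spawn (1.9 and later), the result can be passed straight to it, e.g. pid = Process.spawn(*login_shell_argv('ls -ltar')) followed by Process.wait(pid), which sidesteps fork entirely.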


therefore their job executes in a login shell, and their 'normal' environment
is there. how would one go about this on windows? obviously the fork has to
go, but i'm getting around that by opening up a pipe to another ruby process
using IO.popen and sending a little ruby program down the pipe so that it can
fork/exec the job. this is being done so i can make the child ruby process do
things like write the pid back to me, redirect stdin/stdout, etc. - all in a
portable way...
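A minimal sketch of the popen approach just described, assuming a ruby new enough to have Process.spawn (1.9+) in the helper process. The helper name and the choice to redirect the job's stdout/stderr to a file are hypothetical:

```ruby
require 'rbconfig'
require 'tmpdir'

# Hypothetical helper: start a second ruby via IO.popen, send it a small
# program on stdin, and have that program spawn the job (stdout+stderr
# redirected to +out_path+), report the job's pid, wait for it, and
# report its exit status.
def run_via_helper(argv, out_path)
  program = <<~RUBY
    pid = Process.spawn(*#{argv.inspect}, out: #{out_path.inspect}, err: [:child, :out])
    STDOUT.puts pid
    STDOUT.flush
    Process.wait(pid)
    STDOUT.puts $?.exitstatus
  RUBY

  IO.popen([RbConfig.ruby], 'r+') do |io|
    io.write(program) # with no script argument, ruby reads its program from stdin...
    io.close_write    # ...and runs it once stdin reaches EOF
    pid    = Integer(io.gets.strip)
    status = Integer(io.gets.strip)
    [pid, status]
  end
end

if $PROGRAM_NAME == __FILE__
  out = File.join(Dir.tmpdir, "job.#{Process.pid}.out")
  pid, status = run_via_helper([RbConfig.ruby, '-e', 'puts "hello"'], out)
  puts "job #{pid} exited #{status}: #{File.read(out)}"
end
```

Because the program is read until EOF before it runs, the pipe carries only the script in and the pid/status back out; the job's own stdin/stdout go wherever Process.spawn is told to send them.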

thoughts?

-a
--
===============================================================================
| EMAIL :: Ara [dot] T [dot] Howard [at] noaa [dot] gov
| PHONE :: 303.497.6469
| A flower falls, even though we love it;
| and a weed grows, even though we do not love it.
| --Dogen
===============================================================================
 

Lennon Day-Reynolds

Have you considered using DRb, instead of raw pipes, to coordinate the
work on Windows? Assuming this is more of a load-balancing system
than, say, a massively-parallel cluster environment, the overhead of
DRb marshalling and unmarshalling shouldn't be a big deal, and you
could probably just make your "front" object a simple job controller,
which could spawn processes, pass them input, and send output back to
the client.

Just a thought.
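A hedged sketch of the job-controller front object suggested here, using Ruby's DRb and Open3. The class and method names are hypothetical, and the client runs in the same process only to keep the example self-contained:

```ruby
require 'drb/drb'
require 'open3'
require 'rbconfig'

# Hypothetical front object: a simple job controller that spawns a
# process, passes it input, and sends the combined stdout/stderr plus
# exit status back to the client.
class JobController
  def run(argv, input = '')
    output, status = Open3.capture2e(*argv, stdin_data: input)
    [output, status.exitstatus]
  end
end

if $PROGRAM_NAME == __FILE__
  # Port 0 lets the OS pick a free port; server.uri reports the real one.
  server = DRb.start_service('druby://localhost:0', JobController.new)
  controller = DRbObject.new_with_uri(server.uri)
  output, code = controller.run([RbConfig.ruby, '-e', 'print STDIN.read.upcase'], 'hello')
  puts "exit #{code}: #{output}"
  DRb.stop_service
end
```

In a real deployment the clients would sit on other nodes and connect to that URI over TCP.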
 

Ara.T.Howard


my system has n feeding processes 'competing' to process jobs from an nfs
mounted priority queue. this obviously involves some sort of nfs-safe db and/or
locking. this is all provided by sqlite and some other classes i've written
(lockfile on raa). the advantages this has are:

- no single point of failure. as long as one node stays up, the system continues.

- no networking needed (well, NFS, but that hardly counts). this is important
because it means no ports - and that means no sysads. one thing i am
striving for is that a user should be able to set a cluster up in under
five minutes simply by running a piece of userland code pointed at an nfs
mounted directory. the niche this aims for is something less complicated
than sun grid engine, or other systems which use daemons to communicate
and schedule jobs, and something more than simply spawning jobs via ssh
sessions all over the place. to my knowledge there is no such system, and
it's tremendously useful in a scientific setting where one often just wants
to throw 30 nodes at a list of jobs right NOW.
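For readers unfamiliar with NFS-safe locking, here is a hedged sketch of the link(2)-based technique usually recommended for it (the workaround described under O_EXCL in the Linux open(2) man page). It is not the code of the lockfile library mentioned above; the method name and retry policy are hypothetical, and stale-lock recovery is left out:

```ruby
require 'securerandom'
require 'socket'
require 'tmpdir'

# Hypothetical helper: hold an NFS-safe lock on +lock_path+ while the
# block runs. Each attempt hard-links a uniquely named temp file to the
# lock name; the lock is ours iff the temp file then has two names.
def with_nfs_safe_lock(lock_path, retries: 50, delay: 0.1)
  tmp = "#{lock_path}.#{Socket.gethostname}.#{Process.pid}.#{SecureRandom.hex(8)}"
  File.write(tmp, "#{Socket.gethostname} #{Process.pid}\n")
  acquired = false
  retries.times do
    begin
      File.link(tmp, lock_path)
    rescue SystemCallError
      # EEXIST means someone else holds it; over NFS the error may also be
      # spurious, so the link count below is the real test.
    end
    if File.stat(tmp).nlink == 2
      acquired = true
      break
    end
    sleep delay
  end
  raise "could not lock #{lock_path}" unless acquired
  begin
    yield
  ensure
    File.unlink(lock_path)
  end
ensure
  File.unlink(tmp) if File.exist?(tmp)
end

if $PROGRAM_NAME == __FILE__
  Dir.mktmpdir do |dir|
    with_nfs_safe_lock(File.join(dir, 'queue.lock')) { puts 'holding the lock' }
  end
end
```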

i considered drb for a really long time, and it has the following disadvantages
that i can see:

- must open ports. since sept. 11th we have only ssh. period. ssh
tunneling is an option, but absolutely crazy once one starts considering
how to keep ssh-agent running across reboots (we must use passphrases)
without embedding passwords (forbidden, and checked for here). plus the
number of ssh tunnels needed grows as n^2 (a full mesh of 30 nodes already
has 30*29/2 = 435 node pairs) - this gets ridiculous when you have 30 nodes!

- if you have a scheduler, you have a single point of failure. if all nodes
can act as the scheduler, you need some sort of distributed locking
protocol. you could use the filesystem and nfs-safe locks here. but if you
have nfs-safe locks, you do not need drb at all and can simply put the queue
in an nfs-safe db (sqlite) and coordinate all actions via the filesystem.
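A hedged sketch of "coordinate all actions via the filesystem", using atomic rename instead of a database as a swapped-in technique. It stands in for the sqlite queue described above; the class name, directory layout and file naming are hypothetical. rename is atomic within one filesystem; over NFS the server performs it atomically, but a retransmitted request can report a spurious ENOENT, so a production version would double-check running/:

```ruby
require 'fileutils'
require 'socket'
require 'tmpdir'

# Hypothetical queue coordinated purely through the filesystem. Jobs are
# files in queue/; a worker claims one by renaming it into running/, so
# each job has exactly one winner and there is no scheduler daemon.
class DirQueue
  def initialize(root)
    @queue   = File.join(root, 'queue')
    @running = File.join(root, 'running')
    FileUtils.mkdir_p([@queue, @running])
  end

  # priority 0..99_999, higher runs first; the inverted, zero-padded
  # prefix makes plain name order equal priority order.
  def submit(priority, id, command)
    name = format('%05d-%s', 99_999 - priority, id)
    tmp  = File.join(@queue, ".#{name}.tmp") # dotfiles are ignored by claim
    File.write(tmp, command)
    File.rename(tmp, File.join(@queue, name)) # publish in one step
  end

  # Returns [name, command] for the claimed job, or nil if none is left.
  def claim
    Dir.children(@queue).reject { |n| n.start_with?('.') }.sort.each do |name|
      mine = File.join(@running, "#{name}.#{Socket.gethostname}.#{Process.pid}")
      begin
        File.rename(File.join(@queue, name), mine)
      rescue Errno::ENOENT
        next # another worker renamed it first
      end
      return [name, File.read(mine)]
    end
    nil
  end
end

if $PROGRAM_NAME == __FILE__
  Dir.mktmpdir do |root|
    q = DirQueue.new(root)
    q.submit(1, 'a', 'ls -ltar')
    q.submit(9, 'b', 'hostname')
    p q.claim # the priority-9 job comes out first
  end
end
```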

of course, you could start using something like a tuple space to
coordinate - but again you have a single point of failure...
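To make the tuple-space point concrete, a hedged sketch using Ruby's Rinda. The tuple layout and helper names are hypothetical. take removes a matching tuple atomically, so two workers never get the same job, but every worker would reach the space through DRb, so the one process hosting it is the single point of failure:

```ruby
require 'rinda/tuplespace'

# Hypothetical helpers around a Rinda::TupleSpace. In a cluster the space
# would be published with DRb.start_service(uri, space) on one host.
def submit(space, command)
  space.write([:job, command])
end

# Returns the next command, or nil if nothing is queued right now
# (a timeout of 0 makes take give up immediately).
def next_job(space)
  _, command = space.take([:job, String], 0)
  command
rescue Rinda::RequestExpiredError
  nil
end

if $PROGRAM_NAME == __FILE__
  space = Rinda::TupleSpace.new
  submit(space, 'ls -ltar')
  p next_job(space) # => "ls -ltar"
  p next_job(space) # => nil
end
```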

i cannot see how one can

- eliminate a single point of failure using drb,

- make the system decentralized (all daemons are servants) without
requiring some form of locking - which eliminates the need for drb
in the first place, or

- avoid simply re-implementing what has already been written - condor, sge
(sun grid engine) - which have LOTS of problems. scheduling is tough.

if you have suggestions i'm all ears.

also, i should point out that virtually every scientific cluster in our building
already relies on nfs and locking to some degree, so while it's true that the
nfs server itself is a single point of failure (as is the network, of course),
these things are already inherent in the system, and my code adds no MORE
points of failure.

the system must keep running in the face of problems or i come in on weekends! ;-(
so i will not willingly introduce single points of failure into the system. the
present setup requires only that sysads come in on the weekend - not me - and i'd
like to keep it that way.


in short, i would LOVE to use drb for many reasons, but cannot come up with a
fault-tolerant way to deal with ssh tunneling, scheduling, and locking that
doesn't make nfs mounted work queues the simpler solution in the process.

thoughts?

-a