On Fri, 2010-05-28, Jinxu Ding wrote:
[top-posting corrected]
> Hi,
> Sorry for the confusion.
> I am not designing a job scheduler for an MPI cluster.
> I want to design an application program that can distribute computing
> tasks across a cluster.
> For example, I need to solve a big problem, which can be divided into
> multiple small tasks.
> These small tasks can be done independently on different nodes. When
> they are done, they return their results to one node, which combines
> them into a solution for the big problem.
> Right now, the big problem has been programmed in C++ as a
> sequential algorithm.
> I need to design a task/job scheduler that can do the above scheduling
> on a cluster system, so that the small tasks can run in parallel.
> In this way, we can improve the performance of the sequential
> algorithm.
So today you have one computer running one big sequential job?
Fixing that by writing your own general-purpose job scheduler
seems like overkill. Perhaps implementing real clustering is overkill,
too.
I can think of a number of techniques from traditional Unix usage,
which do not involve C++ programming.
- If the intermediate results are files, you can express the whole job
  as a Makefile and let the make(1) utility drive the full run
  (a minimal sketch follows below this list).
- To use a single multi-CPU machine, run GNU make as 'make -j' so the
  steps which can run in parallel do.
- Or use ssh(1) from within the Makefile to spread the work across
  a known set of machines.
- Or use a make(1) extended to use multiple machines ("distcc"?)
- If the intermediate results are data streams, you can sometimes use
  Unix pipelines, perhaps distributed using ssh:
    ssh somewhere generate_data | process_data | process_data_more > result
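
To make the Makefile idea concrete, here is a minimal sketch. The task
names, the solve_part and combine_results programs, and the input/
directory are all placeholders invented for illustration; the structure
is the point: each small task is an independent target producing a
result file, and the final target combines them.

  # Hypothetical split of the big job into independent parts.
  # solve_part, combine_results and input/ are placeholder names.
  TASKS := task1 task2 task3 task4
  PARTS := $(addsuffix .out,$(TASKS))

  # The final result depends on every part; make builds the parts first,
  # then runs the combining step once they all exist.
  result.dat: $(PARTS)
  	./combine_results $(PARTS) > result.dat

  # One independent piece of work per target. The targets do not depend
  # on each other, so 'make -j4 result.dat' runs up to four at once.
  %.out: input/%.in
  	./solve_part $< > $@

  # Variant: run one part on another machine over ssh instead of locally
  # (uncommenting this explicit rule overrides the pattern rule for it).
  # task4.out: input/task4.in
  # 	ssh node4 ./solve_part < input/task4.in > task4.out

Run it as 'make -j4 result.dat', or plain 'make -j' to let make decide
how many jobs to start (recipe lines must begin with a tab). The C++
code itself stays sequential; only the Makefile knows about parallelism.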
(This would be off topic, except IMHO helping people to avoid
unnecessary C++ programming is on topic.)
/Jorgen