Good People,
I am honestly overwhelmed by the time you have each taken to respond.
I was/am completely unprepared! When I saw 9 messages in this thread,
I thought they all would be chiding me. I am pleasantly surprised by
your time and generosity.
Firstly, I need to sit down and really read the responses. I wanted to
post right away to allay any fears that I wouldn't respond in a timely
fashion. I just need a little time to digest and respond to each
poster as appropriate. I will do this.
Secondly, as you have all given this serious thought, I should refine
the requirement for you, as it may lead to a more interesting
discussion. I am certain that you have all basically "solved" the
problem (well, it's not really a problem, per se) already, so I am
posting a refinement _not_ to ask for further assistance. I am posting
more info FYI, if anything.
It is a real problem, and I am designing a real product. This is no
homework assignment, to be sure. That said, the purpose is to provide
work for grid nodes. I will have n clients/nodes asking the
server for work/tasks. I am centrally distributing all the server work
that can be done by grid nodes out to those nodes. It is that simple.
I used to work for a leader in the grid business, and I saw an
opportunity in my product design to create/use a grid. That work gave
me insight into how complex grid work/jobs can be described and
managed. Having such an appreciation also showed me a completely
different way to utilize grids, and this random weighting is only one
of the ways I want to do things differently than my past employer did.
The work for the nodes is varied. It can be going out and scraping a
web page, doing a mathematical calculation, or parsing text against a
dictionary, for example.
"Priority" is a loose cover term for all the components of priority.
There is declared priority from the sender's (or task's) perspective
(only considers the task itself), and there is priority from the queue
manager's perspective (takes everything into account). The priority
then extends to the receiver/node, and finally the submission of the
results to the server/sender, and how those results are processed.
But this is only for "priority" in general. There are several types of
priority, each falling under the functional blocks/flow described
above. And yes, each and every priority can be adjusted en route.
Things I have to take into account include:
- Moving average of task type's completion
- Network lag to/from the server
- Network lag to/from the target (from the node)
- Node characteristics (think "health": CPU, RAM, I/O, etc.), both as
reported and when the task is due for execution on the node
- Latest time/date by which the task must be returned
- Earliest time/date before which the task must wait to start
- Action upon expiration (delete, keep waiting, wait without priority
escalation, etc.)
- # of assignment retries on node and server
- # of error retries on node and server
- Task failure action(s)
- Predecessor tasks this task must wait for
- Preceding task this task needs to be executed immediately after
- Re-assignment options (move task waiting on one node to another
node)
- Execution requirements (CPU, RAM, I/O, idle time, etc.)
- All the priorities and metadata that apply for each node
(environmental, computational, etc.)
- ...and there are quite a few more
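For concreteness, here is a minimal sketch (in Java, since the server is J2EE) of how a few of these factors might ride along on a task record. Every name here is hypothetical; the real list above is much larger, and the urgency() heuristic is just one illustrative way a deadline could feed into a priority.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

// Hypothetical task record carrying a handful of the scheduling
// factors listed above; all field names are illustrative only.
class TaskMeta {
    enum ExpirationAction { DELETE, KEEP_WAITING, WAIT_NO_ESCALATION }

    final String taskType;          // e.g. "scrape", "calc", "parse"
    final int declaredPriority;     // sender's view of the task alone
    final Instant notBefore;        // earliest start time
    final Instant notAfter;         // latest acceptable return time
    final int maxAssignRetries;
    final int maxErrorRetries;
    final ExpirationAction onExpire;

    TaskMeta(String taskType, int declaredPriority, Instant notBefore,
             Instant notAfter, int maxAssignRetries, int maxErrorRetries,
             ExpirationAction onExpire) {
        this.taskType = taskType;
        this.declaredPriority = declaredPriority;
        this.notBefore = notBefore;
        this.notAfter = notAfter;
        this.maxAssignRetries = maxAssignRetries;
        this.maxErrorRetries = maxErrorRetries;
        this.onExpire = onExpire;
    }

    // One possible urgency heuristic: the closer the deadline, the
    // more weight the declared priority gets. Purely illustrative.
    double urgency(Instant now) {
        long secondsLeft = Math.max(1, now.until(notAfter, ChronoUnit.SECONDS));
        return declaredPriority / (double) secondsLeft;
    }
}
```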
So as I see it, all of these factors, when combined with environmental
conditions, contribute to a priority. I believe it will be impossible
to assign one number to such a complex thing and call it "The
Priority". Nor do I believe that a change in one metadata priority
will necessitate changing any/all others. Just dealing with this logic
between metadata requirements/restrictions is a job in itself.
So the methodology is as such:
A queue of work exists on the server. It is comprised of tasks. These
tasks are unassigned.
A process chooses appropriate tasks from the queue for inclusion into
a "Job", or collection of tasks, bound for a known, available, active
node. It is this process, and a few more, that I was asking about here
initially. The selection of tasks in a large queue, weighted by their
existing priority.
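For the selection step itself, one approach worth sketching is priority-weighted random sampling: each task's current composite priority acts as its weight, so higher-priority tasks are drawn more often without ever starving the low-priority ones. This is only an assumption about how the selection could work, not a summary of anyone's proposal:

```java
import java.util.List;
import java.util.Random;

// Sketch of priority-weighted random selection from a queue.
// pickIndex returns the index of the chosen task; weights are the
// tasks' current composite priorities (all assumed > 0).
class WeightedPicker {
    private final Random rng;

    WeightedPicker(Random rng) { this.rng = rng; }

    int pickIndex(List<Double> weights) {
        double total = 0;
        for (double w : weights) total += w;
        double r = rng.nextDouble() * total;   // point in [0, total)
        double cumulative = 0;
        for (int i = 0; i < weights.size(); i++) {
            cumulative += weights.get(i);
            if (r < cumulative) return i;      // falls in task i's slice
        }
        return weights.size() - 1;             // guard against rounding
    }
}
```

Drawing repeatedly and removing each picked task would fill a "Job"; a deterministic top-k pass would be the obvious alternative if starving low-priority tasks were acceptable.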
Anyway, that job is sent to the node for execution. The node iterates
through the job, doing each task. After a set period, it returns what
results it has, and continues until all tasks are completed. If any
tasks time out, then existing rules determine how the situation is
handled. The node may, for example, retain the asynchronous jobs and
ask for another job, adding what it receives to the existing tasks.
When the job is returned to the server, the server saves the results,
and the process continues. I will say this, especially since threading
was mentioned: the server is J2EE.
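The node-side loop described above could be sketched like this. Task, Result, runTask, and sendResults are all placeholders, and a fixed batch count stands in for the "set period" after which partial results are returned:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Sketch of the node-side loop: work through a job's tasks, flushing
// partial results back to the server every BATCH tasks. All names
// here are placeholders, not the product's actual API.
class NodeLoop {
    interface Task { }
    interface Result { }
    interface Executor { Result runTask(Task t); }
    interface Uplink { void sendResults(List<Result> partial); }

    static final int BATCH = 5;   // stand-in for "a set period"

    static void runJob(Deque<Task> job, Executor exec, Uplink server) {
        List<Result> pending = new ArrayList<>();
        while (!job.isEmpty()) {
            pending.add(exec.runTask(job.poll()));
            if (pending.size() >= BATCH) {
                server.sendResults(new ArrayList<>(pending));
                pending.clear();
            }
        }
        if (!pending.isEmpty()) server.sendResults(pending); // final flush
    }
}
```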
Again, let me express my gratitude at your responses already. I will
prepare a more thorough response shortly.
Thanks!
pat