half.italian
Hi all! I'm implementing one of my first multithreaded apps, and have
gotten to a point where I think I'm going off track from a standard
idiom. Wondering if anyone can point me in the right direction.
The script will run as a daemon and watch a given directory for new
files. Once it determines that a file has finished moving into the
watch folder, it will kick off processing of that file. Several
files could be processing at any given time, up to a maximum number of
threads.
Here's how I have it designed so far. The main thread starts a
Watch(threading.Thread) class that loops and searches a directory for
files. It has been passed a Queue.Queue() object (watch_queue), and
as it finds new files in the watch folder, it adds the file name to
the queue.
The main thread then grabs an item off the watch_queue, and kicks off
processing on that file using another class, Worker(threading.Thread).
My problem is with communicating between the threads as to which files
are currently processing, or are already present in the watch_queue, so
that the Watch thread does not continuously add unneeded files to the
watch_queue to be processed. For example, Watch() finds a file to be
processed and adds it to the queue. The main thread sees the file on
the queue, pops it off, and begins processing. Now the file has
been removed from the watch_queue, and the Watch() thread has no way of
knowing that a Worker() thread is processing it and that it shouldn't
pick it up again. So it will see the file as new and add it to the
queue again. P.S. The file is deleted from the watch folder after it
has finished processing, so that's how I'll know which files to
process in the long term.
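To make the problem concrete, here is a stripped-down sketch of the Watch thread as described above. It is written against Python 3, where the module is spelled queue rather than Queue. The interval parameter, the stop_event attribute, and the scan_once() helper are made up for illustration, and the check for whether a file has finished moving is left out.

```python
import os
import queue      # spelled Queue in Python 2
import threading


class Watch(threading.Thread):
    """Polls watch_dir and puts every file name it sees on watch_queue."""

    def __init__(self, watch_dir, watch_queue, interval=1.0):
        super().__init__(daemon=True)
        self.watch_dir = watch_dir
        self.watch_queue = watch_queue
        self.interval = interval              # hypothetical polling period, in seconds
        self.stop_event = threading.Event()   # hypothetical way to end the loop

    def scan_once(self):
        # Nothing here remembers earlier scans, so a file that has been
        # taken off the queue but not yet deleted is queued again.
        for name in sorted(os.listdir(self.watch_dir)):
            self.watch_queue.put(name)

    def run(self):
        while not self.stop_event.is_set():
            self.scan_once()
            self.stop_event.wait(self.interval)
```

Calling scan_once() twice on a folder holding a single file leaves two entries for that file on the queue, which is exactly the duplication described above.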
I made definite progress by creating two queues, watch_queue and
processing_queue, and then using lists within the classes to store the
state of which files are being processed or watched.
I think I could pull it off, but it has gotten very confusing quickly,
trying to keep each thread's list and the queue always in sync with
one another. The easiest solution I can see is if my threads could
read an item from the queue without removing it from the queue and
only remove it when I tell them to. The Watch() thread could then
just follow what items are on the watch_queue to know what files to
add, and the Worker() thread could intentionally remove the item
from the watch_queue once it has finished processing it.
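A rough sketch of that behavior, assuming Python 3 (queue instead of Queue) and wrapping the queue rather than subclassing it. The names TrackingQueue, put_if_new(), and done() are invented for this example and are not part of the standard library. The idea is that an item stays "known" from the moment it is queued until a worker explicitly says it is finished, so one object replaces the per-thread lists.

```python
import queue      # spelled Queue in Python 2
import threading


class TrackingQueue:
    """Wraps a Queue and remembers each item until done() is called for it."""

    def __init__(self):
        self._queue = queue.Queue()
        self._lock = threading.Lock()
        self._known = set()       # items that are queued or being processed

    def put_if_new(self, item):
        """Queue item unless it is already queued or in progress."""
        with self._lock:
            if item in self._known:
                return False
            self._known.add(item)
        self._queue.put(item)
        return True

    def get(self, block=True, timeout=None):
        """Hand out the next item; it stays known until done(item)."""
        return self._queue.get(block, timeout)

    def done(self, item):
        """Forget item, e.g. after its file was processed and deleted."""
        with self._lock:
            self._known.discard(item)
```

With this, the Watch() thread would call put_if_new(name) on every scan, and the Worker() thread would call done(name) only after deleting the file, so a name cannot be queued twice while it is still in flight.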
Now that I'm writing this out, I see a solution by overriding or
wrapping Queue.Queue().get() to give me the behavior I mention above.
I've noticed .join() and .task_done(), but I'm not sure how to use
them properly. Any suggestions would be greatly appreciated.
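For reference, this is roughly how the documentation describes .task_done() and .join() being used together: each worker calls task_done() once for every item it got, and join() blocks until every item that was put has been marked done. The sketch below assumes Python 3; MAX_WORKERS, process(), and run_batch() are placeholders invented for the example. Note that these two methods only count unfinished items; they do not let another thread see which items are in flight.

```python
import queue      # spelled Queue in Python 2
import threading

MAX_WORKERS = 3   # hypothetical cap on concurrent workers


def process(name, results):
    # Placeholder for the real per-file work.
    results.append(name.upper())


def worker(work_queue, results):
    while True:
        name = work_queue.get()
        if name is None:              # sentinel: no more work for this thread
            work_queue.task_done()
            break
        try:
            process(name, results)
        finally:
            work_queue.task_done()    # exactly one call per get()


def run_batch(names):
    work_queue = queue.Queue()
    results = []
    threads = [threading.Thread(target=worker, args=(work_queue, results))
               for _ in range(MAX_WORKERS)]
    for t in threads:
        t.start()
    for name in names:
        work_queue.put(name)
    work_queue.join()                 # returns once every put() has a matching task_done()
    for _ in threads:
        work_queue.put(None)          # one sentinel per worker thread
    for t in threads:
        t.join()
    return results
```

Using a fixed pool of workers that all read from one queue also takes care of the "max number of threads" limit, since the main thread no longer has to start a new Worker per file.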
~Sean