Coming in late...
<snip>
Note: a double leading __ triggers "name mangling" -- typically only
needed when doing multiple layers of inheritance where different parents
have similarly named attributes that need to be kept independent; a single
_ is the convention for "don't touch this unless you know what you are
doing".
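The mangling note above can be seen directly (a hypothetical minimal sketch -- the class and attribute names are made up, not from the original code):

```python
class Base:
    def __init__(self):
        self._hint = "internal by convention"  # single _: convention only
        self.__secret = "mangled"              # double __: stored as _Base__secret

class Child(Base):
    def __init__(self):
        super().__init__()
        self.__secret = "child's own"          # stored as _Child__secret, so
                                               # the parent's copy is untouched

c = Child()
print(c._Base__secret)   # mangled
print(c._Child__secret)  # child's own
```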
        threading.Thread.__init__(self)

    def run(self):
        while 1:
            item = self.__queue.get()
            if item is not None:
                model = domain.get_item(item[0])
                logger.debug('sdbthread item:' + item[0])
                title = model['title']
                scraped = model['scraped']
                logger.debug("sdbthread title:" + title)
any suggestions?
thanks
said:
thanks John, Gabriel,
here's the 'put' side of the requests:
def prepSDBSearch(results):
    modelList = [0]
    counter = 1
    for result in results:
        data = [result.item, counter, modelList]
        queue.put(data)
        counter += 1
    while modelList[0] < len(results):
        print 'waiting...'    # wait for them to come home
    modelList.pop(0)          # now remove the leading '0'
    return modelList
My suggestion, if you really want diagnostic help -- follow the
common recommendation of posting minimal, runnable (even if erroneous)
code... If "domain.get_item()" is some sort of RDBMS access, you might
fake it using a pre-loaded dictionary -- anything that lets it return
something when given the key value.
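A dict-backed fake along those lines might look like this (purely an assumption about get_item's interface -- all names and values here are made up):

```python
_FAKE_DB = {
    "item1": {"title": "First title",  "scraped": "2008-01-01"},
    "item2": {"title": "Second title", "scraped": "2008-01-02"},
}

class FakeDomain:
    """Stands in for whatever 'domain' really is: returns a dict per key."""
    def get_item(self, key):
        return _FAKE_DB[key]

domain = FakeDomain()
print(domain.get_item("item1")["title"])   # First title
```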
responses to your follow ups:
1) 'item' in the threads is a list that corresponds to the 'data'
list in the above function. it's not global, and the initial values
seem ok, but i'm not sure if every time i pass in data to the queue it
passes in the same memory address or declares a new 'data' list (which
I guess is what I want)
Rather confusing usage... In your "put" you have a list whose first
element is "result.item", but then in the worker thread you refer to the
entire list as "item".
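As for the "same memory address" part of the question: each pass through the loop binds 'data' to a brand-new list, which a quick check confirms (a Python 3 sketch with placeholder items, not the original code):

```python
import queue

q = queue.Queue()
ids = []
for counter, item in enumerate(["a", "b", "c"]):
    data = [item, counter]   # a fresh list object on every iteration
    ids.append(id(data))
    q.put(data)              # the queue keeps each list alive, so the
                             # ids recorded above cannot be recycled

print(len(set(ids)))         # 3 -- three distinct objects
```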
3) the first item in the modelList is a counter that keeps track of
the number of threads for this call that have completed - is there any
better way of doing this?
Where? None of your posted code shows either "counter" or modelList
being used by the threads.
And yes, if you have threads trying to update a shared mutable, you
have a race condition.
You also have a problem if you are using "counter" to define where
in modelList a thread is supposed to store its results -- as you cannot
assign to an element that doesn't already exist:

    a = [0]
    a[3] = 1    # IndexError -- elements 1, 2, 3 must be created first
Now, if position is irrelevant, and a thread just appends its
results to modelList, then you don't need some counter, all you need is
to check the length of modelList against the count expected.
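Both alternatives can be sketched in a few lines (illustrative values only):

```python
# Option 1: pre-size the list so every position already has a slot.
n = 4
a = [None] * n
a[3] = 1                      # fine now -- no IndexError
print(a)                      # [None, None, None, 1]

# Option 2: skip the counter entirely -- append each result and
# compare the list length against the expected count.
expected = 3
b = []
for value in (10, 20, 30):    # stand-ins for thread results
    b.append(value)
print(len(b) == expected)     # True
```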
Overall -- even though you are passing things via the queue, the
contents being passed via the queue are being treated as if they were
global entities (you could make modelList a global, remove it from the
queue entries, and have the same net access)...
IOWs, you have too much coupling between the threads and the feed
routine...
As for me... I'd be using a second queue for return values....
WORKERTHREADS = 100

feed = Queue.Queue()
results = Queue.Queue()

def worker():
    while True:
        (ID, item) = feed.get()     # I leave the queues globals
                                    # since they perform locking
                                    # internally
        model = domain.get_item(item)
        results.put( (ID, model["title"], model["scraped"]) )

for i in range(WORKERTHREADS):
    aThread = threading.Thread(target=worker)
    # overkill to subclass, as there is now no specialized init;
    # and if I really wanted to make the queues non-global
    # I'd pass them as arguments:
    #     threading.Thread(target=worker, args=(feed, results))
    # where worker is now
    #     def worker(feed, results):
    aThread.setDaemon(True)
    aThread.start()
...
...
def prepSearch(searches):
    modelList = []
    counter = 0
    for searchItem in searches:
        feed.put( (counter, searchItem) )
        counter += 1
        modelList.append(None)    # extend list one element per search
    while counter:
        (ID, title, scraped) = results.get()
        modelList[ID] = (title, scraped)
        counter -= 1
    return modelList
The only place counter and modelList are modified is within
prepSearch. I'm passing counter out and back as an ID value so the
final results can be kept in order -- that way, if one thread finishes
before another, each item can still be placed into the list at the
position it would have occupied sequentially.
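For reference, the whole two-queue pattern above runs as-is once translated to Python 3 (Queue is now the queue module, and "domain.get_item" is faked with a dict as suggested earlier -- the keys and values are invented):

```python
import queue
import threading

FAKE_DB = {"a": {"title": "A", "scraped": 1},
           "b": {"title": "B", "scraped": 2},
           "c": {"title": "C", "scraped": 3}}

WORKERTHREADS = 4
feed = queue.Queue()
results = queue.Queue()

def worker():
    while True:
        ID, item = feed.get()
        model = FAKE_DB[item]        # stands in for domain.get_item(item)
        results.put((ID, model["title"], model["scraped"]))

for _ in range(WORKERTHREADS):
    threading.Thread(target=worker, daemon=True).start()

def prepSearch(searches):
    modelList = []
    for counter, searchItem in enumerate(searches):
        feed.put((counter, searchItem))
        modelList.append(None)       # one slot per search
    for _ in searches:
        ID, title, scraped = results.get()
        modelList[ID] = (title, scraped)   # ID restores original order
    return modelList

print(prepSearch(["a", "b", "c"]))
# [('A', 1), ('B', 2), ('C', 3)]
```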
I can only hope that "domain.get_item" is an activity that is I/O
bound AND that it supports parallel accesses... Otherwise the above
worker threads seem to be adding a lot of overhead for queue I/O and
threading swaps for what is otherwise a rather linear process.
Perhaps your posts don't reveal enough... Maybe you have multiple
main threads that are posting to the worker feed queue (and you were
using separate mutables for storing the results). In this situation, I'd
remove the results queue from being a global entity, create one queue
per main processing thread, and pass the queue as one of the parameters.
This way, a worker can return data to any source thread by using the
supplied queue for the return...
Modify prepSearch with:

    myQueue = Queue.Queue()
    ...
    feed.put( (counter, searchItem, myQueue) )
    ...
    (ID, title, scraped) = myQueue.get()

Modify worker with:

    (ID, item, retqueue) = feed.get()
    ...
    retqueue.put( (ID, model["title"], model["scraped"]) )
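Stitched together, that per-caller return queue idea is also runnable as a Python 3 sketch (again with a made-up dict standing in for get_item):

```python
import queue
import threading

FAKE_DB = {"x": {"title": "X", "scraped": 9}}
feed = queue.Queue()

def worker():
    while True:
        ID, item, retqueue = feed.get()   # each request carries its own
        model = FAKE_DB[item]             # queue for the reply
        retqueue.put((ID, model["title"], model["scraped"]))

threading.Thread(target=worker, daemon=True).start()

def prepSearch(searches):
    myQueue = queue.Queue()               # private to this caller, so many
                                          # main threads can share the workers
    for counter, searchItem in enumerate(searches):
        feed.put((counter, searchItem, myQueue))
    return [myQueue.get() for _ in searches]

print(prepSearch(["x"]))
# [(0, 'X', 9)]
```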
--
Wulfraed Dennis Lee Bieber KD6MOG
HTTP://wlfraed.home.netcom.com/
(Bestiaria Support Staff: (e-mail address removed))
HTTP://www.bestiaria.com/