Christophe Vandeplas
Hello,
I have a question about deque and thread-safety.
My application has multiple threads running concurrently, all doing the
same thing (downloading pages).
To keep track of what has already been downloaded, I created this variable:
    from collections import deque
    seen = deque(maxlen=1000)  # keeps at most 1000 URLs in memory
In one place in the code I do:
    seen.append(url)
And in another place:
    def seenPage(url):
        if url in seen:
            return True
        return False
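A small sketch of how the bounded deque behaves. It uses a hypothetical `maxlen` of 3 instead of 1000 so the eviction is visible: once the deque is full, each `append()` silently drops the oldest entry from the left end, which is why the collection never needs manual cleanup.

```python
from collections import deque

# Bounded deque: maxlen=3 stands in for the 1000 used in the post.
seen = deque(maxlen=3)

for url in ["a", "b", "c", "d"]:
    seen.append(url)  # when full, the oldest item is dropped from the left

print(list(seen))   # ['b', 'c', 'd']
print("a" in seen)  # False: "a" was evicted
print("d" in seen)  # True
```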
From the documentation I understand that deques are thread-safe:
Deques are a generalization of stacks and queues (the name is pronounced “deck”
and is short for “double-ended queue”). Deques support thread-safe, memory
efficient appends and pops from either side of the deque with approximately the
same O(1) performance in either direction.
It seems that appending to deques is indeed thread-safe, but iterating
over them, which is what "url in seen" does, is not.
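The single-threaded sketch below (not the poster's code) shows the check behind this: a deque iterator raises `RuntimeError` if the deque is modified while the iteration is in progress. Here the loop body itself does the mutation; in the threaded case, an `append()` from another thread landing mid-iteration could trigger the same error.

```python
from collections import deque

d = deque([1, 2, 3], maxlen=5)

error = None
try:
    for item in d:
        # Mutating the deque while iterating over it; the iterator notices
        # on its next step and raises.
        d.append(item)
except RuntimeError as exc:
    error = str(exc)

print(error)  # deque mutated during iteration
```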
I've seen a similar, but different, situation here:
http://comments.gmane.org/gmane.comp.python.devel/85487
Setting that URL aside, in my situation this behavior screws up the
concept I need:
- Keeping a thread-safe collection of seen URLs,
- Being able to check whether something is in that collection,
- Not having to clean the collection to keep memory from filling up.
I know I could work around this problem by using a lock.
But then I would need the lock not only around the iteration but
also around the append(), which defeats the purpose of deque being
thread-safe.
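A minimal sketch of the lock-based workaround, assuming one shared `threading.Lock` guards both the append and the membership test (`collections.deque` exposes no public lock of its own). The class `SeenUrls`, its `add` method and the example URLs are hypothetical. Because every access goes through the same lock, a membership scan can no longer overlap an append from another thread.

```python
import threading
from collections import deque


class SeenUrls:
    """Hypothetical wrapper: a bounded deque whose every access takes one lock."""

    def __init__(self, maxlen=1000):
        self._items = deque(maxlen=maxlen)
        self._lock = threading.Lock()

    def add(self, url):
        with self._lock:
            self._items.append(url)

    def __contains__(self, url):
        # The linear scan done by `in` runs while holding the lock,
        # so no append can happen in the middle of it.
        with self._lock:
            return url in self._items


seen = SeenUrls(maxlen=1000)


def seenPage(url):
    return url in seen


# Usage: four threads adding 500 hypothetical URLs each.
def worker(start):
    for i in range(start, start + 500):
        seen.add("http://example.com/%d" % i)


threads = [threading.Thread(target=worker, args=(n * 500,)) for n in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The cost is that every lookup now serializes with every append, and the membership test is still a linear scan over up to 1000 entries.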
In short, what's your advice?
1/ Put a lock around the .append() and the iteration, using the
lock that already exists inside the deque. But HOW?
2/ Simply put my own lock around the .append() and the iteration,
bypassing the built-in thread-safety.
3/ Use another collection that does what I need.
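One possible "other collection", sketched under the assumption that a `set` plus a `deque` guarded by a `threading.Lock` is acceptable: the set gives constant-time membership checks instead of the deque's linear scan, and the deque records insertion order so the oldest URL can be evicted to keep memory bounded. `BoundedSeenSet` and `add_if_new` are hypothetical names, not a standard-library API.

```python
import threading
from collections import deque


class BoundedSeenSet:
    """Hypothetical bounded 'seen' collection with O(1) membership tests.

    A set answers `in` queries; a deque remembers insertion order so the
    oldest URL can be evicted once `maxlen` is reached.
    """

    def __init__(self, maxlen=1000):
        self._maxlen = maxlen
        self._order = deque()
        self._members = set()
        self._lock = threading.Lock()

    def add_if_new(self, url):
        """Record url; return True if it was not already present."""
        with self._lock:
            if url in self._members:
                return False
            if len(self._order) >= self._maxlen:
                oldest = self._order.popleft()  # evict the oldest URL
                self._members.discard(oldest)
            self._order.append(url)
            self._members.add(url)
            return True

    def __contains__(self, url):
        with self._lock:
            return url in self._members


seen = BoundedSeenSet(maxlen=2)
print(seen.add_if_new("a"))       # True
print(seen.add_if_new("a"))       # False (already seen)
seen.add_if_new("b")
seen.add_if_new("c")              # evicts "a"
print("a" in seen, "c" in seen)   # False True
```

Combining the check and the append in one locked method also closes the gap where two threads could both find a URL unseen and then both download it.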
Thanks for your expertise.
Christophe