Threads and Progress Bar

  • Thread starter Ritesh Raj Sarraf

Ritesh Raj Sarraf

Hi,

I have a small application, written in Python, that uses threads.
The application uses function foo() to download files from the web. As it reads
data from the web server, it displays a progress bar using an instance of a
progress bar class.

When using threads, I get the problem that the progress bar gets overwritten by
the download progress of files from other threads.

I believe my change has to go into the progress bar class to make it thread
aware.

Are there any docs/suggestions on how to implement progress bars along with
threads?

Thanks,
Ritesh
--
Ritesh Raj Sarraf
RESEARCHUT - http://www.researchut.com
"Necessity is the mother of invention."
"Stealing logic from one person is plagiarism, stealing from many is research."
"The great are those who achieve the impossible, the petty are those who
cannot - rrs"
 

Dennis Lee Bieber

Well... first off -- some minimal code would be of use...

Second... It sounds like you only created one progress bar, and each
thread is referencing that single bar. I'd suspect you need to create a
bar for EACH thread you create, and tell the thread which bar to update.
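A minimal sketch of that suggestion, written for Python 3 rather than the Python 2 used in this thread: the main thread creates one bar per download and hands it to the worker, so no bar state is shared between threads. `ProgressBar` and `fake_download` here are hypothetical stand-ins, not the poster's classes, and the "download" is just a loop.

```python
import threading


class ProgressBar:
    """Hypothetical minimal bar: keeps only numbers, renders on demand."""

    def __init__(self, total, width=20):
        self.total = total
        self.width = width
        self.value = 0

    def update(self, value):
        # Clamp to [0, total] so an oversized last block cannot overshoot.
        self.value = max(0, min(self.total, value))

    def __str__(self):
        frac = self.value / self.total if self.total else 1.0
        filled = int(self.width * frac + 0.5)
        return "[%s%s] %5.1f%%" % ("#" * filled, " " * (self.width - filled), frac * 100)


def fake_download(size, block, bar):
    # Simulated transfer; the thread only ever touches the bar it was given.
    done = 0
    while done < size:
        done += block
        bar.update(done)


# One bar PER download, created by the main thread and passed to the worker.
bars = {name: ProgressBar(total=size) for name, size in [("a", 1000), ("b", 2500), ("c", 40)]}
threads = [threading.Thread(target=fake_download, args=(bar.total, 128, bar))
           for bar in bars.values()]
for t in threads:
    t.start()
for t in threads:
    t.join()
for name in sorted(bars):
    print(name, bars[name])
```

Separate bar objects fix the shared-state problem; they do not by themselves stop the threads' output from landing on the same screen line, which is a display problem.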
--
Wulfraed Dennis Lee Bieber KD6MOG
(e-mail address removed) (e-mail address removed)
HTTP://wlfraed.home.netcom.com/
(Bestiaria Support Staff: (e-mail address removed))
HTTP://www.bestiaria.com/
 

Ritesh Raj Sarraf

Dennis said:
Well... first off -- some minimal code would be of use...

I was scared that people might feel that I'm asking for a ready-made
solution. :)

Here's the code.

This is the progress bar code.
progressbar.py
class progressBar:
    def __init__(self, minValue = 0, maxValue = 10, totalWidth=12):
        self.progBar = "[]"   # This holds the progress bar string
        self.min = minValue
        self.max = maxValue
        self.span = maxValue - minValue
        self.width = totalWidth
        self.amount = 0       # When amount == max, we are 100% done
        self.updateAmount(0)  # Build progress bar string

    def updateAmount(self, newAmount = 0):
        if newAmount < self.min: newAmount = self.min
        if newAmount > self.max: newAmount = self.max
        self.amount = newAmount

        # Figure out the new percent done, round to an integer
        diffFromMin = float(self.amount - self.min)
        percentDone = (diffFromMin / float(self.span)) * 100.0
        percentDone = round(percentDone)
        percentDone = int(percentDone)

        # Figure out how many hash bars the percentage should be
        allFull = self.width - 2
        numHashes = (percentDone / 100.0) * allFull
        numHashes = int(round(numHashes))

        # build a progress bar with hashes and spaces
        self.progBar = "[" + '#'*numHashes + ' '*(allFull-numHashes) + "]"

        # figure out where to put the percentage, roughly centered
        percentPlace = (len(self.progBar) / 2) - len(str(percentDone))
        percentString = str(percentDone) + "%"

        # slice the percentage into the bar
        self.progBar = (self.progBar[0:percentPlace] + percentString +
                        self.progBar[percentPlace+len(percentString):]
                        + " " + str(newAmount/1024) + "KB of " + str(self.max/1024) + "KB")

    def __str__(self):
        return str(self.progBar)

def myReportHook(count, blockSize, totalSize):
    import sys
    global prog
    prog = ""

    if prog == "":
        prog = progressBar(0,totalSize,50)
    prog.updateAmount(count*blockSize)
    sys.stdout.write (str(prog))
    sys.stdout.write ("\r")
    #print count * (blockSize/1024) , "kb of " , (totalSize/1024) , "kb downloaded.\n"


Here's the function download_from_web(), which calls the progress bar:
main.py
def download_from_web(sUrl, sFile, sSourceDir, checksum):

    try:
        block_size = 4096
        i = 0
        counter = 0

        os.chdir(sSourceDir)
        temp = urllib2.urlopen(sUrl)
        headers = temp.info()
        size = int(headers['Content-Length'])
        data = open(sFile,'wb')

        log.msg("Downloading %s\n" % (sFile))
        while i < size:
            data.write (temp.read(block_size))
            i += block_size
            counter += 1
            progressbar.myReportHook(counter, block_size, size)
        print "\n"
        data.close()
        temp.close()


Since I later implemented threads, multiple threads call download_from_web()
concurrently, which in turn calls the progress bar, so I get a progress bar
that keeps getting overwritten. :)

Here's the code where multiple threads execute:

try:
    lRawData = open(uri, 'r').readlines()
except IOError, (errno, strerror):
    log.err("%s %s\n" % (errno, strerror))
    errfunc(errno, '')


#INFO: Mac OS is having issues with Python Threading.
# Use the conventional model for Mac OS
if sys.platform == 'darwin':
    log.verbose("Running on Mac OS. Python doesn't have proper support for Threads on Mac OS X.\n")
    log.verbose("Running in the conventional non-threaded way.\n")
    for each_single_item in lRawData:
        (sUrl, sFile, download_size, checksum) = stripper(each_single_item)

        if download_from_web(sUrl, sFile, sSourceDir, None) != True:
            #sys.stderr.write("%s not downloaded from %s\n" % (sFile, sUrl))
            #sys.stderr.write("%s failed\n\n" % (sFile))
            variables.errlist.append(sFile)
            pass
        else:
            if zip_bool:
                compress_the_file(zip_type_file, sFile, sSourceDir)
                os.remove(sFile) # Remove it because we don't need the file once it is zipped.
else:
    #INFO: Thread Support
    if variables.options.num_of_threads > 1:
        log.msg("WARNING: Threads is still in alpha stage. It's better to use just a single thread at the moment.\n")
        log.warn("Threads is still in alpha stage. It's better to use just a single thread at the moment.\n")

    NUMTHREADS = variables.options.num_of_threads
    name = threading.currentThread().getName()
    ziplock = threading.Lock()

    def run(request, response, func=download_from_web):
        '''Get items from the request Queue, process them
        with func(), put the results along with the
        Thread's name into the response Queue.

        Stop running once an item is None.'''

        while 1:
            item = request.get()
            if item is None:
                break
            (sUrl, sFile, download_size, checksum) = stripper(item)
            response.put((name, sUrl, sFile, func(sUrl, sFile, sSourceDir, None)))

            # This will take care of making sure that if downloaded, they are zipped
            (thread_name, Url, File, exit_status) = responseQueue.get()
            if exit_status == True:
                if zip_bool:
                    ziplock.acquire()
                    try:
                        compress_the_file(zip_type_file, File, sSourceDir)
                        os.remove(File) # Remove it because we don't need the file once it is zipped.
                    finally:
                        ziplock.release()
            else:
                variables.errlist.append(File)
                pass

    # Create two Queues for the requests and responses
    requestQueue = Queue.Queue()
    responseQueue = Queue.Queue()

    # Pool of NUMTHREADS Threads that run run().
    thread_pool = [
        threading.Thread(
            target=run,
            args=(requestQueue, responseQueue)
        )
        for i in range(NUMTHREADS)
    ]

    # Start the threads.
    for t in thread_pool: t.start()

    # Queue up the requests.
    for item in lRawData: requestQueue.put(item)

    # Shut down the threads after all requests end.
    # (Put one None "sentinel" for each thread.)
    for t in thread_pool: requestQueue.put(None)

    # Don't end the program prematurely.
    #
    # (Note that because Queue.get() is blocking by
    # default this isn't strictly necessary. But if
    # you were, say, handling responses in another
    # thread, you'd want something like this in your
    # main thread.)
    for t in thread_pool: t.join()

Second... It sounds like you only created one progress bar, and each
thread is referencing that single bar. I'd suspect you need to create a
bar for EACH thread you create, and tell the thread which bar to update.

Yes, you're correct. That's what I suspect too. I tried some minor changes
but couldn't get it to work.
If you reply with code, please give a little explanation so that I can
understand and learn from it.

Thanks,
Ritesh

Dennis Lee Bieber

class progressBar:
    def __init__(self, minValue = 0, maxValue = 10, totalWidth=12):
        self.progBar = "[]" # This holds the progress bar string

Why "hold a string" -- since strings are immutable in Python, every
time you modify it you have to create a whole new string.
    def __str__(self):
        return str(self.progBar)

You don't need str() for a string data type. Better yet, why not compute
the string here, so you only build a local copy /when/ it is needed?
def myReportHook(count, blockSize, totalSize):

Some generic function... based on the indentation, it is not a
method of the progress bar class... apparently being called by EACH
thread.
    import sys
    global prog
    prog = ""

    if prog == "":
        prog = progressBar(0,totalSize,50)

What have we here?

First you are declaring that "prog" is a global, which also makes it
shared... You initialize it to an empty string, and then you test for it
to be empty... IF it is empty (well, it will always be empty since you
initialized it) you create a progress bar INSTANCE (which is not a
string, BTW).
    prog.updateAmount(count*blockSize)

You then update it based upon some arguments...
    sys.stdout.write (str(prog))
    sys.stdout.write ("\r")

Then you write out its string form, using two I/O calls (which means
a thread swap could take place between them).
Here's the function, download_from_web() which calls the progress bar:
            progressbar.myReportHook(counter, block_size, size)

This should fail, since 1) the indentation means myReportHook() is
NOT a method of progressbar, and 2) you are invoking it as a class
method, not an instance method. The only progress bar instance I've seen
in the entire listing is the one you create /inside/ of myReportHook(),
and you keep recreating that one from scratch each time you call it.
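One way around the shared `global prog` is to give each download its own hook. A sketch, assuming Python 3: `make_report_hook` is a hypothetical factory whose returned function has the `(count, block_size, total_size)` signature that `urllib.request.urlretrieve` passes to its `reporthook` argument (where `total_size` can be -1 if the size is unknown). Each hook closes over its own label and counter instead of a module global, and writes its whole line with a single `write()` call.

```python
import io
import sys


def make_report_hook(label, stream=sys.stdout, width=30):
    """Return a report hook bound to ONE download (hypothetical helper)."""

    def hook(count, block_size, total_size):
        done = count * block_size
        if total_size > 0:
            done = min(done, total_size)      # the last block is usually short
            frac = done / total_size
            filled = int(width * frac + 0.5)
            status = "[%s%s] %5.1f%%" % ("#" * filled, " " * (width - filled), frac * 100)
        else:
            status = "%d bytes" % done        # size unknown
        hook.done = done                      # per-hook state, not a global
        # Build the whole line first, then emit it in one call.
        stream.write("%s: %s\r" % (label, status))
        stream.flush()

    hook.done = 0
    return hook


buf = io.StringIO()
hook_a = make_report_hook("a.iso", stream=buf)
hook_b = make_report_hook("b.iso", stream=buf)
hook_c = make_report_hook("c.bin", stream=buf)
hook_a(1, 8192, 20000)
hook_b(3, 8192, 20000)
hook_c(2, 1000, -1)
```

In a threaded program, each thread would build its own hook, e.g. `urllib.request.urlretrieve(url, filename, reporthook=make_report_hook(filename))`, so the progress state is never shared.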
Yes, you're correct. That's what I suspect too. I tried some minor changes
but couldn't get it to work.
If you reply with code, please give a little explanation so that I can
understand and learn from it.

Don't know how much you can get from this -- I stripped out all the
network stuff and fake it with randomly generated sleeps, packet sizes,
and file sizes (no real files, just sizes to play with). Watch out for
line wraps...

-=-=-=-=-=-=-=-
import random
import threading
import time
import Queue

class ProgressBar(object):
    def __init__(self, minValue = 0, maxValue = 10, width = 10):
        #width does NOT include the two places for [] markers
        self.min = minValue
        self.max = maxValue
        self.span = float(self.max - self.min)
        self.width = width
        self.value = self.min
    def updateValue(self, newValue):
        #require caller to supply a value!
        self.value = max(self.min, min(self.max, newValue))
    def __str__(self):
        #compute display fraction
        percentFilled = ((self.value - self.min)
                         / self.span)
        widthFilled = int(self.width * percentFilled + 0.5)
        return ("[" + "#"*widthFilled + " "*(self.width - widthFilled) + "]"
                + " %5.1f%% of %6s" % (percentFilled * 100.0, self.max))


def downloadFromWeb(URL):
    #this is a dummy routine which merely uses random number
    #generation to determine "size" and rate of reception of
    #a file transfer
    size = random.randint(2**10, 2**15)
    #"file size" is between 1,024 bytes and 32,768 bytes
    packet = random.randint(10, 10 + (size / 64))
    #"packet size is anywhere from 10 bytes to ...
    latency = random.randint(1, 30) / 10.0
    #"network latency" is between 0.1 and 3.0 seconds per packet
    #create a progress bar instance
    myProgress = ProgressBar(maxValue = size, width = 30)
    print "\nDownloading %s with size, packet, latency %6s, %5s, %5.2f " % (URL, size, packet, latency)
    print "%20s: %s \r" % (URL, str(myProgress)),
    for i in xrange(0, size, packet):
        time.sleep(latency)
        myProgress.updateValue(i)
        print "%20s: %s \r" % (URL, str(myProgress)),
    myProgress.updateValue(size)
    print "%20s: %s \r" % (URL, str(myProgress)),
    print "\n%20s: COMPLETED" % URL
    completionQueue.put(URL)


#create queue for completion notification
completionQueue = Queue.Queue()

if __name__ == "__main__":
    #create a random number (between 5 and 20) of download "URL"s
    URLs = [ "File Number = %3s" % i
             for i in xrange(random.randint(5, 20)) ]
    for U in URLs:
        #create "download" thread for each URL
        threading.Thread(target=downloadFromWeb,
                         args=(U,)).start()

    while URLs:
        #loop while any of the "URL"s are still being processed
        #note use of blocking get() call; only proceed when
        #data is available
        CU = completionQueue.get()
        URLs.remove(CU)

If you try running this, be advised: you cannot break execution --
it runs until complete, and depending on how the random numbers fall,
that could take some time. (If you do manage to kill a thread, the main
routine will never exit, as it never receives the "completed" message for
that thread.)

Part one: the ProgressBar class... Just the __init__(),
updateValue(), and __str__() methods.

Part two: the fake downloadFromWeb() function. This takes one
argument (in my test version) -- called a URL, but it is just a /unique/
string (basically, the thread number with a text addition). This gets
reported later with the progress bar AND is the text of the "completed"
message that gets queued for the main thread. Generate random numbers
for size, packet, latency. CREATE A PROGRESS BAR INSTANCE with the
specified size and some display width (not counting the []). Print out
(on a new line) the information of this function invocation -- the URL,
and size, packet, latency. Then print out the URL and progress bar
(using a trailing , to prevent a new line). Go into a loop from 0 to
size, stepping by the packet length. In the loop, sleep for latency (to
emulate network load delays in receiving a packet), update the progress
bar value to the current loop value, display the URL/progress bar (again
with , to prevent line feed). At end of loop, update progress bar to
size (100%) and display it, then display a completed message on a new
line. Finally, queue the URL so the main program can tell that this
thread has completed (function exits, so thread exits).

Part three: create the completion queue (I don't have to declare it
global in part two because I'm not going to rebind it, only use methods
that it has).

Part four: main program. Create a list of between 5 and 20 "URL"s,
then for each URL create and start a thread (note that I'm NOT keeping a
list of the thread objects). After all the threads have been started, loop
while the list of URLs is NOT empty: get a value from the completion
queue (this blocks until some data is put into the queue). Then, since
the only data that should be placed in the queue is a URL that came from
the URL list, remove the returned URL. When all threads have completed,
the URL list is empty, and the main program exits.
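The hang described in the caveat above (a thread that dies never sends its "completed" message) can be softened. A sketch in Python 3, where `worker` and `run_all` are hypothetical names: the worker reports back in a `finally:` clause so an error cannot swallow its message, the threads are daemons so stragglers do not keep the process alive, and the main loop polls with a timeout. How interruptible a plain blocking `get()` is depends on the platform and Python version, so the timeout is a cheap hedge rather than a guarantee.

```python
import queue
import threading
import time


def worker(name, done_q, fail=False):
    """Hypothetical stand-in for downloadFromWeb()."""
    status = "failed"              # pessimistic default
    try:
        time.sleep(0.01)           # pretend to transfer data
        if fail:
            raise IOError("simulated network failure")
        status = "ok"
    except IOError:
        pass                       # a real program would log the error here
    finally:
        # Always report back, so the main loop is never left waiting.
        done_q.put((name, status))


def run_all(names, failing=(), poll=0.5):
    done_q = queue.Queue()
    pending = set(names)
    results = {}
    for n in names:
        # daemon=True: leftover threads will not block interpreter exit.
        threading.Thread(target=worker, args=(n, done_q, n in failing),
                         daemon=True).start()
    while pending:
        try:
            name, status = done_q.get(timeout=poll)
        except queue.Empty:
            continue               # periodic wake-up gives Ctrl-C a chance
        results[name] = status
        pending.discard(name)
    return results


print(run_all(["file1", "file2", "file3"], failing={"file2"}))
```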

Note: while each thread has its own progress bar, the fact that they
display the updates without a new line (a trailing \r and a trailing
comma on the print) means that every thread /displays/ on the same screen
line (except when the initialization puts out new lines, and when the
completions put out new lines). If you want each thread to put its
progress bar on a different line of the screen, you will have to find
some sort of cursor control code (I don't recall if curses is available
for Windows) and have each thread embed the cursor position into the
string that gets written to the screen. (You do NOT want to split that
I/O into separate calls, as a thread swap could take place and relocate
the cursor...)
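One form that cursor control could take, assuming a terminal that understands ANSI/VT100 escape sequences (most Unix terminals do; Windows consoles may need virtual-terminal processing enabled): redraw the whole block of bar lines as ONE string. `ESC[nA` moves the cursor up n lines and `ESC[K` erases to the end of the line. `render_frame` is a hypothetical helper written for Python 3; only one thread (or one lock holder) should write the frames it produces.

```python
import sys


def render_frame(lines, previous_height=0):
    """Return one string that redraws `lines` in place (hypothetical helper)."""
    parts = []
    if previous_height:
        parts.append("\x1b[%dA" % previous_height)   # cursor up to the block's top
    for text in lines:
        parts.append("\r" + text + "\x1b[K\n")        # rewrite line, clear leftovers
    return "".join(parts)


first = render_frame(["a.iso [##    ]", "b.iso [#     ]"])
second = render_frame(["a.iso [####  ]", "b.iso [###   ]"], previous_height=2)
sys.stdout.write(first)
sys.stdout.write(second)   # on an ANSI terminal this overwrites the first frame
sys.stdout.flush()
```

If a new frame has fewer lines than the previous one, the leftover old lines stay on screen, so pad the list with empty strings up to the previous height.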

Dennis Lee Bieber

A variation -- windows specific -- that tries to display each
thread's progress bar on a separate line.

Since I don't know of a working cursor positioning system, I have
added a screen thread and queue. The screen thread keeps a dictionary of
the URLs and progress bars received via the queue. On each receive, it
updates the dictionary, then uses os.system("cls") to clear the screen,
and prints each url/progress bar (in URL sorted order) from the
dictionary.

Note that the dictionary never shrinks as I have it coded (it shouldn't
be difficult to add logic to remove a dictionary entry based upon
sending, say, the URL and None for the progress bar string).

This one isn't too perplexing to watch.

-=-=-=-=-=-=-
import random
import threading
import time
import Queue
import os

class ProgressBar(object):
    def __init__(self, minValue = 0, maxValue = 10, width = 10):
        #width does NOT include the two places for [] markers
        self.min = minValue
        self.max = maxValue
        self.span = float(self.max - self.min)
        self.width = width
        self.value = self.min
    def updateValue(self, newValue):
        #require caller to supply a value!
        self.value = max(self.min, min(self.max, newValue))
    def __str__(self):
        #compute display fraction
        percentFilled = ((self.value - self.min)
                         / self.span)
        widthFilled = int(self.width * percentFilled + 0.5)
        return ("[" + "#"*widthFilled + " "*(self.width - widthFilled) + "]"
                + " %5.1f%% of %6s" % (percentFilled * 100.0, self.max))


def downloadFromWeb(URL):
    #this is a dummy routine which merely uses random number
    #generation to determine "size" and rate of reception of
    #a file transfer
    size = random.randint(2**10, 2**15)
    #"file size" is between 1,024 bytes and 32,768 bytes
    packet = random.randint(10, 10 + (size / 64))
    #"packet size is anywhere from 10 bytes to ...
    latency = random.randint(1, 30) / 10.0
    #"network latency" is between 0.1 and 3.0 seconds per packet
    myProgress = ProgressBar(maxValue = size, width = 30)
    ## print "\nDownloading %s with size, packet, latency %6s, %5s, %5.2f " % (URL, size, packet, latency)
    ## print "%20s: %s \r" % (URL, str(myProgress)),
    screenQueue.put((URL, str(myProgress)))
    for i in xrange(0, size, packet):
        time.sleep(latency)
        myProgress.updateValue(i)
        ## print "%20s: %s \r" % (URL, str(myProgress)),
        screenQueue.put((URL, str(myProgress)))
    myProgress.updateValue(size)
    ## print "%20s: %s \r" % (URL, str(myProgress)),
    screenQueue.put((URL, str(myProgress) + " COMPLETED"))
    ## print "\n%20s: COMPLETED" % URL
    completionQueue.put(URL)


def screenUpdate():
    lines = {}
    while True:
        url, progress = screenQueue.get()
        if not url: break
        lines[url] = progress
        os.system("cls")
        for u in sorted(lines.keys()):
            print "%20s: %s" % (u, lines[u])
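A Python 3 sketch of the screen-thread idea, including the entry removal mentioned above (sending a URL with `None` as the progress string drops that line). `screen_updater` is a hypothetical helper: instead of `os.system("cls")` and `print`, it appends each redrawn frame to a list so the logic can be run anywhere, and a `(None, None)` tuple serves as the shutdown sentinel. Because only this thread produces output, the download threads never write to the screen themselves.

```python
import queue
import threading


def screen_updater(screen_q, frames):
    """The only code that 'touches the screen' (hypothetical helper).

    Messages are (url, progress_text) tuples:
      progress_text is None  -> forget that url (download finished)
      url is None            -> shut down
    A real console version would clear the screen and print `frame`
    instead of appending it to `frames`."""
    lines = {}
    while True:
        url, progress = screen_q.get()
        if url is None:
            break
        if progress is None:
            lines.pop(url, None)
        else:
            lines[url] = progress
        # Redraw every tracked entry, sorted by URL.
        frame = "\n".join("%20s: %s" % (u, lines[u]) for u in sorted(lines))
        frames.append(frame)


screen_q = queue.Queue()
frames = []
screen = threading.Thread(target=screen_updater, args=(screen_q, frames))
screen.start()
screen_q.put(("b", "[##  ]"))     # download threads would send these
screen_q.put(("a", "[#   ]"))
screen_q.put(("b", None))         # "b" finished: remove its line
screen_q.put((None, None))        # sentinel: stop the screen thread
screen.join()
```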

Bryan Olson

Ritesh Raj Sarraf wrote:
[...]
Here's the function, download_from_web() which calls the progress bar:
main.py
def download_from_web(sUrl, sFile, sSourceDir, checksum): [...]
        temp = urllib2.urlopen(sUrl)
        headers = temp.info()
        size = int(headers['Content-Length'])
        data = open(sFile,'wb')

Incidentally, not all HTTP responses with bodies have
a 'Content-Length' header.
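A sketch of a download loop that copes with a missing `Content-Length`, assuming Python 3. With `urllib.request.urlopen()`, the header is available as `resp.headers.get("Content-Length")`, which returns `None` when it is absent. `content_length` and `copy_with_progress` are hypothetical helpers: the loop stops at EOF rather than at a precomputed size, and counts the bytes actually read instead of adding `block_size` on every pass.

```python
import io


def content_length(headers):
    """Return the declared body size, or None if the header is missing."""
    value = headers.get("Content-Length")
    return int(value) if value is not None else None


def copy_with_progress(src, dst, total=None, block_size=4096, report=None):
    """Copy src to dst in blocks; call report(done, total) after each block.

    total may be None, in which case the caller should show a byte
    count instead of a percentage."""
    done = 0
    while True:
        chunk = src.read(block_size)
        if not chunk:              # EOF ends the loop, whatever the headers said
            break
        dst.write(chunk)
        done += len(chunk)         # the final read is usually shorter than block_size
        if report is not None:
            report(done, total)
    return done


seen = []
n = copy_with_progress(io.BytesIO(b"x" * 10000), io.BytesIO(),
                       total=content_length({}),
                       report=lambda d, t: seen.append((d, t)))
```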
 

Dennis Lee Bieber

Another variation. This one uses ONE progress bar to track the total
download percentage (which means each time a new download is added, the
% goes down a bit). I put a random delay in starting threads so this can
be seen.

A bit more coupling than I like between the progress bar and the
"download" threads since the download has to keep a running total of its
download -- so the proper value can be submitted for short end packets.
Instead of sending the "current value" to the bar, one has to send the
delta value (that is, how much the bar value should change from the
previous reading).

The x/y on the left tells how many have completed out of how many
are being downloaded.

-=-=-=-=-=-=-
import random
import threading
import time
import Queue

class ProgressBar(object):
    def __init__(self, minValue = 0, maxValue = 0, width = 10):
        #width does NOT include the two places for [] markers
        self.min = minValue
        self.max = maxValue
        self.span = float(self.max - self.min)
        self.width = width
        self.value = self.min
        self.items = 0 #count of items being tracked
        self.complete = 0
    def updateValue(self, newValue):
        #require caller to supply a value! newValue is the increment from last call
        self.value = max(self.min, min(self.max, self.value + newValue))
        self.display()
    def completed(self):
        self.complete = self.complete + 1
        self.display()
    def addItem(self, maxValue):
        self.max = self.max + maxValue
        self.span = float(self.max - self.min)
        self.items = self.items + 1
        self.display()
    def display(self):
        print "%3s/%3s items: %s\r" % (self.complete, self.items, str(self)),
    def __str__(self):
        #compute display fraction
        percentFilled = ((self.value - self.min)
                         / self.span)
        widthFilled = int(self.width * percentFilled + 0.5)
        return ("[" + "#"*widthFilled + " "*(self.width - widthFilled) + "]"
                + " %5.1f%% of %7s" % (percentFilled * 100.0, self.max))


def downloadFromWeb(URL):
    #this is a dummy routine which merely uses random number
    #generation to determine "size" and rate of reception of
    #a file transfer
    size = random.randint(2**10, 2**15)
    #"file size" is between 1,024 bytes and 32,768 bytes
    packet = random.randint(128, 1024)
    #"packet size" is anywhere from 128 bytes to 1024 bytes
    latency = random.randint(1, 30) / 10.0
    #"network latency" is between 0.1 and 3.0 seconds per packet
    globalProgress.addItem(size)
    done = 0
    for i in xrange(0, size+packet, packet):
        time.sleep(latency)
        increment = min(packet, size - done)
        done = done + increment
        globalProgress.updateValue(increment)
    globalProgress.completed()
    completionQueue.put(URL)

#create queue for completion notification
completionQueue = Queue.Queue()

#create shared progress bar
globalProgress = ProgressBar(width = 30)

if __name__ == "__main__":
    #create a random number (between 5 and 20) of download "URL"s
    URLs = [ "File Number = %3s" % i
             for i in xrange(random.randint(5, 20)) ]
    print "\n"
    for U in URLs:
        #create "download" thread for each URL
        threading.Thread(target=downloadFromWeb,
                         args=(U,)).start()
        #create a delay between adding threads so count can be seen
        time.sleep(random.randint(2, 8))

    while URLs:
        #loop while any of the "URL"s are still being processed
        #note use of blocking get() call; only proceed when
        #data is available
        CU = completionQueue.get()
        URLs.remove(CU)
-=-=-=-=-=-=-
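The delta bookkeeping can be isolated in a small generator, sketched here in Python 3 (`increments` is a hypothetical name). It yields per-packet deltas that add up to exactly `size`, with the short end packet last. For comparison, a loop over `xrange(0, size+packet, packet)` runs one pass more than needed, and that extra pass contributes a delta of 0 (plus one extra sleep).

```python
def increments(size, packet):
    """Yield deltas of at most `packet` bytes that sum to exactly `size`."""
    done = 0
    while done < size:
        step = min(packet, size - done)   # the last step is the short end packet
        done += step
        yield step


# Possible use inside a worker, with the shared bar from the code above:
#     for delta in increments(size, packet):
#         time.sleep(latency)
#         globalProgress.updateValue(delta)
print(list(increments(1000, 300)))        # [300, 300, 300, 100]
```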



Thomas Guettler

You need some kind of lock. Look at the module threading. There
is a class called "Lock".
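A sketch of a "thread aware" shared bar built on that suggestion, assuming Python 3 (`LockedProgressBar` and `pump` are hypothetical names). One `threading.Lock` guards both the read-modify-write of the running total and the write of the rendered line, so concurrent updates can neither lose increments nor interleave their output.

```python
import io
import threading


class LockedProgressBar:
    """Shared bar whose state change and redraw happen under one lock."""

    def __init__(self, total, stream, width=30):
        self.total = total
        self.stream = stream
        self.width = width
        self.value = 0
        self._lock = threading.Lock()

    def add(self, delta):
        with self._lock:   # read-modify-write of shared state is not atomic
            self.value = min(self.total, self.value + delta)
            frac = self.value / self.total if self.total else 1.0
            filled = int(self.width * frac + 0.5)
            # Render and write while still holding the lock.
            self.stream.write("[%s%s] %5.1f%%\r" % ("#" * filled, " " * (self.width - filled), frac * 100))


def pump(bar, times):
    # Hypothetical worker: reports `times` one-unit increments.
    for _ in range(times):
        bar.add(1)


out = io.StringIO()
bar = LockedProgressBar(total=4000, stream=out)
threads = [threading.Thread(target=pump, args=(bar, 1000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

With one bar per thread there is no shared total to protect, but the screen output is still shared, so a lock around the write remains useful.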

PS: If you use pygtk: I switched from using threads to idle_add plus the loop marked (*) below. This is much easier,
and you don't need any locking.

# (*)
while gtk.events_pending():
    gtk.main_iteration()
 
