Jacek Trzmiel
Hi,
I have a problem using urllib2 with the threading module under Cygwin.
$ cygcheck -cd cygwin python
Cygwin Package Information
Package              Version
cygwin               1.5.5-1
python               2.3.2-1
Here is a minimal app that reproduces the errors:
--- MtUrllib2Test.py -------------------------------------------------
#!/usr/bin/env python
import urllib2
import threading
import sys
import time

def FetchPage():
    # time.sleep(3)
    # return
    opener = urllib2.build_opener()
    urlFile = opener.open( 'http://google.com/' )
    pageData = urlFile.read()

def IncCounterAndPrint( count=[0] ):
    count[0] += 1
    print count[0]
    # sys.stdout.flush()

def Main():
    noOfThreads = 1
    for unused in range(noOfThreads):
        thread = threading.Thread( target=FetchPage, args=() )
        thread.start()
        # time.sleep(0.2)
    IncCounterAndPrint()
    while(threading.activeCount()>1):
        IncCounterAndPrint()
        time.sleep(0.5)
    IncCounterAndPrint()

if __name__ == "__main__":
    Main()
--- MtUrllib2Test.py -------------------------------------------------
0. Simple case. Here everything looks ok:
$ python MtUrllib2Test.py
1
2
3
4
5
1. First error.
$ python MtUrllib2Test.py | tee out.txt
3
4
5
The leading prints have been eaten somewhere. Uncommenting the disabled
code in ANY of the functions makes the output correct, but none of the
solutions looks good to me:
a) IncCounterAndPrint() - sys.stdout.flush()
As I understand it, if stdout is not a console then output gets buffered
(i.e. it's not flushed automatically). Adding a flush call makes the
output correct, but this looks like a kludge to me, not a real fix. I am
writing to stdout from only one thread, so everything should be fine
without calling flush, shouldn't it?
b) Main() - time.sleep(0.2)
Adding a short sleep after starting the thread makes the output correct,
too. To me it looks like a race condition in either urllib2 or Cygwin.
Or am I completely off here?
c) FetchPage() - time.sleep(3), return
Disabling the calls to urllib2 makes the problem go away, too.
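For reference, variant (a) amounts to roughly the following. This is only a sketch: the `out` parameter and the lower-case function name are not in the script above, I added them so the function can be pointed at something other than the real stdout, and the return value is there only for convenience.

```python
import sys

def inc_counter_and_print(count=[0], out=sys.stdout):
    # The mutable default argument keeps the counter between calls,
    # the same trick IncCounterAndPrint() uses in the script above.
    count[0] += 1
    out.write("%d\n" % count[0])
    # Explicit flush: push the line out even when stdout is a pipe
    # and therefore not flushed automatically after every line.
    out.flush()
    return count[0]
```

Called with no arguments it writes to stdout and flushes after every line, which is the behaviour I get by uncommenting sys.stdout.flush().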
2. Second error.
If I increase the number of threads:
noOfThreads = 20
and run this program (you may need to run it several times, or raise the
number of threads further, to reproduce), then it sometimes fails this way:
$ python MtUrllib2Test.py | tee out.txt
4 [win] python 1744 Winmain: Cannot register window class
C:\cygwin\bin\python2.3.exe: *** WFSO failed, Win32 error 6
or hangs this way:
$ python MtUrllib2Test.py | tee out.txt
243 [win] python 1696 Winmain: Cannot register window class
520 [win] python 1696 Winmain: Cannot register window class
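One way to check whether this second error depends on urllib2 at all would be to run the same start-and-poll loop with a stand-in worker that only sleeps. The sketch below is written in Python 3 syntax, not the 2.3 I am running, and `run_test`, `sleep_worker` and `poll_interval` are made-up names. It also tracks its own thread list instead of calling threading.activeCount(), which is a deliberate change from the script above.

```python
import threading
import time

def sleep_worker():
    # Stand-in for FetchPage() with the urllib2 calls disabled.
    time.sleep(0.05)

def run_test(worker, no_of_threads=20, poll_interval=0.5):
    """Start the workers, then poll until all of them have finished.

    Returns the number of polls that saw at least one live thread.
    """
    threads = [threading.Thread(target=worker) for _ in range(no_of_threads)]
    for thread in threads:
        thread.start()
    polls = 0
    while any(thread.is_alive() for thread in threads):
        polls += 1
        time.sleep(poll_interval)
    return polls

if __name__ == "__main__":
    print(run_test(sleep_worker))
```

If the "Cannot register window class" messages still show up with sleep_worker, urllib2 would be ruled out as the cause.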
Can anyone help me with these two?
Best regards,
Jacek.