Brian D
I'm actually using mechanize, but that's too complicated for testing
purposes. Instead, the urllib2 sample below simulates an attempt to
test for a valid URL request.
I'm attempting to write a loop that traps failed attempts to request a
URL (for cases where the connection intermittently fails), and retries
the request a few times, giving up after the Nth attempt.
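The shape I have in mind is roughly the bounded-retry sketch below.
(fetch and max_attempts are placeholder names I've made up for this
sketch; they're not in my actual code.)

import urllib2

def fetch(url, headers, max_attempts=5):
    # Sketch only: try the request up to max_attempts times and return
    # the response on the first success, or None if every attempt fails.
    for attempt in range(1, max_attempts + 1):
        try:
            request = urllib2.Request(url, None, headers)
            return urllib2.urlopen(request)
        except urllib2.URLError, e:
            print 'attempt %d failed: %s' % (attempt, e)
    return None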
Specifically, in the example below, a bad URL is requested on the
first and second iterations. On the third iteration, a valid URL is
requested, and it continues to be requested until the 5th iteration,
when a break statement stops the loop. The 5th iteration also restores
the values to their original state so the snippet is easy to re-run.
What I don't understand is how to test for a valid URL request and
then jump out of the "while True" loop to proceed to the code below
the loop. There's probably faulty logic in this approach. I imagine I
should wrap the URL request in a function, and perhaps store the
response as a global variable.
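For instance, is the right pattern something like the sketch below,
where a successful request breaks out and the code after the loop
picks up the stored response? (The url and headers values here are
stand-ins for this sketch.)

import urllib2

url = 'http://www.google.com'            # stand-in URL for this sketch
headers = {'User-Agent': 'Mozilla/5.0'}  # minimal header for this sketch

response = None
count = 0
while True:
    count += 1
    try:
        request = urllib2.Request(url, None, headers)
        response = urllib2.urlopen(request)
        break                    # success: leave the loop immediately
    except urllib2.URLError:
        print 'fail ' + str(count)
        if count == 5:           # give up after the 5th attempt
            break
# execution continues here; response is None if every attempt failed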
This is really more of a basic Python logic question than it is a
urllib2 question.
Any suggestions?
Thanks,
Brian
import urllib2

user_agent = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.16) ' \
             'Gecko/2009120208 Firefox/3.0.16 (.NET CLR 3.5.30729)'
headers = {'User-Agent': user_agent}

url = 'http://this is a bad url'  # deliberately invalid to force failures

count = 0
while True:
    count += 1
    try:
        print 'attempt ' + str(count)
        request = urllib2.Request(url, None, headers)
        response = urllib2.urlopen(request)
        if response:
            print 'True response.'
        if count == 5:
            # Restore the starting values for ease of repeat execution
            count = 0
            url = 'http://this is a bad url'
            print 'How do I get out of this thing?'
            break
    except:  # bare except: traps any failure, including the bad URL above
        print 'fail ' + str(count)
        if count == 3:
            url = 'http://www.google.com'  # switch to a valid URL