urllib download insanity


Timothy Smith

ok what i am seeing is impossible.
i DELETED the file from my webserver, uploaded the new one. when my app
logs in it checks the file, if it's changed it downloads it. the
impossible part is that my pc is downloading the OLD file i've
deleted! if i download it via IE, i get the new file. SO, my only
conclusion is that urllib is caching it somewhere. BUT i'm already
calling urlcleanup(), so what else can i do?
here is the code

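import os
import urllib
import urllib2

# despite the name, this holds the remote Content-Length
LastModified = urllib2.urlopen('http://x.x.x.x/library.zip')
LastModified = LastModified.headers['Content-Length']

LocalFile = os.stat('library.zip')
LocalFile = int(LocalFile.st_size)

# only download when the local and remote sizes differ
if LocalFile != int(LastModified):
    urllib.urlcleanup()
    urllib.urlretrieve('http://x.x.x.x/library.zip','library.zip')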

as a test i got someone in the office to login and try it - worked
properly for them. i'm on a different ISP to them however, so my other
idea is that possibly my isp has a transparent proxy setup that urllib
is using, but IE isn't???
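
One way to test the transparent-proxy theory is to ask any cache on the path to revalidate with the origin server. The following is only a sketch, written for Python 3's urllib.request rather than the Python 2 urllib/urllib2 used above. The helper names build_uncached_request and remote_size are made up for illustration, and a misbehaving proxy is free to ignore these headers.

```python
import urllib.request

# Hypothetical constant, not from the thread: request headers that ask
# intermediate caches to revalidate with the origin server.
NO_CACHE_HEADERS = {
    "Cache-Control": "no-cache",  # understood by HTTP/1.1 caches
    "Pragma": "no-cache",         # fallback for HTTP/1.0 caches
}


def build_uncached_request(url):
    """Return a Request for `url` that carries the cache-bypass headers."""
    return urllib.request.Request(url, headers=NO_CACHE_HEADERS)


def remote_size(url):
    """Return the Content-Length reported for `url` as an int, or None."""
    with urllib.request.urlopen(build_uncached_request(url)) as response:
        length = response.headers.get("Content-Length")
    return int(length) if length is not None else None
```

If remote_size() for the library.zip URL reports the new size while the plain request still reports the old one, a cache between the client and the server becomes the likely culprit.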
 

Andrew Dalke

Timothy said:
ok what i am seeing is impossible.
i DELETED the file from my webserver, uploaded the new one. when my app
logs in it checks the file, if it's changed it downloads it. the
impossible part is that my pc is downloading the OLD file i've
deleted! if i download it via IE, i get the new file. SO, my only
conclusion is that urllib is caching it somewhere. BUT i'm already
calling urlcleanup(), so what else can i do?

Here are some ideas to use in your hunt.

- If you are getting a cached local file then the returned object
will have a "name" attribute.

result = urllib.urlopen(".....")
print result.fp.name

As far as I can tell, this will only occur if you use
a tempcache or a file URL.


- You can force some debugging of the open calls, to see if
your program is dealing with a local file.
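import __builtin__
old_open = __builtin__.open
def my_open(*args):
    print "opening", args
    return old_open(*args)
__builtin__.open = my_open

After that, a call like open('/etc/passwd') prints: opening ('/etc/passwd',)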

You may also need to change os.fdopen because that's used
by retrieve if it needs a tempfile.

If you want to see where the open is being called from,
use one of the functions in the traceback module to print
the stack trace.

- for surety's sake, also do

import webbrowser
webbrowser.open(url)

just before you do

urllib.urlretrieve(url, filename)

This will double check that your program is using the URL you
expect it to use.

- beyond that, check that you've got network activity.

You could check the router lights, or use a packet sniffer like
Ethereal, or set up a debugging proxy.

- check the headers. If your ISP is using a cache then
it might insert a header into what it returns. But if
it was caching then your IE view should have seen the cached
version as well.
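
The header check in the last idea could be sketched roughly like this. It is written for Python 3's urllib.request, not the Python 2 urllib in the original code. The list of header names is an assumption: Via and Age are standard HTTP headers and X-Cache is a common convention, but a transparent proxy may add none of them. The function names are made up for illustration.

```python
import urllib.request

# Assumption: an illustrative list of headers that caches and proxies
# commonly add. A transparent proxy is not required to add any of them.
CACHE_HINT_HEADERS = ("Via", "Age", "X-Cache")


def cache_hints(headers):
    """Return {name: value} for headers that suggest a cache handled the reply.

    `headers` is any mapping-like object with .items(); names are matched
    case-insensitively and reported with the spelling used above.
    """
    wanted = {name.lower(): name for name in CACHE_HINT_HEADERS}
    found = {}
    for name, value in headers.items():
        canonical = wanted.get(name.lower())
        if canonical is not None:
            found[canonical] = value
    return found


def fetch_cache_hints(url):
    """Open `url` and report any cache-related response headers."""
    with urllib.request.urlopen(url) as response:
        return cache_hints(response.headers)
```

An empty result from fetch_cache_hints() does not rule out a cache. It only means the response carried none of the listed headers.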

Andrew