Web page retrieval problems

golu

The following function retrieves pages from the web and saves them in
a specified directory. I want to derive the respective filenames from the
URLs, e.g. the page code.google.com should be saved as code-google.htm or
something similar. Can you suggest a way to do it?

import os
import urllib2
import urlparse

def retrieve_url(self, url):
    """The main method of the robot class; it is called by the
    run method to retrieve the given urls from the web."""

    if url is not None:

        try:
            # visited is assumed to be a module-level dict of urls already fetched
            if url in visited: return
            pieces = urlparse.urlparse(url)
            filepath = pieces[2]
            if filepath != '':
                filepath = filepath[1:]
                filename = filepath.split("/")[-1]
            else:
                filename = 'home.htm'

            path = os.path.join(PAGE_DIR, filename)
            url = urlparse.urlunparse(pieces)
            p = url.rfind('#')  # temporary
            if p != -1:
                url = url[:p]

            visited[url] = 1  # mark this url as visited
            m = urllib2.urlopen(url)

            fopen = open(path, 'wb')
            fopen.write(url + '|')
            fopen.write(m.read())
            fopen.close()
            print url, 'retrieved'

        except IOError:
            print url
            print "ERROR:OOPS! THE URL CAN'T BE RETRIEVED"

    return
 
Alex

Try urllib.urlretrieve from the standard library:

urllib.urlretrieve(url[, filename[, reporthook[, data]]])
Copy a network object denoted by a URL to a local file, if necessary.
If the URL points to a local file, or a valid cached copy of the
object exists, the object is not copied. Return a tuple (filename,
headers) where filename is the local file name under which the object
can be found, and headers is whatever the info() method of the object
returned by urlopen() returned (for a remote object, possibly cached).
Exceptions are the same as for urlopen().
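
To connect this back to the naming question, here is a minimal sketch of one way to turn a URL into a flat file name and pass it to urlretrieve. It assumes Python 3, where the function lives at urllib.request.urlretrieve rather than urllib.urlretrieve. The helper names filename_from_url and save_page are hypothetical, and the naming rule (drop "www." and the top-level domain, join the rest with "-") is only one possible convention, not something the library prescribes.

```python
import os
import re
from urllib.parse import urlparse
from urllib.request import urlretrieve


def filename_from_url(url, default="home", ext=".htm"):
    """Build a flat, filesystem-friendly file name from a URL."""
    parts = urlparse(url)
    host = parts.hostname or ""
    # Drop a leading "www." and the last label (the top-level domain),
    # so "code.google.com" becomes ["code", "google"].
    labels = [label for label in host.split(".") if label]
    if labels and labels[0] == "www":
        labels = labels[1:]
    if len(labels) > 1:
        labels = labels[:-1]
    # Append the path segments so different pages on one host do not collide.
    segments = [seg for seg in parts.path.split("/") if seg]
    if segments:
        # Strip an existing extension from the last segment, e.g. "index.html".
        segments[-1] = os.path.splitext(segments[-1])[0]
    name = "-".join(labels + segments)
    # Replace anything that is not a letter, digit, "_" or "-" with "-".
    name = re.sub(r"[^A-Za-z0-9_-]+", "-", name).strip("-")
    return (name or default) + ext


def save_page(url, page_dir):
    """Download url into page_dir and return the local path."""
    path = os.path.join(page_dir, filename_from_url(url))
    # urlretrieve returns (filename, headers); only the file name is used here.
    local_path, _headers = urlretrieve(url, path)
    return local_path
```

With these helpers, save_page("http://code.google.com", PAGE_DIR) would write the page to code-google.htm inside PAGE_DIR. The fragment after "#" never reaches the file name, because urlparse keeps it separate from the path.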
 
