Download unnamed web image?

galileo228

All,

My python program signs onto the student facebook at my school and,
given email addresses, returns the associated full name. If I were to
do this through a regular browser, there is also a picture of the
individual, and I am trying to get my program to download the picture
as well. The problem: the html code of the page does not point to a
particular file, but rather refers to (what seems like) a query.

So, if one went to the facebook and searched for me using my school
net id (msb83), the image of my profile on the results page is:

<img width="100" height="130" border="0" class="border" alt="msb83"
src="deliverImage.cfm?netid=MSB83">
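
A note on the `src` value above: it is a relative URL with a query string, so it has to be resolved against the address of the page it appeared on before anything can fetch it. A minimal sketch, assuming the results page is served from the same directory as the search form (the thread does not confirm this). It uses the Python 3 name `urllib.parse.urljoin`; in the Python 2 used in this thread the same function lives in the `urlparse` module.

```python
from urllib.parse import urljoin

# Assumption: the results page sits under the same path as the search form.
page_url = "http://www.school.edu/students/facebook/"

# The src attribute exactly as it appears in the <img> tag.
img_src = "deliverImage.cfm?netid=MSB83"

# Resolve the relative reference into an absolute URL that can be fetched.
image_url = urljoin(page_url, img_src)
print(image_url)
```

The absolute URL, not the tag itself, is what a download function needs as its first argument.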

Using BeautifulSoup, mechanize, and urllib, I've constructed the
following:

br.open("http://www.school.edu/students/facebook/")
br.select_form(nr = 1)

br.form['fulltextsearch'] = 'msb83'  # this searches the facebook for me
br.submit()
results = br.response().read()
soup = BeautifulSoup(results)
foo2 = soup.find('td', attrs={'width':'95'})
foo3 = foo2.find('a')
foo4 = foo3.find('img', attrs={'src':'deliverImage.cfm?netid=msb83'})
# this just drills down to the <img> line and until this point the
# program does not return an error

save_as = os.path.join('./', msb83 + '.jpg')
urllib.urlretrieve(foo4, save_as)

I get the following error msg after running this code:

AttributeError: 'NoneType' object has no attribute 'strip'

I can download the picture through my browser by right-clicking,
selecting save as, and then the image gets saved as
'deliverImage.cfm.jpeg.'

Are there any suggestions as to how I might be able to download the
image using python?

Please let me know if more information is needed -- happy to supply
it.

Matt
 
John Bokma

galileo228 said:
Using BeautifulSoup, mechanize, and urllib, I've constructed the
following:

br.open("http://www.school.edu/students/facebook/")
br.select_form(nr = 1)

br.form['fulltextsearch'] = 'msb83'  # this searches the facebook for me
br.submit()
results = br.response().read()
soup = BeautifulSoup(results)
foo2 = soup.find('td', attrs={'width':'95'})
foo3 = foo2.find('a')
foo4 = foo3.find('img', attrs={'src':'deliverImage.cfm?netid=msb83'})
# this just drills down to the <img> line and until this point the
# program does not return an error

save_as = os.path.join('./', msb83 + '.jpg')
urllib.urlretrieve(foo4, save_as)

I get the following error msg after running this code:

AttributeError: 'NoneType' object has no attribute 'strip'

Wild guess, since you didn't provide line numbers, etc.

foo4 is None

(I would also suggest using more meaningful names.)
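
One way to act on both suggestions is to give each lookup a descriptive name and stop as soon as one of them comes back as `None`. The sketch below is only an illustration: it uses the standard library's `html.parser` (Python 3) as a stand-in for BeautifulSoup so that it runs without extra packages, and the names `ImgSrcCollector` and `find_image_src` are made up for this example. It also compares the net id case-insensitively, because the page shows `netid=MSB83` while the posted code searches for `netid=msb83`; whether that mismatch is the actual cause here is a guess.

```python
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag in a document."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src is not None:
                self.sources.append(src)


def find_image_src(html, netid):
    """Return the src of the profile image for netid, or None if absent."""
    collector = ImgSrcCollector()
    collector.feed(html)
    wanted = ("deliverImage.cfm?netid=" + netid).lower()
    for src in collector.sources:
        if src.lower() == wanted:  # ignore upper/lower case differences
            return src
    return None


# Assumption: the <td>/<a> wrapper and href are invented; only the <img>
# tag is copied from the original post.
sample_html = (
    '<td width="95"><a href="#">'
    '<img width="100" height="130" border="0" class="border" alt="msb83" '
    'src="deliverImage.cfm?netid=MSB83"></a></td>'
)

image_src = find_image_src(sample_html, "msb83")
if image_src is None:
    print("no matching <img> found; nothing to download")
else:
    print("found image src:", image_src)
```

Checking for `None` right after each lookup points at the step that failed, instead of surfacing later as an error inside `urllib`.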
 
galileo228

galileo228 said:
[...]
AttributeError: 'NoneType' object has no attribute 'strip'

Wild guess, since you didn't provide line numbers, etc.

foo4 is None

(I also would like to suggest to use more meaningful names)


I thought it was too, and I just double-checked. It's actually

foo3 = foo2.find('a')

that is causing the NoneType error.

Thoughts?
 
galileo228

galileo228 said:
[...]

I've now fixed the foo3 issue, and I now know that the problem is with
the urllib.urlretrieve line (see above). This is the error msg I get
in IDLE:

Traceback (most recent call last):
  File "/Users/Matt/Documents/python/dtest.py", line 59, in <module>
    urllib.urlretrieve(foo4, save_as)
  File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/urllib.py", line 94, in urlretrieve
    return _urlopener.retrieve(url, filename, reporthook, data)
  File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/urllib.py", line 226, in retrieve
    url = unwrap(toBytes(url))
  File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/urllib.py", line 1033, in unwrap
    url = url.strip()
TypeError: 'NoneType' object is not callable

Is this msg being generated because I'm trying to retrieve a url
that's not really a file?
 
John Bokma

galileo228 said:
[...]

I've now fixed the foo3 issue, and I now know that the problem is with
the urllib.urlretrieve line (see above). This is the error msg I get
in IDLE:

Traceback (most recent call last):
  File "/Users/Matt/Documents/python/dtest.py", line 59, in <module>
    urllib.urlretrieve(foo4, save_as)
  File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/urllib.py", line 94, in urlretrieve
    return _urlopener.retrieve(url, filename, reporthook, data)
  File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/urllib.py", line 226, in retrieve
    url = unwrap(toBytes(url))
  File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/urllib.py", line 1033, in unwrap
    url = url.strip()
TypeError: 'NoneType' object is not callable

Is this msg being generated because I'm trying to retrieve a url
that's not really a file?

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.5/urllib.py", line 89, in urlretrieve
    return _urlopener.retrieve(url, filename, reporthook, data)
  File "/usr/lib/python2.5/urllib.py", line 210, in retrieve
    url = unwrap(toBytes(url))
  File "/usr/lib/python2.5/urllib.py", line 1009, in unwrap
    url = url.strip()
AttributeError: 'NoneType' object has no attribute 'strip'
--8<---------------cut here---------------end--------------->8---

To me it looks like you're still calling urlretrieve with None as the
first argument.
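
The two tracebacks end differently: passing `None` gives `AttributeError: 'NoneType' object has no attribute 'strip'`, while the IDLE run ends in `TypeError: 'NoneType' object is not callable`. One possible explanation, offered as an assumption rather than a diagnosis, is that `foo4` is a parsed tag object rather than `None`: if the tag class treats an unknown attribute such as `.strip` as a search for a child tag of that name and returns `None` when there is none, then `url.strip()` ends up calling `None`. The sketch below imitates that with a hypothetical `FakeTag` class and a stand-in `unwrap_like` function; neither is real BeautifulSoup or urllib code.

```python
class FakeTag(object):
    """Hypothetical stand-in for a parsed tag object.

    Assumption: unknown attributes are treated as a child-tag lookup
    that yields None when no such child exists.
    """

    def __getattr__(self, name):
        return None


def unwrap_like(url):
    # Mirrors the failing line from the tracebacks: url = url.strip()
    return url.strip()


def describe_failure(value):
    """Return (exception type name, message) raised for value, or None."""
    try:
        unwrap_like(value)
    except (AttributeError, TypeError) as exc:
        return type(exc).__name__, str(exc)
    return None


none_result = describe_failure(None)       # what the None test reproduces
tag_result = describe_failure(FakeTag())   # what a tag-like object produces
string_result = describe_failure(" deliverImage.cfm?netid=MSB83 ")

print(none_result)
print(tag_result)
print(string_result)
```

If this assumption holds, both replies lead to the same place: the first argument has to be a plain URL string, so printing `type(foo4)` just before the `urlretrieve` call would show which case applies.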
 
Matthew Barnett

galileo228 said:
[...]

I've now fixed the foo3 issue, and I now know that the problem is with
the urllib.urlretrieve line (see above). This is the error msg I get
in IDLE:

Traceback (most recent call last):
  File "/Users/Matt/Documents/python/dtest.py", line 59, in <module>
    urllib.urlretrieve(foo4, save_as)
  File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/urllib.py", line 94, in urlretrieve
    return _urlopener.retrieve(url, filename, reporthook, data)
  File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/urllib.py", line 226, in retrieve
    url = unwrap(toBytes(url))
  File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/urllib.py", line 1033, in unwrap
    url = url.strip()
TypeError: 'NoneType' object is not callable

Is this msg being generated because I'm trying to retrieve a url
that's not really a file?

It's because the URL you're passing in, namely foo4, is None. This is
presumably because foo3.find() returns None if it can't find the entry.

You checked the value of foo3, but did you check the value of foo4?
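
Putting the replies together: `urlretrieve` wants a URL string, so the `src` attribute has to be read from the tag, resolved against the page URL, and only then downloaded. A URL with a query string downloads like any other. The sketch below demonstrates this against a throwaway local server that stands in for the school site; the handler, the fake image bytes, and the `download_image` helper are all invented for the demo. With BeautifulSoup the attribute would normally be read as `foo4['src']`, after an `if foo4 is not None` check. The code uses Python 3 names; in Python 2 `urlretrieve` lives directly in `urllib`, as in the thread.

```python
import os
import tempfile
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urljoin

FAKE_JPEG = b"\xff\xd8\xff\xe0 not a real picture \xff\xd9"  # placeholder bytes


class FakeFacebookHandler(BaseHTTPRequestHandler):
    """Invented stand-in for the school server's deliverImage.cfm."""

    def do_GET(self):
        if self.path.startswith("/students/facebook/deliverImage.cfm?netid="):
            self.send_response(200)
            self.send_header("Content-Type", "image/jpeg")
            self.send_header("Content-Length", str(len(FAKE_JPEG)))
            self.end_headers()
            self.wfile.write(FAKE_JPEG)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def download_image(page_url, img_src, save_as):
    """Resolve img_src against page_url and save the response body."""
    if img_src is None:
        raise ValueError("no image src found; nothing to download")
    image_url = urljoin(page_url, img_src)
    urllib.request.urlretrieve(image_url, save_as)
    return image_url


# Bypass any system proxy so the request reaches the local demo server.
urllib.request.install_opener(
    urllib.request.build_opener(urllib.request.ProxyHandler({}))
)

server = HTTPServer(("127.0.0.1", 0), FakeFacebookHandler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
try:
    page_url = "http://127.0.0.1:%d/students/facebook/" % server.server_address[1]
    save_as = os.path.join(tempfile.mkdtemp(), "msb83.jpg")
    fetched_url = download_image(page_url, "deliverImage.cfm?netid=MSB83", save_as)
    with open(save_as, "rb") as handle:
        saved_bytes = handle.read()
finally:
    server.shutdown()
    server.server_close()

print(fetched_url)
print(len(saved_bytes), "bytes saved to", save_as)
```

Against the real site the download would also need the logged-in session, so fetching the resolved URL through the same mechanize browser object that submitted the form is probably the safer route than a bare `urlretrieve` call.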
 
