Image scraping from behind a proxy

Abhishek Ghose

Hi,

I was looking at this forum post about downloading image files from the
web:
http://www.ruby-forum.com/topic/133833

But it doesn't work for me, apparently because I am behind a proxy. With
the code from that thread I get errors like the following:

c:/ruby/lib/ruby/1.8/net/http.rb:564:in `initialize': No connection could be made because the target machine actively refused it. - connect(2) (Errno::ECONNREFUSED)
from c:/ruby/lib/ruby/1.8/net/http.rb:564:in `open'
from c:/ruby/lib/ruby/1.8/net/http.rb:564:in `connect'
from c:/ruby/lib/ruby/1.8/timeout.rb:48:in `timeout'
from c:/ruby/lib/ruby/1.8/timeout.rb:76:in `timeout'
from c:/ruby/lib/ruby/1.8/net/http.rb:564:in `connect'
from c:/ruby/lib/ruby/1.8/net/http.rb:557:in `do_start'
from c:/ruby/lib/ruby/1.8/net/http.rb:546:in `start'
from c:/ruby/lib/ruby/1.8/open-uri.rb:243:in `open_http'
... 7 levels...
from test.rb:48:in `write_images'
from test.rb:45:in `each'
from test.rb:45:in `write_images'
from test.rb:76
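The trace ends in open-uri's open_http, so the linked code fetches the images with open-uri. open-uri only goes through a proxy when one is configured, either through the http_proxy environment variable or its :proxy option; otherwise it connects to the target host directly, which a firewall may refuse. A minimal sketch using :proxy, assuming Ruby 2.5+ for URI.open (Ruby 1.8 spells the call open(url, :proxy => ...)) and the same placeholder proxy used elsewhere in this thread. fetch_with_open_uri and PROXY_URL are hypothetical names.

```ruby
require 'open-uri'

# Placeholder proxy URL (assumption): replace with your own proxy host/port.
PROXY_URL = "http://proxyservername:8080/"

# Hypothetical helper: fetch `url` through the proxy with open-uri and
# save the bytes to `dest` unchanged.
def fetch_with_open_uri(url, dest, proxy_url = PROXY_URL)
  # :proxy makes open-uri connect to the proxy instead of the target host.
  URI.open(url, "rb", :proxy => proxy_url) do |remote|
    # "wb" = binary mode, so no newline translation on Windows.
    File.open(dest, "wb") { |f| f.write(remote.read) }
  end
  dest
end
```

For example, fetch_with_open_uri("http://static.flickr.com/92/218926700_ecedc5fef7_o.jpg", "fun.jpg"). Setting the http_proxy environment variable instead makes open-uri use the proxy without any code change.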



I had run into similar problems when I tried to get an HTTP response.
Back then I started doing this (which works perfectly for me):

require 'net/http'
require 'uri'

$proxy_addr = 'proxyservername'
$proxy_port = 8080
$proxy = Net::HTTP::Proxy($proxy_addr, $proxy_port)

http_query = "http://www.yahoo.com"
url = URI.parse(http_query)
http_response = $proxy.get_response(url)
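The object returned by Net::HTTP::Proxy is a subclass of Net::HTTP, so it can be built once and reused for every request. A sketch, assuming you want to switch the proxy on and off without editing the script: the environment variable names PROXY_HOST and PROXY_PORT and the helper http_class_from_env are hypothetical. It relies on Net::HTTP::Proxy returning plain Net::HTTP when the proxy address is nil.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: build the HTTP class from environment variables
# named PROXY_HOST / PROXY_PORT (assumed names, pick your own).
# With no PROXY_HOST, Net::HTTP::Proxy(nil, nil) returns Net::HTTP itself,
# so requests go out directly.
def http_class_from_env(env = ENV)
  host = env['PROXY_HOST']
  port = env['PROXY_PORT'] ? Integer(env['PROXY_PORT']) : nil
  Net::HTTP::Proxy(host, port)
end
```

For example, http_class_from_env.get_response(URI.parse("http://www.yahoo.com")) goes through the proxy only when PROXY_HOST is set.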



Is there something similar I can do to download image files? I tweaked
the above code to put an HTTP image file location in http_query and
wrote http_response.body to a normal file. That didn't give me any
errors, but my JPEG is unreadable. :(
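A likely cause of the unreadable JPEG on this Windows install (c:/ruby): a file opened with plain "w" is in text mode, and on Windows every "\n" byte written to it is expanded to "\r\n", which corrupts binary data. Opening with "wb" writes the bytes unchanged. A minimal sketch; save_binary is a hypothetical helper and body stands for http_response.body.

```ruby
# Hypothetical helper: write a response body (e.g. http_response.body)
# to disk exactly as received.
def save_binary(path, body)
  # "wb" opens the file in binary mode: no "\n" -> "\r\n" translation
  # on Windows, so image bytes are stored unchanged.
  File.open(path, "wb") { |f| f.write(body) }
end
```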
 

Abhishek Ghose

While I was writing my question I figured out what I was supposed to do :)
Sorry for the thread; I hope it helps other visitors to the forum.

Here's how it works now:

require 'net/http'

$proxy_addr = 'proxyservername'
$proxy_port = 8080

Net::HTTP::Proxy($proxy_addr, $proxy_port).start("static.flickr.com") do |http|
  resp = http.get("/92/218926700_ecedc5fef7_o.jpg")
  # "wb" writes in binary mode so the JPEG bytes are saved unchanged
  File.open("fun.jpg", "wb") do |file|
    file.write(resp.body)
  end
end



The above is a tweaked version of the example available here:
http://www.rubynoob.com/articles/2006/8/21/how-to-download-files-with-a-ruby-script

It simply uses Net::HTTP::Proxy instead of Net::HTTP.
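A sketch that generalises the snippet above to any image URL, under the same assumptions (an HTTP proxy at proxyservername:8080 and plain http:// URLs). download_via_proxy is a hypothetical helper name. Unlike the snippet, it refuses to save a non-2xx response, since an error page (for example a 407 from the proxy) saved under a .jpg name would also look like an unreadable JPEG.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper (not from the thread): download one URL through an
# HTTP proxy and save the response body byte-for-byte.
def download_via_proxy(url, dest, proxy_addr, proxy_port)
  uri = URI.parse(url)
  # Net::HTTP::Proxy returns a Net::HTTP subclass that connects to the
  # proxy and sends the absolute URL in the request line.
  http_class = Net::HTTP::Proxy(proxy_addr, proxy_port)
  http_class.start(uri.host, uri.port) do |http|
    resp = http.get(uri.request_uri)
    # Do not save error pages (e.g. a 407 from the proxy) as an image.
    unless resp.is_a?(Net::HTTPSuccess)
      raise "GET #{url} failed: #{resp.code} #{resp.message}"
    end
    # "wb" = binary mode, so no newline translation on Windows.
    File.open(dest, "wb") { |f| f.write(resp.body) }
  end
  dest
end
```

For example: download_via_proxy("http://static.flickr.com/92/218926700_ecedc5fef7_o.jpg", "fun.jpg", "proxyservername", 8080).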
 
