Fetching a URL using cookies

Zouplaz

I'm trying to fetch a URL which needs several cookies to be set in
order to return a proper result.

I've found a page on the website from which I can get the session
cookies (instead of posting cookies I set myself, I prefer to use the
ones coming from the server).

So,

def http_get(url, url_before = nil)
  headers = Hash.new()
  headers['User-agent'] = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
  unless url_before.nil?
    response = @http.get(url_before)
    cookies = response.response['set-cookie']
    headers['Cookie'] = cookies
  end
  response = @http.get(url, headers)
  raise "url #{url} not accessible on host #{@host}:#{@port} - code #{response.code}" if not ['200','302'].include?(response.code)
  response.body
end


The problem is that I'm not sure whether the way I send the cookies back
is right. The cookies retrieved by the unless block ARE OK, but when
the second @http.get occurs, the remote web server ignores them and sends
a redirect to a default page.

So I need to be sure that I send the cookies properly in the GET request
before investigating the cookies' content.
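
One thing worth checking here: a Set-Cookie value carries attributes (path, expires and so on) after the first ";", while a Cookie request header is expected to carry only the name=value pairs, joined by "; ". Also, response['set-cookie'] joins several Set-Cookie headers with ", ", which is ambiguous because expires dates contain commas; get_fields('set-cookie') returns them as an array instead. The following is a minimal sketch under those assumptions. The helper name cookie_header_from is hypothetical and not part of the original code.

```ruby
# Hypothetical helper: build a Cookie request header from raw Set-Cookie values.
# Accepts an array of Set-Cookie strings (or nil when the server sent none).
def cookie_header_from(set_cookie_values)
  pairs = Array(set_cookie_values).map do |value|
    # Keep only "name=value"; everything after the first ';' is an attribute.
    value.to_s.split(';', 2).first.to_s.strip
  end
  pairs.reject { |pair| pair.empty? }.join('; ')
end

# Possible use with Net::HTTP (not executed here):
#   response = @http.get(url_before)
#   headers['Cookie'] = cookie_header_from(response.get_fields('set-cookie'))
```

This deliberately ignores domain, path and expiry, so it is only suitable when every cookie should go back to the same host.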


Thanks for your help



Note: when reviewing this post I realized I should write some code to keep
the cookies' content between two calls (as a browser does) instead of
handling things the way I do. But that's just a side note.
 
William Crawford

Zouplaz said:
> I'm trying to fetch a URL which needs several cookies to be set in
> order to return a proper result.
> The problem is that I'm not sure whether the way I send the cookies back
> is right. The cookies retrieved by the unless block ARE OK, but when
> the second @http.get occurs, the remote web server ignores them and sends
> a redirect to a default page.

Why not -try- to manufacture them yourself and see if it works? If it
does, you know how to send them and can just make sure the
newly-obtained cookies are sent the same way. If it doesn't, massage it
until it does work.

> So I need to be sure that I send the cookies properly in the GET request
> before investigating the cookies' content.

Right.

> Note: when reviewing this post I realized I should write some code to keep
> the cookies' content between two calls (as a browser does) instead of
> handling things the way I do. But that's just a side note.

Yes, good idea.

Another thought, however. Perhaps the page has additional requirements
that you haven't met. Cookies that don't exist on the other page, but
were set at login or somewhere else. Headers that you aren't sending
and it expects. A specific referring page. (Or something else I've
momentarily forgotten.)
 
Zouplaz

On 12/09/2006 13:09, William Crawford wrote:
> Another thought, however. Perhaps the page has additional requirements
> that you haven't met. Cookies that don't exist on the other page, but
> were set at login or somewhere else. Headers that you aren't sending
> and it expects. A specific referring page. (Or something else I've
> momentarily forgotten.)

I don't know why, but suddenly it worked... I presume I had missed
something somewhere.

Now I've rewritten the code to use a "write-once" cookie mechanism
which is generic for every "scraping" class that I use. It's
sufficient for now.

def http_get(url)
  headers = Hash.new()
  headers['User-agent'] = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
  headers['Cookie'] = @cookies unless @cookies.nil?
  response = @http.get(url, headers)
  raise "url #{url} no access on host #{@host}:#{@port} - code #{response.code}" if not ['200','302'].include?(response.code)
  @cookies = response.response['set-cookie'] if @cookies.nil?
  response.body
end

Just for my own education, could this code be rewritten in a more
elegant way?
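
One possible direction, offered as a sketch rather than the definitive answer: move the cookie bookkeeping into a small object that is updated after every response, so that cookies set later (at login, for instance) are not lost the way they are with a write-once variable. The CookieJar class and its method names below are hypothetical, and the sketch assumes all requests go to a single host, so domain, path and expiry are ignored.

```ruby
# Hypothetical CookieJar: remembers cookies between calls, roughly as a browser would.
class CookieJar
  def initialize
    @cookies = {}
  end

  # Record the cookies found in Set-Cookie values (an array of strings, or nil).
  # A later cookie with the same name replaces the earlier value.
  def store(set_cookie_values)
    Array(set_cookie_values).each do |value|
      pair = value.to_s.split(';', 2).first.to_s
      name, val = pair.split('=', 2)
      next if name.nil? || name.strip.empty?
      @cookies[name.strip] = val.to_s.strip
    end
    self
  end

  # Value suitable for the Cookie request header.
  def to_header
    @cookies.map { |name, val| "#{name}=#{val}" }.join('; ')
  end

  def empty?
    @cookies.empty?
  end
end

# Possible use inside http_get (not executed here; @jar is assumed to be a CookieJar):
#   headers['Cookie'] = @jar.to_header unless @jar.empty?
#   response = @http.get(url, headers)
#   @jar.store(response.get_fields('set-cookie'))
```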


Thanks
 
William Crawford

Zouplaz said:
> On 12/09/2006 13:09, William Crawford wrote:
> I don't know why, but suddenly it worked... I presume I had missed
> something somewhere.

Experience tells me it'll suddenly stop again, don't fret ;) When I
have something that stops and starts, I usually stop and gather the exact
information being sent, byte for byte, from a success and a failure and
compare them. Using LiveHttpHeaders (for Firefox; there's an IE version
with a name something like it) you can grab the exact headers, cookies
(in the headers) and post data sent.

Personally, I would take the time to set up a test I know works, and if
it ever fails you again, you can run that test again and see what's
different now. I'd even go as far as to record the headers for the test
now, while it works, and save it for when it doesn't. (I'm not usually
so proactive, but this could be a serious bear to debug without
known-good headers/etc, and I'm lazy.)
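
A rough sketch of that comparison step, assuming the request headers from a working run and a failing run have each been saved as a plain Hash. The helper name diff_headers is hypothetical, and header names are compared case-insensitively because HTTP treats them that way.

```ruby
# Hypothetical helper: list the headers that differ between a known-good
# request and a failing one. Returns { name => [good_value, bad_value] }.
def diff_headers(good, bad)
  normalize = lambda do |headers|
    headers.each_with_object({}) { |(name, value), out| out[name.to_s.downcase] = value.to_s }
  end
  g = normalize.call(good)
  b = normalize.call(bad)
  (g.keys | b.keys).sort.each_with_object({}) do |name, diff|
    diff[name] = [g[name], b[name]] unless g[name] == b[name]
  end
end
```

A header missing from one side shows up with nil in that position, which makes a forgotten Cookie or Referer easy to spot.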

> Just for my own education, could this code be rewritten in a more
> elegant way?

I'm not one to talk to about 'elegant' code... I'm more in the 'Hey, it
works, right?' category. Hehe.

Glad it works! Enjoy.
 
Aaron Patterson

> I'm trying to fetch a URL which needs several cookies to be set in
> order to return a proper result.
>
> I've found a page on the website from which I can get the session
> cookies (instead of posting cookies I set myself, I prefer to use the
> ones coming from the server).
>
> So,
>
> def http_get(url, url_before = nil)
>   headers = Hash.new()
>   headers['User-agent'] = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
>   unless url_before.nil?
>     response = @http.get(url_before)
>     cookies = response.response['set-cookie']
>     headers['Cookie'] = cookies
>   end
>   response = @http.get(url, headers)
>   raise "url #{url} not accessible on host #{@host}:#{@port} - code #{response.code}" if not ['200','302'].include?(response.code)
>   response.body
> end
>
> The problem is that I'm not sure whether the way I send the cookies back
> is right. The cookies retrieved by the unless block ARE OK, but when
> the second @http.get occurs, the remote web server ignores them and sends
> a redirect to a default page.
> [snip]

Why write all this yourself? WWW::Mechanize will handle storing and
sending cookies for you. Then you can concentrate on getting the data
from the web page.

http://mechanize.rubyforge.org/

You can even set a custom user agent string! Hope that helps.
 
