Code duplication

Arun Kumar · Apr 6, 2009

Hi all,
The following code extracts the HTML contents of a website. I have
included handling for URL redirects and for the BadRequest error.

# getting the HTTP response from 'uri'
response = Net::HTTP.get_response(uri)
case response
# if the url is redirecting then fetch the contents of the redirected url
when Net::HTTPRedirection
  uri = URI.parse(response['Location'])
  response = Net::HTTP.get_response(uri)
# in case of a bad request error
when Net::HTTPBadRequest
  http = Net::HTTP.start(uri.host, uri.port)
  # getting the html data by setting the path as '/' and using a user agent
  response = http.get("/", "User-Agent" => "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)")
end

data = response.body

My tutor says there is duplication in the above code, i.e. the code for
reading the HTML is specified twice without any purpose and should be
removed. I have no idea where the mistake is. I'm a newbie to Ruby and I
don't understand the problem correctly or where things went wrong. Can
anyone please help me find the mistake?

Thanks in advance.

Regards
Arun

Loga Ganesan · Apr 6, 2009

What is the use of the statement below?

response = http.get("/", "User-Agent" => "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)")

Since you have already got the response object using get_response, why
is it needed?

Arun Kumar · Apr 6, 2009

Hi,
Thanks for the reply. If there is a bad request error, I have to connect
to the host and port and then fetch the data. For example, if I try to
fetch the HTML contents of youtube.com, I get a bad request error. So I
used Net::HTTP.start() and then used the path and user agent to retrieve
the contents, storing them in response. I don't think there is any other
way. If I remove that part, I'm not able to read the HTML.

Thanks
Arun
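
One possible alternative to the fallback described above, as a minimal sketch: send the User-Agent header on every request, so there is only one place that reads the HTML. Whether this avoids the bad request error for a particular site is an assumption and has not been checked here. The names USER_AGENT, build_request and fetch are hypothetical and introduced only for illustration.

```ruby
require 'net/http'
require 'uri'

# Same user agent string as in the original post.
USER_AGENT = "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)"

# Hypothetical helper: build one GET request that always carries the header.
def build_request(uri)
  # An empty path is not a valid request target, so fall back to "/".
  path = uri.path.empty? ? "/" : uri.path
  Net::HTTP::Get.new(path, "User-Agent" => USER_AGENT)
end

# Hypothetical helper: perform the request (needs network access).
# The block form of Net::HTTP.start closes the connection afterwards.
def fetch(uri)
  Net::HTTP.start(uri.host, uri.port) { |http| http.request(build_request(uri)) }
end
```

With this shape, the redirect and bad request handling can both call fetch instead of each building their own request.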

Eleanor McHugh · Apr 6, 2009

There are probably better solutions, but the following illustrates the
point your tutor is making:

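require 'net/http'
require 'uri'

MOZILLA_HEADER = { "User-Agent" => "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)" }

def get_http_response uri, max_redirects = 0
  # the block form closes the connection when the block exits
  Net::HTTP.start(uri.host, uri.port) do |connection|
    # an empty path is not a valid request target, so fall back to "/"
    path = uri.path.empty? ? "/" : uri.path
    response = connection.get(path, MOZILLA_HEADER)
    case response
    when Net::HTTPRedirection
      if max_redirects > 0 then
        get_http_response URI.parse(response['Location']), (max_redirects - 1)
      else
        raise "Too many redirects"
      end
    when Net::HTTPBadRequest
      # retry once against the site root; give up if the root itself is rejected
      raise "Bad request" if path == "/"
      get_http_response URI.parse("http://#{uri.host}:#{uri.port}/"), max_redirects
    else
      # any other response is returned to the caller unchanged
      response
    end
  end
end

# my_uri must be a URI object (e.g. the result of URI.parse)
data = get_http_response(my_uri, 3).body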

See how get_http_response is recursive in the case of an erroneous
response? This minimises the actual HTTP interaction code as well as
elegantly handling redirects. Whilst this could result in many more
http connections being used, it also makes them clear up after
themselves which is always good.

Ellie

Eleanor McHugh
Games With Brains
http://slides.games-with-brains.net

Arun Kumar · Apr 7, 2009

Thanks Ellie. You gave me a clue not only for solving the code
duplication but also for handling the redirects. Thanks a lot.
Regards
Arun

Eleanor McHugh · Apr 7, 2009

My pleasure

Ellie


Arun Kumar · Apr 7, 2009

Hi Ellie,
Thank you once again for your reply. It helped me a lot. Now I have a
couple of questions.
1) How can I specify the redirect limit without declaring it inside a
method? Is it possible?
2) Will including a redirect limit make the redirection code as
effective as it can be, or should I add anything else to the code to
handle redirection effectively?

Thanks
Arun

Eleanor McHugh · Apr 7, 2009

Arun said:
1) How can I specify the redirect limit without declaring it inside a
method? Is it possible?

The redirect limit isn't declared inside the method but as one of the
parameters of the method, which is why it allows recursive execution
as each redirect is received. You'll note that I provided an initial
value as part of the initial function call:

data = get_http_response(my_uri, 3).body

but in a real-world program you would either specify a constant and use
that:

MAXIMUM_REDIRECTS = 3
data = get_http_response(my_uri, MAXIMUM_REDIRECTS).body

or else wrap everything together into an object where this value would
be either an instance or class variable depending on your intent.
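
A rough sketch of the object-based option, under some assumptions: the class name PageFetcher, the keyword arguments, and the fetcher hook are hypothetical and only for illustration. The hook lets the redirect logic be exercised without a network connection. Keyword arguments need a Ruby version newer than the ones current when this thread was written.

```ruby
require 'net/http'
require 'uri'

# Hypothetical wrapper: the redirect limit lives in an instance variable.
class PageFetcher
  DEFAULT_MAX_REDIRECTS = 3

  def initialize(max_redirects: DEFAULT_MAX_REDIRECTS, fetcher: nil)
    @max_redirects = max_redirects
    # By default perform a real GET; a lambda can be passed in for testing.
    @fetcher = fetcher || ->(uri) { Net::HTTP.get_response(uri) }
  end

  # Returns the first response that is not a redirect.
  def get(uri, remaining = @max_redirects)
    response = @fetcher.call(uri)
    return response unless response.is_a?(Net::HTTPRedirection)
    raise "Too many redirects" if remaining <= 0
    # URI.join also resolves a relative Location against the current URI.
    get(URI.join(uri.to_s, response['Location']), remaining - 1)
  end
end
```

Usage would look something like PageFetcher.new(max_redirects: 5).get(my_uri).body, with the limit chosen once when the object is created rather than at each call.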

Arun said:
2) Will including a redirect limit make the redirection code as
effective as it can be, or should I add anything else to the code to
handle redirection effectively?

I can't really answer that question without knowing more about the
real-world problem you're trying to solve. However in general I'd say
that whenever you have a recursive problem like this it's sensible to
ensure that it's throttled to prevent resource exhaustion. For a very
graphic example of why this is important - especially with network
applications - read up on the Morris Worm.
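
One way to picture that kind of throttling, as a sketch only: follow redirects in a bounded loop and also stop when a URL repeats. The method name follow_redirects and its limit and fetch parameters are hypothetical; fetch is any callable that takes a URI and returns a Net::HTTPResponse, which keeps the example independent of a live network.

```ruby
require 'net/http'
require 'uri'
require 'set'

# Hypothetical helper: follows redirects iteratively instead of recursing.
def follow_redirects(uri, limit: 3, fetch: ->(u) { Net::HTTP.get_response(u) })
  visited = Set.new
  # One initial request plus at most `limit` redirects.
  (limit + 1).times do
    # Set#add? returns nil when the URL was already seen, i.e. a loop.
    raise "Redirect loop at #{uri}" unless visited.add?(uri.to_s)
    response = fetch.call(uri)
    return response unless response.is_a?(Net::HTTPRedirection)
    uri = URI.join(uri.to_s, response['Location'])
  end
  raise "Too many redirects"
end
```

The loop bound caps the total number of requests, and the visited set catches two pages redirecting to each other before the bound is reached.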

Ellie

