Hpricot & mechanize fail to parse page after redirect

Ehud Rosenberg · Nov 14, 2007

Hi everyone,
My quest with Mechanize/Hpricot continues.

Something extremely strange happened today: some simple, working code
broke, and I can't figure out why.

I am trying to access a thepiratebay.org search page, which redirects
to a relative URL like this:
original link:
http://thepiratebay.org/s/?page=0&orderby=3&q=football+manager+2008&searchTitle=on

redirects to:
/search/football manager 2008/0/3/0

Now, this all worked fine up until yesterday: the page was redirected,
and Mechanize even handled the cookie that the site sent back.
But today I am getting this strange error from Hpricot:

"URI::InvalidURIError: bad URI(is not URI?): /search/football manager 2008/0/3/0"

Mechanize gives a different error, but I'm sure it stems from
Hpricot's problem fetching the page.
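The error message itself hints at one possible cause (an assumption, not confirmed anywhere in this thread): the redirect target contains literal spaces, and Ruby's standard URI.parse rejects unescaped spaces with exactly this kind of InvalidURIError. A minimal sketch reproducing that with only the standard uri library; the helper name parses? is hypothetical:

```ruby
require 'uri'

# The Location value from the redirect, copied from the error message above.
location = "/search/football manager 2008/0/3/0"

# Hypothetical helper: true if URI.parse accepts the string, false otherwise.
def parses?(str)
  URI.parse(str)
  true
rescue URI::InvalidURIError
  false
end

# The raw value is rejected because it contains unescaped spaces.
raw_ok = parses?(location)

# Percent-encoding the spaces (an assumed workaround) yields a valid relative URI.
escaped = location.gsub(" ", "%20")
escaped_ok = parses?(escaped)

puts "raw parses: #{raw_ok}"
puts "escaped parses: #{escaped_ok}"
```

If the raw value fails while the escaped one parses, the problem is more likely the unescaped spaces in the Location header than the fact that the redirect is relative.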

I have tested this on two different machines, and it fails on both.
Can someone please give it a go and see if they can figure it out?
I would be very thankful.

Thanks,
Ehud

PS: I am using Hpricot 0.6, and the redirected page parses correctly
when accessed directly.

Rob Biedenharn · Nov 14, 2007

If the redirect is a 302 whose Location: header is just:
"/search/football manager 2008/0/3/0"

then it's probably similar to the issue I had using HTTPClient. The
relevant bit of code from HTTPClient is:
    def default_redirect_uri_callback(uri, res)
      # Parse the raw Location header of the redirect response
      newuri = URI.parse(res.header['location'][0])
      unless newuri.is_a?(URI::HTTP)
        # Not an absolute http(s) URI: resolve it against the request URI
        newuri = URI.join(uri, newuri)
        STDERR.puts(
          "could be a relative URI in location header which is not recommended")
        STDERR.puts(
          "'The field value consists of a single absolute URI' in HTTP spec")
      end
      puts "Redirect to: #{newuri}" if $DEBUG
      newuri
    end

Note the line URI.join(uri, newuri), which takes the (presumed)
relative newuri and resolves it against the original uri.
(I've also recently sent the author of HTTPClient a patch
that fixed this line.)
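For reference, the resolution described above can be tried on its own with the standard library. This sketch is not HTTPClient's or Mechanize's code; the helper name resolve_redirect is hypothetical, and escaping spaces before parsing is an assumption about how to cope with a Location value like the one above.

```ruby
require 'uri'

# Hypothetical helper: turn a (possibly relative) Location header value into
# an absolute URI, resolved against the URL that was originally requested.
def resolve_redirect(request_url, location)
  # Assumed workaround: percent-encode literal spaces, which URI.parse rejects.
  target = URI.parse(location.gsub(" ", "%20"))
  # Absolute http(s) targets are used as-is; anything else is joined to the base.
  target.is_a?(URI::HTTP) ? target : URI.join(request_url, target)
end

original = "http://thepiratebay.org/s/?page=0&orderby=3&q=football+manager+2008&searchTitle=on"
resolved = resolve_redirect(original, "/search/football manager 2008/0/3/0")
puts resolved
# http://thepiratebay.org/search/football%20manager%202008/0/3/0
```

Encoding only spaces is deliberately narrow; a Location value containing other illegal characters would need broader escaping.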

-Rob

Rob Biedenharn http://agileconsultingllc.com

Ehud Rosenberg · Nov 14, 2007

That is probably the case when using Hpricot alone, but Mechanize handles
this: it has a method that takes a relative redirect URL and builds a
fully qualified one.
Also, it worked for me yesterday with the exact same code (I know that
sounds crazy!).

Thanks for the quick and thorough reply, Rob!
